1959 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Jakub Kicinski
|
c6f9b7138b |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZTp12QAKCRDbK58LschI g8BrAQDifqp5liEEdXV8jdReBwJtqInjrL5tzy5LcyHUMQbTaAEA6Ph3Ct3B+3oA mFnIW/y6UJiJrby0Xz4+vV5BXI/5WQg= =pLCV -----END PGP SIGNATURE----- Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-26 We've added 51 non-merge commits during the last 10 day(s) which contain a total of 75 files changed, 5037 insertions(+), 200 deletions(-). The main changes are: 1) Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF, from Chuyi Zhou. 2) Fix BPF verifier's iterator convergence logic to use exact states comparison for convergence checks, from Eduard Zingerman, Andrii Nakryiko and Alexei Starovoitov. 3) Add BPF programmable net device where bpf_mprog defines the logic of its xmit routine. It can operate in L3 and L2 mode, from Daniel Borkmann and Nikolay Aleksandrov. 4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking for global per-CPU allocator, from Hou Tao. 5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section was going to be present whenever a binary has SHT_GNU_versym section, from Andrii Nakryiko. 6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into atomic_set_release(), from Paul E. McKenney. 7) Add a warning if NAPI callback missed xdp_do_flush() under CONFIG_DEBUG_NET which helps checking if drivers were missing the former, from Sebastian Andrzej Siewior. 8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing a warning under sleepable programs, from Yafang Shao. 9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before checking map_locked, from Song Liu. 10) Make BPF CI linked_list failure test more robust, from Kumar Kartikeya Dwivedi. 11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik. 12) Fix xsk starving when multiple xsk sockets were associated with a single xsk_buff_pool, from Albert Huang. 13) Clarify the signed modulo implementation for the BPF ISA standardization document that it uses truncated division, from Dave Thaler. 14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider signed bounds knowledge, from Andrii Nakryiko. 15) Add an option to XDP selftests to use multi-buffer AF_XDP xdp_hw_metadata and mark used XDP programs as capable to use frags, from Larysa Zaremba. 16) Fix bpftool's BTF dumper wrt printing a pointer value and another one to fix struct_ops dump in an array, from Manu Bretelle. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits) netkit: Remove explicit active/peer ptr initialization selftests/bpf: Fix selftests broken by mitigations=off samples/bpf: Allow building with custom bpftool samples/bpf: Fix passing LDFLAGS to libbpf samples/bpf: Allow building with custom CFLAGS/LDFLAGS bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free selftests/bpf: Add selftests for netkit selftests/bpf: Add netlink helper library bpftool: Extend net dump with netkit progs bpftool: Implement link show support for netkit libbpf: Add link-based API for netkit tools: Sync if_link uapi header netkit, bpf: Add bpf programmable net device bpf: Improve JEQ/JNE branch taken logic bpf: Fold smp_mb__before_atomic() into atomic_set_release() bpf: Fix unnecessary -EBUSY from htab_lock_bucket xsk: Avoid starving the xsk further down the list bpf: print full verifier states on infinite loop detection selftests/bpf: test if state loops are detected in a tricky case bpf: correct loop detection for iterators convergence ... ==================== Link: https://lore.kernel.org/r/20231026150509.2824-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
ec4c20ca09 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Daniel Borkmann
|
5c1b994de4 |
tools: Sync if_link uapi header
Sync if_link uapi header to the latest version as we need the refresher in tooling for netkit device. Given it's been a while since the last sync and the diff is fairly big, it has been done as its own commit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-3-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Daniel Borkmann
|
35dfaad718 |
netkit, bpf: Add bpf programmable net device
This work adds a new, minimal BPF-programmable device called "netkit" (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The core idea is that BPF programs are executed within the drivers xmit routine and therefore e.g. in case of containers/Pods moving BPF processing closer to the source. One of the goals was that in case of Pod egress traffic, this allows to move BPF programs from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms, for example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance (see measurements in the slides). In this initial version, the netkit device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon the device creation, the user can set the default policy to 'pass' or 'drop' for the case when no BPF program is attached. Additionally, the device can be operated in L3 (default) or L2 mode. The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API and dependency controls as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated. Going forward, we plan to use netkit devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management and the peer will operate in default drop mode, so that no traffic is leaving between the time when a Pod is brought up by the CNI plugin and programs attached by the agent. Additionally, the programs we attach via tcx on the physical devices are using bpf_redirect_peer() for inbound traffic into netkit device, hence the latter is also supporting the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device directly. Also, BIG TCP is supported on netkit device. For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one. An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://github.com/borkmann/iproute2/tree/pr/netkit Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.) Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Raghavendra Rao Ananta
|
9f4b3273df |
tools: Import arm_pmuv3.h
Import kernel's include/linux/perf/arm_pmuv3.h, with the definition of PMEVN_SWITCH() additionally including an assert() for the 'default' case. The following patches will use macros defined in this header. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-9-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> |
||
Linus Torvalds
|
4f82870119 |
20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues
or aren't considered necessary for earlier kernel versions. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZTfz/QAKCRDdBJ7gKXxA joMyAP99hLaLYeJbjlf+4tLJzhlpbVoFra1ieun2D+ZgFE78xQD/T4T3PYrZhYqD WdrxGT9fiKOykXM5pmQRH9Zr4EvJBA0= =Obbk -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues or aren't considered necessary for earlier kernel versions" * tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: maple_tree: add GFP_KERNEL to allocations in mas_expected_entries() selftests/mm: include mman header to access MREMAP_DONTUNMAP identifier mailmap: correct email aliasing for Oleksij Rempel mailmap: map Bartosz's old address to the current one mm/damon/sysfs: check DAMOS regions update progress from before_terminate() MAINTAINERS: Ondrej has moved kasan: disable kasan_non_canonical_hook() for HW tags kasan: print the original fault addr when access invalid shadow hugetlbfs: close race between MADV_DONTNEED and page fault hugetlbfs: extend hugetlb_vma_lock to private VMAs hugetlbfs: clear resv_map pointer if mmap fails mm: zswap: fix pool refcount bug around shrink_worker() mm/migrate: fix do_pages_move for compat pointers riscv: fix set_huge_pte_at() for NAPOT mappings when a swap entry is set riscv: handle VM_FAULT_[HWPOISON|HWPOISON_LARGE] faults instead of panicking mmap: fix error paths with dup_anon_vma() mmap: fix vma_iterator in error path of vma_merge() mm: fix vm_brk_flags() to not bail out while holding lock mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer mm/page_alloc: correct start page when guard page debug is enabled |
||
Linus Torvalds
|
84186fcb83 |
Urgent pull request for nolibc into v6.6
This pull request contains the following fixes: o tools/nolibc: i386: Fix a stack misalign bug on _start o MAINTAINERS: nolibc: update tree location o tools/nolibc: mark start_c as weak to avoid linker errors -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmUtvC8THHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jMyED/9nsHjSYKUvzdn8kb8Xjr+OkUlx6DCl ITRqAScxl/Q+TTAKTSL508b/fVB56+h0ZmqOHeV0+askVI9c3G2wmLYCJ06P1bpI siy6pqBtcaDVvU38ielbAVYtAeSahj9Jro44gCwBD9OE2TPi4ehl7PMIsX1vG39a hmlbOSw3GG6jFHc5HkTlrOiOy1UB7oIPFI7qfH0XsKJ35vvmDSWPpiHIGwZyx3iv hInVPV4kEBREAXONjru7Ginn9dnxZXFqOwQeqW3ZfYudDUHeKzLMrtYsE6pqZxRP UEyFiI6bhr2fDPoXHGYSm63OrYAm3uZ0jsgC72ZjrLH2ISY0oCuGGf6HP5SjkRfP jPcqMD5h3K+aEHLN3XZV0v3FwelE3qjHVcDQhpu+nJCNDlWK3DNMOjwadynqDOHJ 5FIbusDk5h4rCgOh607zuRPBv0EmtLw3oXGzBzLl8bqRaj58iZP1Te+tnGZEYVMj YydEqXPlKWNaSeBL82gyCpWneYT+vdMjJdbl9b/EKXQxogYLFx4yC4z3h+7K7G56 InDXbxBu0BIMYMgJTWn7nsBgdjenho4PUrper3v6VMr6TKuXuEzmhpqgovaHLM1g ITmdl/+ExPBUBI8u2s1qqLIdEFe5SXkOhFpYnC1E7RsS+GRCho9XCmtfcaPWfiU5 KcFVXKBlIJ4cNQ== =PDcP -----END PGP SIGNATURE----- Merge tag 'urgent/nolibc.2023.10.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc fixes from Paul McKenney: - tools/nolibc: i386: Fix a stack misalign bug on _start - MAINTAINERS: nolibc: update tree location - tools/nolibc: mark start_c as weak to avoid linker errors * tag 'urgent/nolibc.2023.10.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: tools/nolibc: mark start_c as weak MAINTAINERS: nolibc: update tree location tools/nolibc: i386: Fix a stack misalign bug on _start |
||
Breno Leitao
|
ba6e0e5cb5 |
selftests/net: Extract uring helpers to be reusable
Instead of defining basic io_uring functions in the test case, move them to a common directory, so, other tests can use them. This simplify the test code and reuse the common liburing infrastructure. This is basically a copy of what we have in io_uring_zerocopy_tx with some minor improvements to make checkpatch happy. A follow-up test will use the same helpers in a BPF sockopt test. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231016134750.1381153-8-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Breno Leitao
|
7746a6adfc |
tools headers: Grab copy of io_uring.h
This file will be used by mini_uring.h and allow tests to run without the need of installing liburing to run the tests. This is needed to run io_uring tests in BPF, such as (tools/testing/selftests/bpf/prog_tests/sockopt.c). Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231016134750.1381153-7-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
David Laight
|
598f0ac150 |
compiler.h: move __is_constexpr() to compiler.h
Prior to f747e6667ebb2 __is_constexpr() was in its only user minmax.h. That commit moved it to const.h - but that file just defines ULL(x) and UL(x) so that constants can be defined for .S and .c files. So apart from the word 'const' it wasn't really a good location. Instead move the definition to compiler.h just before the similar is_signed_type() and is_unsigned_type(). This may not be a good long-term home, but the three definitions belong together. Link: https://lkml.kernel.org/r/2a6680bbe2e84459816a113730426782@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Muhammad Usama Anjum
|
b58aa0f4fe |
tools headers UAPI: update linux/fs.h with the kernel sources
New IOCTL and macros has been added in the kernel sources. Update the tools header file as well. Link: https://lkml.kernel.org/r/20230821141518.870589-5-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrei Vagin <avagin@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Miroslaw <emmir@google.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Paul Gofman <pgofman@codeweavers.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Andrew Morton
|
5ef8f1b2b4 | Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. | ||
Liam R. Howlett
|
099d7439ce |
maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()
Users complained about OOM errors during fork without triggering compaction. This can be fixed by modifying the flags used in mas_expected_entries() so that the compaction will be triggered in low memory situations. Since mas_expected_entries() is only used during fork, the extra argument does not need to be passed through. Additionally, the two test_maple_tree test cases and one benchmark test were altered to use the correct locking type so that allocations would not trigger sleeping and thus fail. Testing was completed with lockdep atomic sleep detection. The additional locking change requires rwsem support additions to the tools/ directory through the use of pthreads pthread_rwlock_t. With this change test_maple_tree works in userspace, as a module, and in-kernel. Users may notice that the system gave up early on attempting to start new processes instead of attempting to reclaim memory. Link: https://lkml.kernel.org/r/20230915093243epcms1p46fa00bbac1ab7b7dca94acb66c44c456@epcms1p4 Link: https://lkml.kernel.org/r/20231012155233.2272446-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com> Cc: <jason.sim@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Adrian Hunter
|
a91c987254 |
perf tools: Add get_unaligned_leNN()
Add get_unaligned_le16(), get_unaligned_le32 and get_unaligned_le64, same as include/asm-generic/unaligned.h. And add include/asm-generic/unaligned.h to check-headers.sh bringing tools/include/asm-generic/unaligned.h up to date so that the kernel and tools versions match. Use diagnostic pragmas to ignore -Wpacked used by perf build. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-2-adrian.hunter@intel.com Link: https://lore.kernel.org/r/20231010142234.20061-1-adrian.hunter@intel.com [ squashed check-header.sh addition ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Jakub Kicinski
|
a3c2dd9648 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZS1d4wAKCRDbK58LschI g4DSAP441CdKh8fd+wNKUSKHFbpCQ6EvocR6Nf+Sj2DFUx/w/QEA7mfju7Abqjc3 xwDEx0BuhrjMrjV5MmEpxc7lYl9XcQU= =vuWk -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-16 We've added 90 non-merge commits during the last 25 day(s) which contain a total of 120 files changed, 3519 insertions(+), 895 deletions(-). The main changes are: 1) Add missed stats for kprobes to retrieve the number of missed kprobe executions and subsequent executions of BPF programs, from Jiri Olsa. 2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is for systemd to reimplement the LogNamespace feature which allows running multiple instances of systemd-journald to process the logs of different services, from Daan De Meyer. 3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich. 4) Improve BPF verifier log output for scalar registers to better disambiguate their internal state wrt defaults vs min/max values matching, from Andrii Nakryiko. 5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving the source IP address with a new BPF_FIB_LOOKUP_SRC flag, from Martynas Pumputis. 6) Add support for open-coded task_vma iterator to help with symbolization for BPF-collected user stacks, from Dave Marchevsky. 7) Add libbpf getters for accessing individual BPF ring buffers which is useful for polling them individually, for example, from Martin Kelly. 8) Extend AF_XDP selftests to validate the SHARED_UMEM feature, from Tushar Vyavahare. 9) Improve BPF selftests cross-building support for riscv arch, from Björn Töpel. 10) Add the ability to pin a BPF timer to the same calling CPU, from David Vernet. 11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments, from Alexandre Ghiti. 12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen. 13) Fix bpftool's skeleton code generation to guarantee that ELF data is 8 byte aligned, from Ian Rogers. 14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4 security mitigations in BPF verifier, from Yafang Shao. 15) Annotate struct bpf_stack_map with __counted_by attribute to prepare BPF side for upcoming __counted_by compiler support, from Kees Cook. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits) bpf: Ensure proper register state printing for cond jumps bpf: Disambiguate SCALAR register state output in verifier logs selftests/bpf: Make align selftests more robust selftests/bpf: Improve missed_kprobe_recursion test robustness selftests/bpf: Improve percpu_alloc test robustness selftests/bpf: Add tests for open-coded task_vma iter bpf: Introduce task_vma open-coded iterator kfuncs selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c bpf: Don't explicitly emit BTF for struct btf_iter_num bpf: Change syscall_nr type to int in struct syscall_tp_t net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set bpf: Avoid unnecessary audit log for CPU security mitigations selftests/bpf: Add tests for cgroup unix socket address hooks selftests/bpf: Make sure mount directory exists documentation/bpf: Document cgroup unix socket address hooks bpftool: Add support for cgroup unix socket address hooks libbpf: Add support for cgroup unix socket address hooks bpf: Implement cgroup sockaddr hooks for unix sockets bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf bpf: Propagate modified uaddrlen from cgroup sockaddr programs ... ==================== Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Thomas Weißschuh
|
63aa531716 |
tools/nolibc: add support for constructors and destructors
With the startup code moved to C, implementing support for constructors and deconstructors is fairly easy to implement. Examples for code size impact: text data bss dec hex filename 21837 104 88 22029 560d nolibc-test.before 22135 120 88 22343 5747 nolibc-test.after 21970 104 88 22162 5692 nolibc-test.after-only-crt.h-changes The sections are defined by [0]. [0] https://refspecs.linuxfoundation.org/elf/gabi4+/ch5.dynamic.html Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20231007-nolibc-constructors-v2-1-ef84693efbc1@weissschuh.net/ |
||
Thomas Weißschuh
|
eaa8c9a8b4 |
tools/nolibc: automatically detect necessity to use pselect6
We can automatically detect if pselect6 is needed or not from the kernel headers. This removes the need to manually specify it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20230917-nolibc-syscall-nr-v2-4-03863d509b9a@weissschuh.net |
||
Thomas Weißschuh
|
e7b28f2516 |
tools/nolibc: don't define new syscall number
All symbols created by nolibc are also visible to user code. Syscall constants are expected to come from the kernel headers and should not be made up by nolibc. Refactor the logic to avoid defining syscall numbers. Also the new code is easier to understand. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20230917-nolibc-syscall-nr-v2-3-03863d509b9a@weissschuh.net |
||
Thomas Weißschuh
|
535b70c143 |
tools/nolibc: avoid unused parameter warnings for ENOSYS fallbacks
The ENOSYS fallback code does not use its functions parameters. This can lead to compiler warnings about unused parameters. Explicitly avoid these warnings. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20230917-nolibc-syscall-nr-v2-2-03863d509b9a@weissschuh.net |
||
Ammar Faizi
|
bc61614de0 |
tools/nolibc: string: Remove the _nolibc_memcpy_up() function
This function is only called by memcpy(), there is no real reason to have this wrapper. Delete this function and move the code to memcpy() directly. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
Ammar Faizi
|
5dfc79b20e |
tools/nolibc: string: Remove the _nolibc_memcpy_down() function
This nolibc internal function is not used. Delete it. It was probably supposed to handle memmove(), but today the memmove() has its own implementation. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
Ammar Faizi
|
12108aa8c1 |
tools/nolibc: x86-64: Use rep stosb for memset()
Simplify memset() on the x86-64 arch. The x86-64 arch has a 'rep stosb' instruction, which can perform memset() using only a single instruction, given: %al = value (just like the second argument of memset()) %rdi = destination %rcx = length Before this patch: ``` 00000000000010c9 <memset>: 10c9: 48 89 f8 mov %rdi,%rax 10cc: 48 85 d2 test %rdx,%rdx 10cf: 74 0e je 10df <memset+0x16> 10d1: 31 c9 xor %ecx,%ecx 10d3: 40 88 34 08 mov %sil,(%rax,%rcx,1) 10d7: 48 ff c1 inc %rcx 10da: 48 39 ca cmp %rcx,%rdx 10dd: 75 f4 jne 10d3 <memset+0xa> 10df: c3 ret ``` After this patch: ``` 0000000000001511 <memset>: 1511: 96 xchg %eax,%esi 1512: 48 89 d1 mov %rdx,%rcx 1515: 57 push %rdi 1516: f3 aa rep stos %al,%es:(%rdi) 1518: 58 pop %rax 1519: c3 ret ``` v2: - Use pushq %rdi / popq %rax (Alviro). - Use xchg %eax, %esi (Willy). Link: https://lore.kernel.org/lkml/ZO9e6h2jjVIMpBJP@1wt.eu Suggested-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Suggested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
Ammar Faizi
|
553845eebd |
tools/nolibc: x86-64: Use rep movsb for memcpy() and memmove()
Simplify memcpy() and memmove() on the x86-64 arch. The x86-64 arch has a 'rep movsb' instruction, which can perform memcpy() using only a single instruction, given: %rdi = destination %rsi = source %rcx = length Additionally, it can also handle the overlapping case by setting DF=1 (backward copy), which can be used as the memmove() implementation. Before this patch: ``` 00000000000010ab <memmove>: 10ab: 48 89 f8 mov %rdi,%rax 10ae: 31 c9 xor %ecx,%ecx 10b0: 48 39 f7 cmp %rsi,%rdi 10b3: 48 83 d1 ff adc $0xffffffffffffffff,%rcx 10b7: 48 85 d2 test %rdx,%rdx 10ba: 74 25 je 10e1 <memmove+0x36> 10bc: 48 83 c9 01 or $0x1,%rcx 10c0: 48 39 f0 cmp %rsi,%rax 10c3: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi 10ca: 48 0f 43 fa cmovae %rdx,%rdi 10ce: 48 01 cf add %rcx,%rdi 10d1: 44 8a 04 3e mov (%rsi,%rdi,1),%r8b 10d5: 44 88 04 38 mov %r8b,(%rax,%rdi,1) 10d9: 48 01 cf add %rcx,%rdi 10dc: 48 ff ca dec %rdx 10df: 75 f0 jne 10d1 <memmove+0x26> 10e1: c3 ret 00000000000010e2 <memcpy>: 10e2: 48 89 f8 mov %rdi,%rax 10e5: 48 85 d2 test %rdx,%rdx 10e8: 74 12 je 10fc <memcpy+0x1a> 10ea: 31 c9 xor %ecx,%ecx 10ec: 40 8a 3c 0e mov (%rsi,%rcx,1),%dil 10f0: 40 88 3c 08 mov %dil,(%rax,%rcx,1) 10f4: 48 ff c1 inc %rcx 10f7: 48 39 ca cmp %rcx,%rdx 10fa: 75 f0 jne 10ec <memcpy+0xa> 10fc: c3 ret ``` After this patch: ``` // memmove is an alias for memcpy 000000000040133b <memcpy>: 40133b: 48 89 d1 mov %rdx,%rcx 40133e: 48 89 f8 mov %rdi,%rax 401341: 48 89 fa mov %rdi,%rdx 401344: 48 29 f2 sub %rsi,%rdx 401347: 48 39 ca cmp %rcx,%rdx 40134a: 72 03 jb 40134f <memcpy+0x14> 40134c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40134e: c3 ret 40134f: 48 8d 7c 0f ff lea -0x1(%rdi,%rcx,1),%rdi 401354: 48 8d 74 0e ff lea -0x1(%rsi,%rcx,1),%rsi 401359: fd std 40135a: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40135c: fc cld 40135d: c3 ret ``` v3: - Make memmove as an alias for memcpy (Willy). - Make the forward copy the likely case (Alviro). v2: - Fix the broken memmove implementation (David). Link: https://lore.kernel.org/lkml/20230902062237.GA23141@1wt.eu Link: https://lore.kernel.org/lkml/5a821292d96a4dbc84c96ccdc6b5b666@AcuMS.aculab.com Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
Thomas Weißschuh
|
b56a9492d0 |
tools/nolibc: add stdarg.h header
This allows nolic to work with `-nostdinc` avoiding any reliance on system headers. The implementation has been lifted from musl libc 1.2.4. There is already an implementation of stdarg.h in include/linux/stdarg.h but that is GPL licensed and therefore not suitable for nolibc. The used compiler builtins have been validated to be at least available since GCC 4.1.2 and clang 3.0.0. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> |
||
Thomas Weißschuh
|
921992229b |
tools/nolibc: mark start_c as weak
Otherwise the different instances of _start_c from each compilation unit will lead to linker errors: /usr/bin/ld: /tmp/ccSNvRqs.o: in function `_start_c': nolibc-test-foo.c:(.text.nolibc_memset+0x9): multiple definition of `_start_c'; /tmp/ccG25101.o:nolibc-test.c:(.text+0x1ea3): first defined here Fixes: 17336755150b ("tools/nolibc: add new crt.h with _start_c") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/20231012-nolibc-start_c-multiple-v1-1-fbfc73e0283f@weissschuh.net/ Link: https://lore.kernel.org/lkml/20231012-nolibc-linkage-test-v1-1-315e682768b4@weissschuh.net/ Acked-by: Willy Tarreau <w@1wt.eu> |
||
Ammar Faizi
|
d873a364ef |
tools/nolibc: i386: Fix a stack misalign bug on _start
The ABI mandates that the %esp register must be a multiple of 16 when executing a 'call' instruction. Commit 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c") simplified the _start function, but it didn't take care of the %esp alignment, causing SIGSEGV on SSE and AVX programs that use aligned move instruction (e.g., movdqa, movaps, and vmovdqa). The 'and $-16, %esp' aligns the %esp at a multiple of 16. Then 'push %eax' will subtract the %esp by 4; thus, it breaks the 16-byte alignment. Make sure the %esp is correctly aligned after the push by subtracting 12 before the push. Extra: Add 'add $12, %esp' before the 'and $-16, %esp' to avoid over-estimating for particular cases as suggested by Willy. A test program to validate the %esp alignment on _start can be found at: https://lore.kernel.org/lkml/ZOoindMFj1UKqo+s@biznet-home.integral.gnuweeb.org [ Thomas: trim Fixes tag commit id ] Cc: Zhangjin Wu <falcon@tinylab.org> Fixes: 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c") Reported-by: Nicholas Rosenberg <inori@vnlx.org> Acked-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
Daan De Meyer
|
859051dd16 |
bpf: Implement cgroup sockaddr hooks for unix sockets
These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Martynas Pumputis
|
dab4e1f06c |
bpf: Derive source IP addr via bpf_*_fib_lookup()
Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; /* the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
David Vernet
|
d6247ecb6c |
bpf: Add ability to pin bpf timer to calling CPU
BPF supports creating high resolution timers using bpf_timer_* helper functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which specifies that the timeout should be interpreted as absolute time. It would also be useful to be able to pin that timer to a core. For example, if you wanted to make a subset of cores run without timer interrupts, and only have the timer be invoked on a single core. This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag. When specified, the HRTIMER_MODE_PINNED flag is passed to hrtimer_start(). A subsequent patch will update selftests to validate. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com |
||
Florent Revest
|
24e41bf8a6 |
mm: add a NO_INHERIT flag to the PR_SET_MDWE prctl
This extends the current PR_SET_MDWE prctl arg with a bit to indicate that the process doesn't want MDWE protection to propagate to children. To implement this no-inherit mode, the tag in current->mm->flags must be absent from MMF_INIT_MASK. This means that the encoding for "MDWE but without inherit" is different in the prctl than in the mm flags. This leads to a bit of bit-mangling in the prctl implementation. Link: https://lkml.kernel.org/r/20230828150858.393570-6-revest@chromium.org Signed-off-by: Florent Revest <revest@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexey Izbyshev <izbyshev@ispras.ru> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ayush Jain <ayush.jain3@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: Topi Miettinen <toiwoton@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Florent Revest
|
0da668333f |
mm: make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long
Defining a prctl flag as an int is a footgun because on a 64 bit machine and with a variadic implementation of prctl (like in musl and glibc), when used directly as a prctl argument, it can get casted to long with garbage upper bits which would result in unexpected behaviors. This patch changes the constant to an unsigned long to eliminate that possibilities. This does not break UAPI. I think that a stable backport would be "nice to have": to reduce the chances that users build binaries that could end up with garbage bits in their MDWE prctl arguments. We are not aware of anyone having yet encountered this corner case with MDWE prctls but a backport would reduce the likelihood it happens, since this sort of issues has happened with other prctls. But If this is perceived as a backporting burden, I suppose we could also live without a stable backport. Link: https://lkml.kernel.org/r/20230828150858.393570-5-revest@chromium.org Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl") Signed-off-by: Florent Revest <revest@chromium.org> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ayush Jain <ayush.jain3@amd.com> Cc: Greg Thelen <gthelen@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: Topi Miettinen <toiwoton@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Jakub Kicinski
|
2606cf059c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Sohil Mehta
|
ccab211af3 |
syscalls: Cleanup references to sys_lookup_dcookie()
commit 'be65de6b03aa ("fs: Remove dcookies support")' removed the syscall definition for lookup_dcookie. However, syscall tables still point to the old sys_lookup_dcookie() definition. Update syscall tables of all architectures to directly point to sys_ni_syscall() instead. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> # for perf Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> |
||
Linus Torvalds
|
5c519bc075 |
perf tools fixes for v6.6: 1st batch
Build: - Update header files in the tools/**/include directory to sync with the kernel sources as usual. - Remove unused bpf-prologue files. While it's not strictly a fix, but the functionality was removed in this cycle so better to get rid of the code together. - Other minor build fixes. Misc: - Fix uninitialized memory access in PMU parsing code - Fix segfaults on software event Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZRIFKAAKCRCMstVUGiXM g/pXAP9HLB2s+beBTK5iQU4/NfqmAVSl303QCoR9xLByo38vfAEAlLiRIh061pTi PRlXVuY9bUQPyCSYsiBHv/fmLqdQdwU= =ti6G -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "Build: - Update header files in the tools/**/include directory to sync with the kernel sources as usual. - Remove unused bpf-prologue files. While it's not strictly a fix, but the functionality was removed in this cycle so better to get rid of the code together. - Other minor build fixes. Misc: - Fix uninitialized memory access in PMU parsing code - Fix segfaults on software event" * tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf jevent: fix core dump on software events on s390 perf pmu: Ensure all alias variables are initialized perf jevents metric: Fix type of strcmp_cpuid_str perf trace: Avoid compile error wrt redefining bool perf bpf-prologue: Remove unused file tools headers UAPI: Update tools's copy of drm.h headers tools arch x86: Sync the msr-index.h copy with the kernel sources perf bench sched-seccomp-notify: Use the tools copy of seccomp.h UAPI tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older systems tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources perf tools: Update copy of libbpf's hashmap.c |
||
Jiri Olsa
|
3acf8ace68 |
bpf: Add missed value to kprobe perf link info
Add missed value to kprobe attached through perf link info to hold the stats of missed kprobe handler execution. The kprobe's missed counter gets incremented when kprobe handler is not executed due to another kprobe running on the same cpu. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230920213145.1941596-4-jolsa@kernel.org |
||
Jiri Olsa
|
e2b2cd592a |
bpf: Add missed value to kprobe_multi link info
Add missed value to kprobe_multi link info to hold the stats of missed kprobe_multi probe. The missed counter gets incremented when fprobe fails the recursion check or there's no rethook available for return probe. In either case the attached bpf program is not executed. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Song Liu <song@kernel.org> Reviewed-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230920213145.1941596-3-jolsa@kernel.org |
||
Paolo Abeni
|
e9cbc89067 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Linus Torvalds
|
27bbf45eae |
Networking fixes for 6.6-rc2, including fixes from netfilter and bpf
Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmUMFG8SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOksHAP+QE2eNf5yxo86dIS+3RQnOQ8kFBnNbEn 04lrheGnzG7PpNnGoCoTZna+xYQPYVLgbmmip2/CFnQvnQIsKyLQfCui85sfV2V9 KjUeE/kTgeC+jUQOWNDyz3zDP/MPC2LmiK8Gwyggvm9vFYn5tVZXC36aPZBZ7Vok /DUW6iXyl31SeVGOOEKakcwn0GIYJSABhVFNsjrDe4tV+leUwvf8obAq3ZWxOGaU D94ez28lSXgfOSWfQQ/l1rHI/yC0fr8HYyWJ60dNG2uS3fNEqT8LyqZfAUK24kVz XbAGZa+GA7CDq3cVsU7vCWNWbB5fO+kXtmGOwPtuKtJQM5LPo4X77CuSHlpzdyvq TuW0vxeVfdzAYVb3Zg+2QgWxDJjY0B8ujwdDWrnnKTPu4Ylhn6HLISXIlkMBoGwT 1/47TCnmn9t+lGagkMADppRRnJotHWObQG5wkzksqVa2CUB0HTESgbrm4rsxe6Ku JiZhHbTiiPWy7LgY6EFtj/YGPvLs0CSltvh4QUsd+QtDTM/EN7y3HcHqkv88ropG bSvJIh6WXdEJkwfSUdA0LECXSC6dizzZW2Y1glnT+7FMlhE1jVY4gruNJ37mCYMb 0gh9Zr76c2KYLA5vljGp6uo3j3A7wARJTdLfRFVcaFoz6NQmuFf9ZdBfDNDcymxs AGvO3j55JAZf =AoVg -----END PGP SIGNATURE----- Merge tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue" * tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast() igc: Expose tx-usecs coalesce setting to user octeontx2-pf: Do xdp_do_flush() after redirects. bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI net: ena: Flush XDP packets on error. net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev' netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once vxlan: Add missing entries to vxlan_get_size() net: rds: Fix possible NULL-pointer dereference team: fix null-ptr-deref when team device type is changed net: bridge: use DEV_STATS_INC() net: hns3: add 5ms delay before clear firmware reset irq source net: hns3: fix fail to delete tc flower rules during reset issue net: hns3: only enable unicast promisc when mac table full net: hns3: fix GRE checksum offload issue net: hns3: add cmdq check for vf periodic service task net: stmmac: fix incorrect rxq|txq_stats reference ... |
||
Nick Desaulniers
|
c0bb9fb0e5 |
bpf: Fix BTF_ID symbol generation collision in tools/
Marcus and Satya reported an issue where BTF_ID macro generates same symbol in separate objects and that breaks final vmlinux link. ld.lld: error: ld-temp.o <inline asm>:14577:1: symbol '__BTF_ID__struct__cgroup__624' is already defined This can be triggered under specific configs when __COUNTER__ happens to be the same for the same symbol in two different translation units, which is already quite unlikely to happen. Add __LINE__ number suffix to make BTF_ID symbol more unique, which is not a complete fix, but it would help for now and meanwhile we can work on better solution as suggested by Andrii. Cc: stable@vger.kernel.org Reported-by: Satya Durga Srinivasu Prabhala <quic_satyap@quicinc.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://github.com/ClangBuiltLinux/linux/issues/1913 Debugged-by: Nathan Chancellor <nathan@kernel.org> Co-developed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4Bzb5KQ2_LmhN769ifMeSJaWfebccUasQOfQKaOd0nQ51tw@mail.gmail.com/ Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230915-bpf_collision-v3-2-263fc519c21f@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Stanislav Fomichev
|
a9c2a60854 |
bpf: expose information about supported xdp metadata kfunc
Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Mike Rapoport (IBM)
|
55122e0130 |
memblock tests: fix warning ‘struct seq_file’ declared inside parameter list
Building memblock tests produces the following warning: cc -I. -I../../include -Wall -O2 -fsanitize=address -fsanitize=undefined -D CONFIG_PHYS_ADDR_T_64BIT -c -o main.o main.c In file included from tests/common.h:9, from tests/basic_api.h:5, from main.c:2: ./linux/memblock.h:601:50: warning: ‘struct seq_file’ declared inside parameter list will not be visible outside of this definition or declaration 601 | static inline void memtest_report_meminfo(struct seq_file *m) { } | ^~~~~~~~ Add declaration of 'struct seq_file' to tools/include/linux/seq_file.h to fix it. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> |
||
Mike Rapoport (IBM)
|
5e1bffbdb6 |
memblock tests: fix warning: "__ALIGN_KERNEL" redefined
Building memblock tests produces the following warning: cc -I. -I../../include -Wall -O2 -fsanitize=address -fsanitize=undefined -D CONFIG_PHYS_ADDR_T_64BIT -c -o main.o main.c In file included from ../../include/linux/pfn.h:5, from ./linux/memory_hotplug.h:6, from ./linux/init.h:7, from ./linux/memblock.h:11, from tests/common.h:8, from tests/basic_api.h:5, from main.c:2: ../../include/linux/mm.h:14: warning: "__ALIGN_KERNEL" redefined 14 | #define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1) | In file included from ../../include/linux/mm.h:6, from ../../include/linux/pfn.h:5, from ./linux/memory_hotplug.h:6, from ./linux/init.h:7, from ./linux/memblock.h:11, from tests/common.h:8, from tests/basic_api.h:5, from main.c:2: ../../include/uapi/linux/const.h:31: note: this is the location of the previous definition 31 | #define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1) | Remove definitions of __ALIGN_KERNEL and __ALIGN_KERNEL_MASK from tools/include/linux/mm.h to fix it. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> |
||
Rong Tao
|
4b2d631236 |
memblock tests: Fix compilation errors.
This patch fix the follow errors. commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") pass nid parameter to reserve_bootmem_region(), $ make -C tools/testing/memblock/ ... memblock.c: In function ‘memmap_init_reserved_pages’: memblock.c:2111:25: error: too many arguments to function ‘reserve_bootmem_region’ 2111 | reserve_bootmem_region(start, end, nid); | ^~~~~~~~~~~~~~~~~~~~~~ ../../include/linux/mm.h:32:6: note: declared here 32 | void reserve_bootmem_region(phys_addr_t start, phys_addr_t end); | ^~~~~~~~~~~~~~~~~~~~~~ memblock.c:2122:17: error: too many arguments to function ‘reserve_bootmem_region’ 2122 | reserve_bootmem_region(start, end, nid); | ^~~~~~~~~~~~~~~~~~~~~~ commit dcdfdd40fa82 ("mm: Add support for unaccepted memory") call accept_memory() in memblock.c $ make -C tools/testing/memblock/ ... cc -fsanitize=address -fsanitize=undefined main.o memblock.o \ lib/slab.o mmzone.o slab.o tests/alloc_nid_api.o \ tests/alloc_helpers_api.o tests/alloc_api.o tests/basic_api.o \ tests/common.o tests/alloc_exact_nid_api.o -o main /usr/bin/ld: memblock.o: in function `memblock_alloc_range_nid': memblock.c:(.text+0x7ae4): undefined reference to `accept_memory' Signed-off-by: Rong Tao <rongtao@cestc.cn> Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") Link: https://lore.kernel.org/r/tencent_6F19BC082167F15DF2A8D8BEFE8EF220F60A@qq.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> |
||
Arnaldo Carvalho de Melo
|
c2122b687c |
tools headers UAPI: Update tools's copy of drm.h headers
Picking the changes from: ad9ee11fdf113f96 ("drm/doc: document that PRIME import/export is always supported") 2ff4f6d410afa762 ("drm/doc: document drm_event and its types") 9a2eabf48ade4fba ("drm/doc: use proper cross-references for sections") c7a4722971691562 ("drm/syncobj: add IOCTL to register an eventfd") Addressing these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated tables will be able to translate this new ioctl code into a string: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after --- before 2023-09-13 08:54:45.170134002 -0300 +++ after 2023-09-13 08:55:06.612712776 -0300 @@ -108,6 +108,7 @@ [0xCC] = "SYNCOBJ_TRANSFER", [0xCD] = "SYNCOBJ_TIMELINE_SIGNAL", [0xCE] = "MODE_GETFB2", + [0xCF] = "SYNCOBJ_EVENTFD", [DRM_COMMAND_BASE + 0x00] = "I915_INIT", [DRM_COMMAND_BASE + 0x01] = "I915_FLUSH", [DRM_COMMAND_BASE + 0x02] = "I915_FLIP", $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Simon Ser <contact@emersion.fr> Link: https://lore.kernel.org/lkml/ZQGkh9qlhpKA%2FSMY@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
417ecb614f |
tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older systems
The new 'perf bench' for sched-seccomp-notify uses defines and types not available in older systems where we want to have perf available, so grab a copy of this UAPI from the kernel sources to allow that. This will be checked in the future for drift from the original when we build the perf tool, that will warn when that happens like: make: Entering directory '/var/home/acme/git/perf-tools/tools/perf' BUILD: Doing 'make -j32' parallel build Warning: Kernel ABI header differences: Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrei Vagin <avagin@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Kook <keescook@chromium.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZQGhMXtwX7RvV3ya@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
f7875966dc |
tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources
To pick the changes in these csets: c35559f94ebc3e3b ("x86/shstk: Introduce map_shadow_stack syscall") 78252deb023cf087 ("arch: Register fchmodat2, usually as syscall 452") That add support for this new syscall in tools such as 'perf trace'. For instance, this is now possible: # perf trace -v -e fchmodat*,map_shadow_stack --max-events=4 Using CPUID AuthenticAMD-25-21-0 Reusing "openat" BPF sys_enter augmenter for "fchmodat" event qualifier tracepoint filter: (common_pid != 3499340 && common_pid != 11259) && (id == 268 || id == 452 || id == 453) ^C# And it'll work as with other syscalls, for instance openat: # perf trace -e openat* --max-events=4 0.000 ( 0.015 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11 0.068 ( 0.019 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11 0.119 ( 0.008 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.current", flags: RDONLY|CLOEXEC) = 11 0.138 ( 0.006 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.min", flags: RDONLY|CLOEXEC) = 11 # That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -E fchmodat\|sys_map_shadow_stack tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:258 n64 fchmodat sys_fchmodat tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:452 n64 fchmodat2 sys_fchmodat2 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2 tools/perf/arch/s390/entry/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat sys_fchmodat tools/perf/arch/s390/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2 sys_fchmodat2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:268 common fchmodat sys_fchmodat tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:452 common fchmodat2 sys_fchmodat2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:453 64 map_shadow_stack sys_map_shadow_stack $ $ grep -Ew map_shadow_stack\|fchmodat2 /tmp/build/perf-tools/arch/x86/include/generated/asm/syscalls_64.c [452] = "fchmodat2", [453] = "map_shadow_stack", $ This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/lkml/ZP8bE7aXDBu%2Fdrak@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Stanislav Fomichev
|
7cb779a686 |
bpf: Clarify error expectations from bpf_clone_redirect
Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped packets") exposed the fact that bpf_clone_redirect is capable of returning raw NET_XMIT_XXX return codes. This is in the conflict with its UAPI doc which says the following: "0 on success, or a negative error in case of failure." Update the UAPI to reflect the fact that bpf_clone_redirect can return positive error numbers, but don't explicitly define their meaning. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com |
||
Yonghong Song
|
9bc95a95ab |
bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr' can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated. Also make changes in selftests/bpf/test_bpftool_synctypes.py and selftest libbpf_str to fix otherwise test errors. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
bd6c11bc43 |
Networking changes for 6.6.
Core ---- - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations. - Store netdevs in an xarray, to simplify iterating over netdevs. - Refactor nexthop selection for multipath routes. - Improve sched class lifetime handling. - Add backup nexthop ID support for bridge. - Implement drop reasons support in openvswitch. - Several data races annotations and fixes. - Constify the sk parameter of routing functions. - Prepend kernel version to netconsole message. Protocols --------- - Implement support for TCP probing the peer being under memory pressure. - Remove hard coded limitation on IPv6 specific info placement inside the socket struct. - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor. - Scaling-up the IPv6 expired route GC via a separated list of expiring routes. - In-kernel support for the TLS alert protocol. - Better support for UDP reuseport with connected sockets. - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size. - Get rid of additional ancillary per MPTCP connection struct socket. - Implement support for BPF-based MPTCP packet schedulers. - Format MPTCP subtests selftests results in TAP. - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation. BPF --- - Multi-buffer support in AF_XDP. - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds. - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it. - Add SO_REUSEPORT support for TC bpf_sk_assign. - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64. - Support defragmenting IPv(4|6) packets in BPF. - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling. - Introduce bpf map element count and enable it for all program types. - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy. - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress. - Add uprobe support for the bpf_get_func_ip helper. - Check skb ownership against full socket. - Support for up to 12 arguments in BPF trampoline. - Extend link_info for kprobe_multi and perf_event links. Netfilter --------- - Speed-up process exit by aborting ruleset validation if a fatal signal is pending. - Allow NLA_POLICY_MASK to be used with BE16/BE32 types. Driver API ---------- - Page pool optimizations, to improve data locality and cache usage. - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers. - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info. - Extend and use the yaml devlink specs to [re]generate the split ops. - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes. - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool. - Remove phylink legacy mode support. - Support offload LED blinking to phy. - Add devlink port function attributes for IPsec. New hardware / drivers ---------------------- - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers ------- - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq 9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO /QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz 6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1 /daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3 Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n 4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP DCW8VGFOZjZM =9CsR -----END PGP SIGNATURE----- Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations - Store netdevs in an xarray, to simplify iterating over netdevs - Refactor nexthop selection for multipath routes - Improve sched class lifetime handling - Add backup nexthop ID support for bridge - Implement drop reasons support in openvswitch - Several data races annotations and fixes - Constify the sk parameter of routing functions - Prepend kernel version to netconsole message Protocols: - Implement support for TCP probing the peer being under memory pressure - Remove hard coded limitation on IPv6 specific info placement inside the socket struct - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor - Scaling-up the IPv6 expired route GC via a separated list of expiring routes - In-kernel support for the TLS alert protocol - Better support for UDP reuseport with connected sockets - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size - Get rid of additional ancillary per MPTCP connection struct socket - Implement support for BPF-based MPTCP packet schedulers - Format MPTCP subtests selftests results in TAP - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation BPF: - Multi-buffer support in AF_XDP - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it - Add SO_REUSEPORT support for TC bpf_sk_assign - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64 - Support defragmenting IPv(4|6) packets in BPF - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling - Introduce bpf map element count and enable it for all program types - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress - Add uprobe support for the bpf_get_func_ip helper - Check skb ownership against full socket - Support for up to 12 arguments in BPF trampoline - Extend link_info for kprobe_multi and perf_event links Netfilter: - Speed-up process exit by aborting ruleset validation if a fatal signal is pending - Allow NLA_POLICY_MASK to be used with BE16/BE32 types Driver API: - Page pool optimizations, to improve data locality and cache usage - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info - Extend and use the yaml devlink specs to [re]generate the split ops - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool - Remove phylink legacy mode support - Support offload LED blinking to phy - Add devlink port function attributes for IPsec New hardware / drivers: - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers: - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering" * tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ... |
||
Linus Torvalds
|
1c59d38339 |
linux-kselftest-nolibc-6.6-rc1
This nolibc update for Linux 6.6-rc1 consists of: Nolibc: - improved portability by removing build errors with -ENOSYS - added syscall6() on MIPS to support pselect6() and mmap() - added setvbuf(), rmdir(), pipe(), pipe2() - add support for ppc/ppc64 - environ is no longer optional - fixed frame pointer issues at -O0 - dropped sys_stat() in favor of sys_statx() - centralized _start_c() to remove lots of asm code - switched size_t to __SIZE_TYPE__ Selftests: - improved status reporting (success/warning/failure counts, path to log file) - various code cleanups (indent, unused variables, ...) - more consistent test numbering - enabled compiler warnings - dropped unreliable chmod_net test - improved reliability (create /dev/zero & /tmp, rely less on /proc) - new tests (brk/sbrk/mmap/munmap) - improved compatibility with musl - new run-nolibc-test target to build and run natively - new run-libc-test target to build and run against native libc - made the cmdline parser more reliable against boolean arguments - dropped dependency on memfd for vfprintf() test - nolibc-test is no longer stripped - added support for extending ARCH via XARCH Other: - add Thomas as co-maintainer -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmTs4xUACgkQCwJExA0N QxyVjg//XGpM5P8JXfn2lsLXa+vPOkIKCY26RRTLUb8TJygTLpQY/aOh98pKCGbr naudKubMLlhYhS7HSJXvH6rNkidsU7YiW8RTbHICHsB1Lqbqic2C8y34EOBJpalN mYfidcwVdmUeJLBEiL5W4T5ip2FIHy5lhN4NeN7x0bHBkcXvkzL8/7QngjPB7Nmy ibyPmtv/1m83+jmAOzDgNHJPIc0utjGynkKWBAd7SWClCWvS+22V5YgqCHaXN+1R 5RMDlJ+AqknniEmUouF7qfsbAI02uhKy8Gpmju6ghLjqN/4V3NNSuftXXT+x1L+x L277L5s/RNfaXgzaXoUfIdnFeHhDhL0dJxBek/5LZdCy4rQWbn3VN1VI1NtGDnM1 BTHdDNirqxkKsXleSi3zXQptxvGLeP/E0loNd74ZlmOqNhZdt3bQdv53evDBSjGa UvCfGC3r+AyKsxzXUbz+fPuQ+9yw5g1VJBKMol7lbiBmmYb3y7nXj2nPTkRU84Ge twRo2ctF62a2d9hw/U/D7hFGYnzOosl3NmbrzAKWAGqXQ5jpWPbfeU/7bjVeW7d1 cuTGJtPHSVtCTaMnktHT+F9+Uh9SOFtUj/WMESvu6Y7iQRERqoXcHJuBhrms6r7S DGKxHuVwopcuGXdNoKI/nyT4lApee33S5WcKMV2t85eoXAfe4EY= =OCH0 -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-nolibc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull nolibc updates from Shuah Khan: "Nolibc: - improved portability by removing build errors with -ENOSYS - added syscall6() on MIPS to support pselect6() and mmap() - added setvbuf(), rmdir(), pipe(), pipe2() - add support for ppc/ppc64 - environ is no longer optional - fixed frame pointer issues at -O0 - dropped sys_stat() in favor of sys_statx() - centralized _start_c() to remove lots of asm code - switched size_t to __SIZE_TYPE__ Selftests: - improved status reporting (success/warning/failure counts, path to log file) - various code cleanups (indent, unused variables, ...) - more consistent test numbering - enabled compiler warnings - dropped unreliable chmod_net test - improved reliability (create /dev/zero & /tmp, rely less on /proc) - new tests (brk/sbrk/mmap/munmap) - improved compatibility with musl - new run-nolibc-test target to build and run natively - new run-libc-test target to build and run against native libc - made the cmdline parser more reliable against boolean arguments - dropped dependency on memfd for vfprintf() test - nolibc-test is no longer stripped - added support for extending ARCH via XARCH Other: - add Thomas as co-maintainer" * tag 'linux-kselftest-nolibc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (103 commits) tools/nolibc: avoid undesired casts in the __sysret() macro tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret() tools/nolibc: silence ppc64 compile warnings selftests/nolibc: libc-test: use HOSTCC instead of CC tools/nolibc: stackprotector.h: make __stack_chk_init static selftests/nolibc: allow report with existing test log selftests/nolibc: add test support for ppc64 selftests/nolibc: add test support for ppc64le selftests/nolibc: add test support for ppc selftests/nolibc: add XARCH and ARCH mapping support tools/nolibc: add support for powerpc64 tools/nolibc: add support for powerpc MAINTAINERS: nolibc: add myself as co-maintainer selftests/nolibc: enable compiler warnings selftests/nolibc: don't strip nolibc-test selftests/nolibc: prevent out of bounds access in expect_vfprintf selftests/nolibc: use correct return type for read() and write() selftests/nolibc: avoid sign-compare warnings selftests/nolibc: avoid unused parameter warnings selftests/nolibc: make functions static if possible ... |