502 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
WANG Xuerui
|
98b90ede59 |
LoongArch: Humanize the ESTAT line when showing registers
Example output looks like: [ xx.xxxxxx] ESTAT: 00001000 [INT] (IS=12 ECode=0 EsubCode=0) Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
5e3e784d35 |
LoongArch: Humanize the ECFG line when showing registers
Example output looks like: [ xx.xxxxxx] ECFG: 00071c1c (LIE=2-4,10-12 VS=7) Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
9718d96c03 |
LoongArch: Humanize the EUEN line when showing registers
Example output looks like: [ xx.xxxxxx] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
ce7f0b18b0 |
LoongArch: Humanize the PRMD line when showing registers
Example output looks like: [ xx.xxxxxx] PRMD: 00000004 (PPLV0 +PIE -PWE) Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
efada2afac |
LoongArch: Humanize the CRMD line when showing registers
Example output looks like: [ xx.xxxxxx] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) Some initial machinery for this pretty-printing format has been included in this patch as well. Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
05fa8d4977 |
LoongArch: Fix format of CSR lines during show_regs()
Use uppercase CSR names throughout for consistency with the manual wording, and right-align the keys. The "CSR" part is inferrable from context, hence dropped for more horizontal space. Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
863b3795ef |
LoongArch: Print symbol info for $ra and CSR.ERA only for kernel-mode contexts
Otherwise the addresses wouldn't make sense at all. While at it, align the "map keys" to maintain right-alignment with the "estat:" line too; also swap the ERA and ra lines so all CSRs are shown together. Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
f6a79b6036 |
LoongArch: Print GPRs with ABI names when showing registers
Show PC (CSR.ERA) in place of $zero, and also show the syscall restart flag (conveniently stuffed in regs[0]) if non-zero. Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
aa552254cf |
LoongArch: Define regular names for BCE/WATCH/HVC/GSPR exceptions
Define them according to the ISA manual, in order to enable matching the sub-exceptions for humanization purposes later. Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
WANG Xuerui
|
9e36fa4299 |
LoongArch: Clean up the architectural interrupt definitions
While interrupts are assigned ECodes `64 + interrupt number`, all existing use sites of interrupt numbers want the 64 subtracted. Re-arrange the definitions so that the actual interrupt number is used everywhere, and make EXCCODE_INT_END inclusive as it is more intuitive that way. While at it, according to the asm/loongarch.h definitions, the total number of architectural interrupts should be 14, but various other places indicate otherwise (13 or 15). Those places have been adjusted to 14 as well for consistency. Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Uros Bizjak
|
d994f2c8e2 |
locking/arch: Wire up local_try_cmpxchg()
Implement target specific support for local_try_cmpxchg() and local_cmpxchg() using typed C wrappers that call their _local counterpart and provide additional checking of their input arguments. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230405141710.3551-4-ubizjak@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrzej Hajda
|
068550631f |
locking/arch: Rename all internal __xchg() names to __arch_xchg()
Decrease the probability of this internal facility to be used by driver code. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [riscv] Link: https://lore.kernel.org/r/20230118154450.73842-1-andrzej.hajda@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
f20730efbd |
SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK PA5+V7HcQRk= =Wp7f -----END PGP SIGNATURE----- Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default |
||
Linus Torvalds
|
2aff7c706c |
Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically. - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it. - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code. - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown/panic functions. - Misc improvements & fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi fRho4CqzrtM= =NwEV -----END PGP SIGNATURE----- Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown and panic functions - Misc improvements & fixes * tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/hyperv: Mark hv_ghcb_terminate() as noreturn scsi: message: fusion: Mark mpt_halt_firmware() __noreturn x86/cpu: Mark {hlt,resume}_play_dead() __noreturn btrfs: Mark btrfs_assertfail() __noreturn objtool: Include weak functions in global_noreturns check cpu: Mark nmi_panic_self_stop() __noreturn cpu: Mark panic_smp_self_stop() __noreturn arm64/cpu: Mark cpu_park_loop() and friends __noreturn x86/head: Mark *_start_kernel() __noreturn init: Mark start_kernel() __noreturn init: Mark [arch_call_]rest_init() __noreturn objtool: Generate ORC data for __pfx code x86/linkage: Fix padding for typed functions objtool: Separate prefix code from stack validation code objtool: Remove superfluous dead_end_function() check objtool: Add symbol iteration helpers objtool: Add WARN_INSN() scripts/objdump-func: Support multiple functions context_tracking: Fix KCSAN noinstr violation objtool: Add stackleak instrumentation to uaccess safe list ... |
||
Linus Torvalds
|
7fa8a8ee94 |
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page(). - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY= =J+Dj -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ... |
||
Linus Torvalds
|
6e98b09da9 |
Networking changes for 6.4.
Core ---- - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances. - Reduce compound page head access for zero-copy data transfers. - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible. - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance. - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking. - Add lockless accesses annotation to sk_err[_soft]. - Optimize again the skb struct layout. - Extends the skb drop reasons to make it usable by multiple subsystems. - Better const qualifier awareness for socket casts. BPF --- - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses. - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward. - Add more precise memory usage reporting for all BPF map types. - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params. - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton. - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities. - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc. - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps. - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps. - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree. - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them. - Add ARM32 USDT support to libbpf. - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations. Protocols --------- - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address. - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition. - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf. - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to nodes failures. - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers. - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction. - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore. - WiFi: - reduced neighbor report (RNR) handling for AP mode - HW timestamping support - support for randomized auth/deauth TA for PASN privacy - per-link debugfs for multi-link - TC offload support for mac80211 drivers - mac80211 mesh fast-xmit and fast-rx support - enable Wi-Fi 7 (EHT) mesh support Netfilter --------- - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged. - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIT TCP support. - The iptables 32bit compat interface isn't compiled in by default anymore. - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used. - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device. Driver API ---------- - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time. - Move Multicast DB netlink handlers to core, allowing devices other then bridge to use them. - Allow the page_pool to directly recycle the pages from safely localized NAPI. - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization. - Add YNL support for user headers and struct attrs. - Add partial YNL specification for devlink. - Add partial YNL specification for ethtool. - Add tc-mqprio and tc-taprio support for preemptible traffic classes. - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device. - Add basic LED support for switch/phy. - Add NAPI documentation, stop relaying on external links. - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space. - Add transceiver support and improve the error messages for CAN-FD controllers. New hardware / drivers ---------------------- - Ethernet: - AMD/Pensando core device support - MediaTek MT7981 SoC - MediaTek MT7988 SoC - Broadcom BCM53134 embedded switch - Texas Instruments CPSW9G ethernet switch - Qualcomm EMAC3 DWMAC ethernet - StarFive JH7110 SoC - NXP CBTX ethernet PHY - WiFi: - Apple M1 Pro/Max devices - RealTek rtl8710bu/rtl8188gu - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset - Bluetooth: - Realtek RTL8821CS, RTL8851B, RTL8852BS - Mediatek MT7663, MT7922 - NXP w8997 - Actions Semi ATS2851 - QTI WCN6855 - Marvell 88W8997 - Can: - STMicroelectronics bxcan stm32f429 Drivers ------- - Ethernet NICs: - Intel (1G, icg): - add tracking and reporting of QBV config errors. - add support for configuring max SDU for each Tx queue. - Intel (100G, ice): - refactor mailbox overflow detection to support Scalable IOV - GNSS interface optimization - Intel (i40e): - support XDP multi-buffer - nVidia/Mellanox: - add the support for linux bridge multicast offload - enable TC offload for egress and engress MACVLAN over bond - add support for VxLAN GBP encap/decap flows offload - extend packet offload to fully support libreswan - support tunnel mode in mlx5 IPsec packet offload - extend XDP multi-buffer support - support MACsec VLAN offload - add support for dynamic msix vectors allocation - drop RX page_cache and fully use page_pool - implement thermal zone to report NIC temperature - Netronome/Corigine: - add support for multi-zone conntrack offload - Solarflare/Xilinx: - support offloading TC VLAN push/pop actions to the MAE - support TC decap rules - support unicast PTP - Other NICs: - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC - RealTek (r8169): refactor to addess ASPM issues during NAPI poll. - Micrel (lan8841): add support for PTP_PF_PEROUT - Cadence (macb): enable PTP unicast - Engleder (tsnep): add XDP socket zero-copy support - virtio-net: implement exact header length guest feature - veth: add page_pool support for page recycling - vxlan: add MDB data path support - gve: add XDP support for GQI-QPL format - geneve: accept every ethertype - macvlan: allow some packets to bypass broadcast queue - mana: add support for jumbo frame - Ethernet high-speed switches: - Microchip (sparx5): Add support for TC flower templates. - Ethernet embedded switches: - Broadcom (b54): - configure 6318 and 63268 RGMII ports - Marvell (mv88e6xxx): - faster C45 bus scan - Microchip: - lan966x: - add support for IS1 VCAP - better TX/RX from/to CPU performances - ksz9477: add ETS Qdisc support - ksz8: enhance static MAC table operations and error handling - sama7g5: add PTP capability - NXP (ocelot): - add support for external ports - add support for preemptible traffic classes - Texas Instruments: - add CPSWxG SGMII support for J7200 and J721E - Intel WiFi (iwlwifi): - preparation for Wi-Fi 7 EHT and multi-link support - EHT (Wi-Fi 7) sniffer support - hardware timestamping support for some devices/firwmares - TX beacon protection on newer hardware - Qualcomm 802.11ax WiFi (ath11k): - MU-MIMO parameters support - ack signal support for management packets - RealTek WiFi (rtw88): - SDIO bus support - better support for some SDIO devices (e.g. MAC address from efuse) - RealTek WiFi (rtw89): - HW scan support for 8852b - better support for 6 GHz scanning - support for various newer firmware APIs - framework firmware backwards compatibility - MediaTek WiFi (mt76): - P2P support - mesh A-MSDU support - EHT (Wi-Fi 7) support - coredump support Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRI/mUSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkgO0QAJGxpuN67YgYV0BIM+/atWKEEexJYG7B 9MMpU4jMO3EW/pUS5t7VRsBLUybLYVPmqCZoHodObDfnu59jiPOegb6SikJv/ZwJ Zw62PVk5MvDnQjlu4e6kDcGwkplteN08TlgI+a49BUTedpdFitrxHAYGW8f2fRO6 cK2XSld+ZucMoym5vRwf8yWS1BwdxnslPMxDJ+/8ZbWBZv44qAnG2vMB/kIx7ObC Vel/4m6MzTwVsLYBsRvcwMVbNNlZ9GuhztlTzEbfGA4ZhTadIAMgb5VTWXB84Ws7 Aic5wTdli+q+x6/2cxhbyeoVuB9HHObYmLBAciGg4GNljP5rnQBY3X3+KVZ/x9TI HQB7CmhxmAZVrO9pLARFV+ECrMTH2/dy3NyrZ7uYQ3WPOXJi8hJZjOTO/eeEGL7C eTjdz0dZBWIBK2gON/6s4nExXVQUTEF2ZsPi52jTTClKjfe5pz/ddeFQIWaY1DTm pInEiWPAvd28JyiFmhFNHsuIBCjX/Zqe2JuMfMBeBibDAC09o/OGdKJYUI15AiRf F46Pdb7use/puqfrYW44kSAfaPYoBiE+hj1RdeQfen35xD9HVE4vdnLNeuhRlFF9 aQfyIRHYQofkumRDr5f8JEY66cl9NiKQ4IVW1xxQfYDNdC6wQqREPG1md7rJVMrJ vP7ugFnttneg =ITVa -----END PGP SIGNATURE----- Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances - Reduce compound page head access for zero-copy data transfers - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking - Add lockless accesses annotation to sk_err[_soft] - Optimize again the skb struct layout - Extends the skb drop reasons to make it usable by multiple subsystems - Better const qualifier awareness for socket casts BPF: - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward - Add more precise memory usage reporting for all BPF map types - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them - Add ARM32 USDT support to libbpf - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations Protocols: - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to nodes failures - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore - WiFi: - reduced neighbor report (RNR) handling for AP mode - HW timestamping support - support for randomized auth/deauth TA for PASN privacy - per-link debugfs for multi-link - TC offload support for mac80211 drivers - mac80211 mesh fast-xmit and fast-rx support - enable Wi-Fi 7 (EHT) mesh support Netfilter: - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIT TCP support - The iptables 32bit compat interface isn't compiled in by default anymore - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device Driver API: - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time - Move Multicast DB netlink handlers to core, allowing devices other then bridge to use them - Allow the page_pool to directly recycle the pages from safely localized NAPI - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization - Add YNL support for user headers and struct attrs - Add partial YNL specification for devlink - Add partial YNL specification for ethtool - Add tc-mqprio and tc-taprio support for preemptible traffic classes - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device - Add basic LED support for switch/phy - Add NAPI documentation, stop relaying on external links - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space - Add transceiver support and improve the error messages for CAN-FD controllers New hardware / drivers: - Ethernet: - AMD/Pensando core device support - MediaTek MT7981 SoC - MediaTek MT7988 SoC - Broadcom BCM53134 embedded switch - Texas Instruments CPSW9G ethernet switch - Qualcomm EMAC3 DWMAC ethernet - StarFive JH7110 SoC - NXP CBTX ethernet PHY - WiFi: - Apple M1 Pro/Max devices - RealTek rtl8710bu/rtl8188gu - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset - Bluetooth: - Realtek RTL8821CS, RTL8851B, RTL8852BS - Mediatek MT7663, MT7922 - NXP w8997 - Actions Semi ATS2851 - QTI WCN6855 - Marvell 88W8997 - Can: - STMicroelectronics bxcan stm32f429 Drivers: - Ethernet NICs: - Intel (1G, icg): - add tracking and reporting of QBV config errors - add support for configuring max SDU for each Tx queue - Intel (100G, ice): - refactor mailbox overflow detection to support Scalable IOV - GNSS interface optimization - Intel (i40e): - support XDP multi-buffer - nVidia/Mellanox: - add the support for linux bridge multicast offload - enable TC offload for egress and engress MACVLAN over bond - add support for VxLAN GBP encap/decap flows offload - extend packet offload to fully support libreswan - support tunnel mode in mlx5 IPsec packet offload - extend XDP multi-buffer support - support MACsec VLAN offload - add support for dynamic msix vectors allocation - drop RX page_cache and fully use page_pool - implement thermal zone to report NIC temperature - Netronome/Corigine: - add support for multi-zone conntrack offload - Solarflare/Xilinx: - support offloading TC VLAN push/pop actions to the MAE - support TC decap rules - support unicast PTP - Other NICs: - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC - RealTek (r8169): refactor to addess ASPM issues during NAPI poll - Micrel (lan8841): add support for PTP_PF_PEROUT - Cadence (macb): enable PTP unicast - Engleder (tsnep): add XDP socket zero-copy support - virtio-net: implement exact header length guest feature - veth: add page_pool support for page recycling - vxlan: add MDB data path support - gve: add XDP support for GQI-QPL format - geneve: accept every ethertype - macvlan: allow some packets to bypass broadcast queue - mana: add support for jumbo frame - Ethernet high-speed switches: - Microchip (sparx5): Add support for TC flower templates - Ethernet embedded switches: - Broadcom (b54): - configure 6318 and 63268 RGMII ports - Marvell (mv88e6xxx): - faster C45 bus scan - Microchip: - lan966x: - add support for IS1 VCAP - better TX/RX from/to CPU performances - ksz9477: add ETS Qdisc support - ksz8: enhance static MAC table operations and error handling - sama7g5: add PTP capability - NXP (ocelot): - add support for external ports - add support for preemptible traffic classes - Texas Instruments: - add CPSWxG SGMII support for J7200 and J721E - Intel WiFi (iwlwifi): - preparation for Wi-Fi 7 EHT and multi-link support - EHT (Wi-Fi 7) sniffer support - hardware timestamping support for some devices/firwmares - TX beacon protection on newer hardware - Qualcomm 802.11ax WiFi (ath11k): - MU-MIMO parameters support - ack signal support for management packets - RealTek WiFi (rtw88): - SDIO bus support - better support for some SDIO devices (e.g. MAC address from efuse) - RealTek WiFi (rtw89): - HW scan support for 8852b - better support for 6 GHz scanning - support for various newer firmware APIs - framework firmware backwards compatibility - MediaTek WiFi (mt76): - P2P support - mesh A-MSDU support - EHT (Wi-Fi 7) support - coredump support" * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits) net: phy: hide the PHYLIB_LEDS knob net: phy: marvell-88x2222: remove unnecessary (void*) conversions tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. net: amd: Fix link leak when verifying config failed net: phy: marvell: Fix inconsistent indenting in led_blink_set lan966x: Don't use xdp_frame when action is XDP_TX tsnep: Add XDP socket zero-copy TX support tsnep: Add XDP socket zero-copy RX support tsnep: Move skb receive action to separate function tsnep: Add functions for queue enable/disable tsnep: Rework TX/RX queue initialization tsnep: Replace modulo operation with mask net: phy: dp83867: Add led_brightness_set support net: phy: Fix reading LED reg property drivers: nfc: nfcsim: remove return value check of `dev_dir` net: phy: dp83867: Remove unnecessary (void*) conversions net: ethtool: coalesce: try to make user settings stick twice net: mana: Check if netdev/napi_alloc_frag returns single page net: mana: Rename mana_refill_rxoob and remove some empty lines net: veth: add page_pool stats ... |
||
Linus Torvalds
|
53b5e72b9d |
asm-generic updates for 6.4
These are various cleanups, fixing a number of uapi header files to no longer reference CONFIG_* symbols, and one patch that introduces the new CONFIG_HAS_IOPORT symbol for architectures that provide working inb()/outb() macros, as a preparation for adding driver dependencies on those in the following release. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmRG8IkACgkQYKtH/8kJ Uid15Q/9E/neIIEqEk6IvtyhUicrJiIZUM0rGoYtWXiz75ggk6Kx9+3I+j8zIQ/E kf2TzAG7q9Md7nfTDFLr4FSr0IcNDj+VG4nYxUyDHdKGcARO+g9Kpdvscxip3lgU Rw5w74Gyd30u4iUKGS39OYuxcCgl9LaFjMA9Gh402Oiaoh+OYLmgQS9h/goUD5KN Nd+AoFvkdbnHl0/SpxthLRyL5rFEATBmAY7apYViPyMvfjS3gfDJwXJR9jkKgi6X Qs4t8Op8BA3h84dCuo6VcFqgAJs2Wiq3nyTSUnkF8NxJ2RFTpeiVgfsLOzXHeDgz SKDB4Lp14o3mlyZyj00MWq1uMJRRetUgNiVb6iHOoKQ/E4demBdh+mhIFRybjM5B XNTWFcg9PWFCMa4W9jnLfZBc881X4+7T+qUF8I0W/1AbRJUmyGj8HO6jLceC4yGD UYLn5oFPM6OWXHp6DqJrCr9Yw8h6fuviQZFEbl/ARlgVGt+J4KbYweJYk8DzfX6t PZIj8LskOqyIpRuC2oDA1PHxkaJ1/z+N5oRBHq1uicSh4fxY5HW7HnyzgF08+R3k cf+fjAhC3TfGusHkBwQKQJvpxrxZjPuvYXDZ0GxTvNKJRB8eMeiTm1n41E5oTVwQ swSblSCjZj/fMVVPXLcjxEW4SBNWRxa9Lz3tIPXb3RheU10Lfy8= =H3k4 -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are various cleanups, fixing a number of uapi header files to no longer reference CONFIG_* symbols, and one patch that introduces the new CONFIG_HAS_IOPORT symbol for architectures that provide working inb()/outb() macros, as a preparation for adding driver dependencies on those in the following release" * tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: Kconfig: introduce HAS_IOPORT option and select it as necessary scripts: Update the CONFIG_* ignore list in headers_install.sh pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi header Move bp_type_idx to include/linux/hw_breakpoint.h Move ep_take_care_of_epollwakeup() to fs/eventpoll.c Move COMPAT_ATM_ADDPARTY to net/atm/svc.c |
||
Linus Torvalds
|
e7989789c6 |
Timers and timekeeping updates:
- Improve the VDSO build time checks to cover all dynamic relocations VDSO does not allow dynamic relcations, but the build time check is incomplete and fragile. It's based on architectures specifying the relocation types to search for and does not handle R_*_NONE relocation entries correctly. R_*_NONE relocations are injected by some GNU ld variants if they fail to determine the exact .rel[a]/dyn_size to cover trailing zeros. R_*_NONE relocations must be ignored by dynamic loaders, so they should be ignored in the build time check too. Remove the architecture specific relocation types to check for and validate strictly that no other relocations than R_*_NONE end up in the VSDO .so file. - Prefer signal delivery to the current thread for CLOCK_PROCESS_CPUTIME_ID based posix-timers Such timers prefer to deliver the signal to the main thread of a process even if the context in which the timer expires is the current task. This has the downside that it might wake up an idle thread. As there is no requirement or guarantee that the signal has to be delivered to the main thread, avoid this by preferring the current task if it is part of the thread group which shares sighand. This not only avoids waking idle threads, it also distributes the signal delivery in case of multiple timers firing in the context of different threads close to each other better. - Align the tick period properly (again) For a long time the tick was starting at CLOCK_MONOTONIC zero, which allowed users space applications to either align with the tick or to place a periodic computation so that it does not interfere with the tick. The alignement of the tick period was more by chance than by intention as the tick is set up before a high resolution clocksource is installed, i.e. timekeeping is still tick based and the tick period advances from there. The early enablement of sched_clock() broke this alignement as the time accumulated by sched_clock() is taken into account when timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is not longer a multiple of tick periods, which breaks applications which relied on that behaviour. Cure this by aligning the tick starting point to the next multiple of tick periods, i.e 1000ms/CONFIG_HZ. - A set of NOHZ fixes and enhancements - Cure the concurrent writer race for idle and IO sleeptime statistics The statitic values which are exposed via /proc/stat are updated from the CPU local idle exit and remotely by cpufreq, but that happens without any form of serialization. As a consequence sleeptimes can be accounted twice or worse. Prevent this by restricting the accumulation writeback to the CPU local idle exit and let the remote access compute the accumulated value. - Protect idle/iowait sleep time with a sequence count Reading idle/iowait sleep time, e.g. from /proc/stat, can race with idle exit updates. As a consequence the readout may result in random and potentially going backwards values. Protect this by a sequence count, which fixes the idle time statistics issue, but cannot fix the iowait time problem because iowait time accounting races with remote wake ups decrementing the remote runqueues nr_iowait counter. The latter is impossible to fix, so the only way to deal with that is to document it properly and to remove the assertion in the selftest which triggers occasionally due to that. - Restructure struct tick_sched for better cache layout - Some small cleanups and a better cache layout for struct tick_sched - Implement the missing timer_wait_running() callback for POSIX CPU timers For unknown reason the introduction of the timer_wait_running() callback missed to fixup posix CPU timers, which went unnoticed for almost four years. While initially only targeted to prevent livelocks between a timer deletion and the timer expiry function on PREEMPT_RT enabled kernels, it turned out that fixing this for mainline is not as trivial as just implementing a stub similar to the hrtimer/timer callbacks. The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems there is a livelock issue independent of RT. CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers out from hard interrupt context to task work, which is handled before returning to user space or to a VM. The expiry mechanism moves the expired timers to a stack local list head with sighand lock held. Once sighand is dropped the task can be preempted and a task which wants to delete a timer will spin-wait until the expiry task is scheduled back in. In the worst case this will end up in a livelock when the preempting task and the expiry task are pinned on the same CPU. The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock. This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way. Add a per task mutex to struct posix_cputimers_work, let the expiry task hold it accross the expiry function and let the deleting task which waits for the expiry to complete block on the mutex. In the non-contended case this results in an extra mutex_lock()/unlock() pair on both sides. This avoids spin-waiting on a task which is scheduled out, prevents the livelock and cures the problem for RT and !RT systems. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGrj4THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoZhdEAC/lwfDWCnTXHC8ExQQRDIVNyXmDlLb EHB8ZY7Wc4gNZ8UEXEOLOXJHMG9bsbtPGctVewJwRGnXZWKVhpPwQba6kCRycyX0 0J6l5DlvUaGGrpoOzOZwgETRmtIZE9tEArZR8xlfRScYd93a7yLhwIjO8JaV9vKs IQpAQMeJ/ysp6gHrS59qakYfoHU/ERUAu3Tk4GqHUtPtcyz3nX3eTlLWV8LySqs+ 00qr2yc0bQFUFoKzTCxtM8lcEi9ja9SOj1rw28348O+BXE4d0HC12Ie7eU/CDN2Y OAlWYxVjy4LMh24LDrRQKTzoVqx9MXDx2g+09B3t8NK5LgeS+EJIjujDhZF147/H 5y906nplZUKa8BiZW5Rpm/HKH8tFI80T9XWSQCRBeMgTEJyRyRU1yASAwO4xw+dY Dn3tGmFGymcV/72o4ic9JFKQd8cTSxPjEJS3qqzMkEAtyI/zPBmKxj/Tce50OH40 6FSZq1uU21ZQzszwSHISwgFtNr75laUSK4Z1te5OhPOOz+C7O9YqHvqS/1jwhPj2 tMd8X17fRW3UTUBlBj+zqxqiEGBl/Yk2AvKrJIXGUtfWYCtjMJ7ieCf0kZ7NSVJx 9ewubA0gqseMD783YomZsy8LLtMKnhclJeslUOVb1oKs1q/WF1R/k6qjy9vUwYaB nIJuHl8mxSetag== =SVnj -----END PGP SIGNATURE----- Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers and timekeeping updates from Thomas Gleixner: - Improve the VDSO build time checks to cover all dynamic relocations VDSO does not allow dynamic relocations, but the build time check is incomplete and fragile. It's based on architectures specifying the relocation types to search for and does not handle R_*_NONE relocation entries correctly. R_*_NONE relocations are injected by some GNU ld variants if they fail to determine the exact .rel[a]/dyn_size to cover trailing zeros. R_*_NONE relocations must be ignored by dynamic loaders, so they should be ignored in the build time check too. Remove the architecture specific relocation types to check for and validate strictly that no other relocations than R_*_NONE end up in the VSDO .so file. - Prefer signal delivery to the current thread for CLOCK_PROCESS_CPUTIME_ID based posix-timers Such timers prefer to deliver the signal to the main thread of a process even if the context in which the timer expires is the current task. This has the downside that it might wake up an idle thread. As there is no requirement or guarantee that the signal has to be delivered to the main thread, avoid this by preferring the current task if it is part of the thread group which shares sighand. This not only avoids waking idle threads, it also distributes the signal delivery in case of multiple timers firing in the context of different threads close to each other better. - Align the tick period properly (again) For a long time the tick was starting at CLOCK_MONOTONIC zero, which allowed users space applications to either align with the tick or to place a periodic computation so that it does not interfere with the tick. The alignement of the tick period was more by chance than by intention as the tick is set up before a high resolution clocksource is installed, i.e. timekeeping is still tick based and the tick period advances from there. The early enablement of sched_clock() broke this alignement as the time accumulated by sched_clock() is taken into account when timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is not longer a multiple of tick periods, which breaks applications which relied on that behaviour. Cure this by aligning the tick starting point to the next multiple of tick periods, i.e 1000ms/CONFIG_HZ. - A set of NOHZ fixes and enhancements: * Cure the concurrent writer race for idle and IO sleeptime statistics The statitic values which are exposed via /proc/stat are updated from the CPU local idle exit and remotely by cpufreq, but that happens without any form of serialization. As a consequence sleeptimes can be accounted twice or worse. Prevent this by restricting the accumulation writeback to the CPU local idle exit and let the remote access compute the accumulated value. * Protect idle/iowait sleep time with a sequence count Reading idle/iowait sleep time, e.g. from /proc/stat, can race with idle exit updates. As a consequence the readout may result in random and potentially going backwards values. Protect this by a sequence count, which fixes the idle time statistics issue, but cannot fix the iowait time problem because iowait time accounting races with remote wake ups decrementing the remote runqueues nr_iowait counter. The latter is impossible to fix, so the only way to deal with that is to document it properly and to remove the assertion in the selftest which triggers occasionally due to that. * Restructure struct tick_sched for better cache layout * Some small cleanups and a better cache layout for struct tick_sched - Implement the missing timer_wait_running() callback for POSIX CPU timers For unknown reason the introduction of the timer_wait_running() callback missed to fixup posix CPU timers, which went unnoticed for almost four years. While initially only targeted to prevent livelocks between a timer deletion and the timer expiry function on PREEMPT_RT enabled kernels, it turned out that fixing this for mainline is not as trivial as just implementing a stub similar to the hrtimer/timer callbacks. The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems there is a livelock issue independent of RT. CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers out from hard interrupt context to task work, which is handled before returning to user space or to a VM. The expiry mechanism moves the expired timers to a stack local list head with sighand lock held. Once sighand is dropped the task can be preempted and a task which wants to delete a timer will spin-wait until the expiry task is scheduled back in. In the worst case this will end up in a livelock when the preempting task and the expiry task are pinned on the same CPU. The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock. This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way. Add a per task mutex to struct posix_cputimers_work, let the expiry task hold it accross the expiry function and let the deleting task which waits for the expiry to complete block on the mutex. In the non-contended case this results in an extra mutex_lock()/unlock() pair on both sides. This avoids spin-waiting on a task which is scheduled out, prevents the livelock and cures the problem for RT and !RT systems * tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Implement the missing timer_wait_running callback selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity selftests/proc: Remove idle time monotonicity assertions MAINTAINERS: Remove stale email address timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick() timers/nohz: Add a comment about broken iowait counter update race timers/nohz: Protect idle/iowait sleep time under seqcount timers/nohz: Only ever update sleeptime from idle exit timers/nohz: Restructure and reshuffle struct tick_sched tick/common: Align tick period with the HZ tick. selftests/timers/posix_timers: Test delivery of signals across threads posix-timers: Prefer delivery of signals to the current thread vdso: Improve cmd_vdso_check to check all dynamic relocations |
||
Jakub Kicinski
|
681c5b51dc |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Thomas Zimmermann
|
84998fc1c3 |
arch/loongarch: Implement <asm/fb.h> with generic helpers
Replace the architecture's fbdev helpers with the generic ones from <asm-generic/fb.h>. No functional changes. v2: * use default implementation for fb_pgprotect() (Arnd) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Helge Deller <deller@gmx.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230417125651.25126-7-tzimmermann@suse.de |
||
Enze Li
|
213ef669d1 |
LoongArch: Replace hard-coded values in comments with VALEN
According to LoongArch documentation [1], CSR.PGDL and CSR.PGDH are concerned with the VA's MSB which is VALEN-1 instead of always being 47. Fix comments to avoid misleading others. [1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#page-global-directory-base-address-for-lower-half-address-space Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Enze Li <lienze@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Tiezhu Yang
|
afca6e0649 |
LoongArch: Clean up plat_swiotlb_setup() related code
After commit c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"), plat_swiotlb_setup() has been deleted, so clean up the related code. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Tiezhu Yang
|
370a3b8f58 |
LoongArch: Check unwind_error() in arch_stack_walk()
We can see the following messages with CONFIG_PROVE_LOCKING=y on LoongArch: BUG: MAX_STACK_TRACE_ENTRIES too low! turning off the locking correctness validator. This is because stack_trace_save() returns a big value after call arch_stack_walk(), here is the call trace: save_trace() stack_trace_save() arch_stack_walk() stack_trace_consume_entry() arch_stack_walk() should return immediately if unwind_next_frame() failed, no need to do the useless loops to increase the value of c->len in stack_trace_consume_entry(), then we can fix the above problem. Cc: stable@vger.kernel.org Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/all/8a44ad71-68d2-4926-892f-72bfc7a67e2a@roeck-us.net/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Qing Zhang
|
e32b3b8222 |
LoongArch: Adjust user_regset_copyin parameter to the correct offset
Ensure that user_watch_state can be set correctly by the user. Signed-off-by: Qing Zhang <zhangqing@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Qing Zhang
|
ff9f3d7aef |
LoongArch: Adjust user_watch_state for explicit alignment
This is done in order to easily calculate the number of breakpoints in hw_break_get()/hw_break_set(). Signed-off-by: Qing Zhang <zhangqing@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Aneesh Kumar K.V
|
0b376f1e0f |
mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Tarun Sahu <tsahu@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Huacai Chen
|
93eb1215ed |
LoongArch: module: set section addresses to 0x0
These got*, plt* and .text.ftrace_trampoline sections specified for LoongArch have non-zero addressses. Non-zero section addresses in a relocatable ELF would confuse GDB when it tries to compute the section offsets and it ends up printing wrong symbol addresses. Therefore, set them to zero, which mirrors the change in commit 5d8591bc0fbaeb6ded ("arm64 module: set plt* section addresses to 0x0"). Cc: stable@vger.kernel.org Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Chong Qiao <qiaochong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Huacai Chen
|
dce5ea1d0f |
LoongArch: Mark 3 symbol exports as non-GPL
vm_map_base, empty_zero_page and invalid_pmd_table could be accessed widely by some out-of-tree non-GPL but important file systems or drivers (e.g. OpenZFS). Let's use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL() to export them, so as to avoid build errors. 1, Details about vm_map_base: This is a LoongArch-specific symbol and may be referenced through macros PCI_IOBASE, VMALLOC_START and VMALLOC_END. 2, Details about empty_zero_page: As it stands today, only 3 architectures export empty_zero_page as a GPL symbol: IA64, LoongArch and MIPS. LoongArch gets the GPL export by inheriting from MIPS, and the MIPS export was first introduced in commit 497d2adcbf50b ("[MIPS] Export empty_zero_page for sake of the ext4 module."). The IA64 export was similar: commit a7d57ecf4216e ("[IA64] Export three symbols for module use") did so for kvm. In both IA64 and MIPS, the export of empty_zero_page was done for satisfying some in-kernel component built as module (kvm and ext4 respectively), and given its reasonably low-level nature, GPL is a reasonable choice. But looking at the bigger picture it is evident most other architectures do not regard it as GPL, so in effect the symbol probably should not be treated as such, in favor of consistency. 3, Details about invalid_pmd_table: Keep consistency with invalid_pte_table and make it be possible by some modules. Cc: stable@vger.kernel.org Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Huacai Chen
|
1c1378a409 |
LoongArch: Enable PG when wakeup from suspend
Some firmwares don't enable PG when wakeup from suspend, so do it in kernel. This can improve code compatibility for boot kernel. Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Qing Zhang
|
6637775ca3 |
LoongArch: Fix _CONST64_(x) as unsigned
Addresses should all be of unsigned type to avoid unnecessary conversions. Signed-off-by: Qing Zhang <zhangqing@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Huacai Chen
|
1cf62488f5 |
LoongArch: Fix build error if CONFIG_SUSPEND is not set
We can see the following build error on LoongArch if CONFIG_SUSPEND is not set: ld: drivers/acpi/sleep.o: in function 'acpi_pm_prepare': sleep.c:(.text+0x2b8): undefined reference to 'loongarch_wakeup_start' Here is the call trace: acpi_pm_prepare() __acpi_pm_prepare() acpi_sleep_prepare() acpi_get_wakeup_address() loongarch_wakeup_start() Root cause: loongarch_wakeup_start() is defined in arch/loongarch/power/ suspend_asm.S which is only built under CONFIG_SUSPEND. In order to fix the build error, just let acpi_get_wakeup_address() return 0 if CONFIG_ SUSPEND is not set. Fixes: 366bb35a8e48 ("LoongArch: Add suspend (ACPI S3) support") Reviewed-by: WANG Xuerui <git@xen0n.name> Reported-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/all/11215033-fa3c-ecb1-2fc0-e9aeba47be9b@infradead.org/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Huacai Chen
|
df83033604 |
LoongArch: Fix probing of the CRC32 feature
Not all LoongArch processors support CRC32 instructions. This feature is indicated by CPUCFG1.CRC32 (Bit25) but it is wrongly defined in the previous versions of the ISA manual (and so does in loongarch.h). The CRC32 feature is set unconditionally now, so fix it. BTW, expose the CRC32 feature in /proc/cpuinfo. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Huacai Chen
|
16c52e5030 |
LoongArch: Make WriteCombine configurable for ioremap()
LoongArch maintains cache coherency in hardware, but when paired with LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar to WriteCombine) is out of the scope of cache coherency machanism for PCIe devices (this is a PCIe protocol violation, which may be fixed in newer chipsets). This means WUC can only used for write-only memory regions now, so this option is disabled by default, making WUC silently fallback to SUC for ioremap(). You can enable this option if the kernel is ensured to run on hardware without this bug. Kernel parameter writecombine=on/off can be used to override the Kconfig option. Cc: stable@vger.kernel.org Suggested-by: WANG Xuerui <kernel@xen0n.name> Reviewed-by: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Jakub Kicinski
|
800e68c44f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts: tools/testing/selftests/net/config 62199e3f1658 ("selftests: net: Add VXLAN MDB test") 3a0385be133e ("selftests: add the missing CONFIG_IP_SCTP in net config") Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Mike Rapoport (IBM)
|
7ce6048d3a |
loongarch: drop ranges for definition of ARCH_FORCE_MAX_ORDER
LoongArch defines insane ranges for ARCH_FORCE_MAX_ORDER allowing MAX_ORDER up to 63, which implies maximal contiguous allocation size of 2^63 pages. Drop bogus definitions of ranges for ARCH_FORCE_MAX_ORDER and leave it a simple integer with sensible defaults. Users that *really* need to change the value of ARCH_FORCE_MAX_ORDER will be able to do so but they won't be mislead by the bogus ranges. Link: https://lkml.kernel.org/r/20230322081727.2516291-1-rppt@kernel.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Kirill A. Shutemov
|
23baf831a3 |
mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Niklas Schnelle
|
fcbfe8121a
|
Kconfig: introduce HAS_IOPORT option and select it as necessary
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O Port access. In a future patch HAS_IOPORT=n will disable compilation of the I/O accessor functions inb()/outb() and friends on architectures which can not meaningfully support legacy I/O spaces such as s390. The following architectures do not select HAS_IOPORT: * ARC * C-SKY * Hexagon * Nios II * OpenRISC * s390 * User-Mode Linux * Xtensa All other architectures select HAS_IOPORT at least conditionally. The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs for HAS_IOPORT specific sections will be added in subsequent patches on a per subsystem basis. Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> |
||
George Guo
|
a6f6a95f25 |
LoongArch, bpf: Fix jit to skip speculation barrier opcode
Just skip the opcode(BPF_ST | BPF_NOSPEC) in the BPF JIT instead of failing to JIT the entire program, given LoongArch currently has no couterpart of a speculation barrier instruction. To verify the issue, use the ltp testcase as shown below. Also, Wang says: I can confirm there's currently no speculation barrier equivalent on LonogArch. (Loongson says there are builtin mitigations for Spectre-V1 and V2 on their chips, and AFAIK efforts to port the exploits to mips/LoongArch have all failed a few years ago.) Without this patch: $ ./bpf_prog02 [...] bpf_common.c:123: TBROK: Failed verification: ??? (524) [...] Summary: passed 0 failed 0 broken 1 skipped 0 warnings 0 With this patch: $ ./bpf_prog02 [...] Summary: passed 0 failed 0 broken 0 skipped 0 warnings 0 Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support") Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: WANG Xuerui <git@xen0n.name> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/bpf/20230328071335.2664966-1-guodongtai@kylinos.cn |
||
Valentin Schneider
|
4c8c3c7f70 |
treewide: Trace IPIs sent via smp_send_reschedule()
To be able to trace invocations of smp_send_reschedule(), rename the arch-specific definitions of it to arch_smp_send_reschedule() and wrap it into an smp_send_reschedule() that contains a tracepoint. Changes to include the declaration of the tracepoint were driven by the following coccinelle script: @func_use@ @@ smp_send_reschedule(...); @include@ @@ #include <trace/events/ipi.h> @no_include depends on func_use && !include@ @@ #include <...> + + #include <trace/events/ipi.h> [csky bits] [riscv bits] Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com |
||
Fangrui Song
|
aff69273af |
vdso: Improve cmd_vdso_check to check all dynamic relocations
The actual intention is that no dynamic relocation exists in the VDSO. For this the VDSO build validates that the resulting .so file does not have any relocations which are specified via $(ARCH_REL_TYPE_ABS) per architecture, which is fragile as e.g. ARM64 lacks an entry for R_AARCH64_RELATIVE. Aside of that ARCH_REL_TYPE_ABS is a misnomer as it checks for relative relocations too. However, some GNU ld ports produce unneeded R_*_NONE relocation entries. If a port fails to determine the exact .rel[a].dyn size, the trailing zeros become R_*_NONE relocations. E.g. ld's powerpc port recently fixed https://sourceware.org/bugzilla/show_bug.cgi?id=29540). R_*_NONE are generally a no-op in the dynamic loaders. So just ignore them. Remove the ARCH_REL_TYPE_ABS defines and just validate that the resulting .so file does not contain any R_* relocation entries except R_*_NONE. Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for aarch64 Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for vDSO, aarch64 Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20230310190750.3323802-1-maskray@google.com |
||
Tony Nguyen
|
e485f3a6ea |
ixgb: Remove ixgb driver
There are likely no users of this driver as the hardware has been discontinued since 2010. Remove the driver and all references to it in documentation. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Jakub Kicinski
|
d0ddf5065f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Documentation/bpf/bpf_devel_QA.rst b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info") d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target") https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Josh Poimboeuf
|
071c44e427 |
sched/idle: Mark arch_cpu_idle_dead() __noreturn
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead() return"), in Xen, when a previously offlined CPU was brought back online, it unexpectedly resumed execution where it left off in the middle of the idle loop. There were some hacks to make that work, but the behavior was surprising as do_idle() doesn't expect an offlined CPU to return from the dead (in arch_cpu_idle_dead()). Now that Xen has been fixed, and the arch-specific implementations of arch_cpu_idle_dead() also don't return, give it a __noreturn attribute. This will cause the compiler to complain if an arch-specific implementation might return. It also improves code generation for both caller and callee. Also fixes the following warning: vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction Reported-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> |
||
Jakub Kicinski
|
36e5e391a2 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG 2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc= =EjJQ -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-03-06 We've added 85 non-merge commits during the last 13 day(s) which contain a total of 131 files changed, 7102 insertions(+), 1792 deletions(-). The main changes are: 1) Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses, from Joanne Koong. 2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities, from Andrii Nakryiko. 3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc, from Alexei Starovoitov. 4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps, from Kumar Kartikeya Dwivedi. 5) Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them, from Eduard Zingerman. 6) Make uprobe attachment Android APK aware by supporting attachment to functions inside ELF objects contained in APKs via function names, from Daniel Müller. 7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper to start the timer with absolute expiration value instead of relative one, from Tero Kristo. 8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id, from Tejun Heo. 9) Extend libbpf to support users manually attaching kprobes/uprobes in the legacy/perf/link mode, from Menglong Dong. 10) Implement workarounds in the mips BPF JIT for DADDI/R4000, from Jiaxun Yang. 11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT, from Hengqi Chen. 12) Extend BPF instruction set doc with describing the encoding of BPF instructions in terms of how bytes are stored under big/little endian, from Jose E. Marchesi. 13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui. 14) Fix bpf_xdp_query() backwards compatibility on old kernels, from Yonghong Song. 15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS, from Florent Revest. 16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache, from Hou Tao. 17) Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call, from Ilya Leoshkevich. 18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode selftests/bpf: Split test_attach_probe into multi subtests libbpf: Add support to set kprobe/uprobe attach mode tools/resolve_btfids: Add /libsubcmd to .gitignore bpf: add support for fixed-size memory pointer returns for kfuncs bpf: generalize dynptr_get_spi to be usable for iters bpf: mark PTR_TO_MEM as non-null register type bpf: move kfunc_call_arg_meta higher in the file bpf: ensure that r0 is marked scratched after any function call bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper bpf: clean up visit_insn()'s instruction processing selftests/bpf: adjust log_fixup's buffer size for proper truncation bpf: honor env->test_state_freq flag in is_state_visited() selftests/bpf: enhance align selftest's expected log matching bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} bpf: improve stack slot state printing selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access() selftests/bpf: test if pointer type is tracked for BPF_ST_MEM bpf: allow ctx writes using BPF_ST_MEM instruction bpf: Use separate RCU callbacks for freeing selem ... ==================== Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Josh Poimboeuf
|
6c0f2d071e |
loongarch/cpu: Mark play_dead() __noreturn
play_dead() doesn't return. Annotate it as such. By extension this also makes arch_cpu_idle_dead() noreturn. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/4da55acfdec8a9132c4e21ffb7edb1f846841193.1676358308.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> |
||
Josh Poimboeuf
|
13bf7923a4 |
loongarch/cpu: Make sure play_dead() doesn't return
play_dead() doesn't return. Make that more explicit with a BUG(). BUG() is preferable to unreachable() because BUG() is a more explicit failure mode and avoids undefined behavior like falling off the edge of the function into whatever code happens to be next. Link: https://lore.kernel.org/r/21245d687ffeda34dbcf04961a2df3724f04f7c8.1676358308.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> |
||
Linus Torvalds
|
a8356cdb5b |
LoongArch changes for v6.3
1, Make -mstrict-align configurable; 2, Add kernel relocation and KASLR support; 3, Add single kernel image implementation for kdump; 4, Add hardware breakpoints/watchpoints support; 5, Add kprobes/kretprobes/kprobes_on_ftrace support; 6, Add LoongArch support for some selftests. -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmP+9H0WHGNoZW5odWFj YWlAa2VybmVsLm9yZwAKCRAChivD8uImerz+D/98MjkLXM4qtgfAxuBKpVdEVA4U bzO19UlpqWlwTJbwrhf0GYsRrAis37PTVJG4eNORJairJ/oTkMtEEBPhwq0D9Whc URDEh+VrjzFztLsu2OlvzOA9gE7lpg+xAx2LKflP7ixlOELOWeercDLW3octp5/J CJDE8wPaw9tJrMHFWuiVybs03yZmY3YFV55JdWL9hY8Ryy4DY5997mruOfzjvHpl EfDgQM2zCn2JSQwaD+Kl3MHxHyRx07Tj2wnZAh9ptaGeptK/yplc7nqRwhe7BevS QwClhJNPICcOi+evZ7cDUY0PTL4evpw2KRnF1N4zw+58RhZECjVrCEJNdf6L1scj muptQngWKrE/TJvn4way3cJr44stSCtT71elPhn629S23my/CauMmFqCqKpYOPOf pxwzzCaqDcaZKwMu96qBkZS76tIrhoNeNFntj+C9RS+8ezY3+o144S3vF1A6A9Zb M4gwa2NiQuLqnCUwKK6dZkLQVX2NMIMViUkYNKdUStxNWx/K7fFmXcl0ycAFpGYp 8Q95LLH34jUrpSgqMSCmcylsPvNiN1QnuXFnw8Tu+zDthp5dOzio60tORLPM1ZUq gobPeGjeTQInq4eMCf2B5HH8fOMVtJyj6H4K9G1M6HUMg64UtcBp6BvEbwPxTxNN sIOFUjDfDnBiIXWF4w== =SzL5 -----END PGP SIGNATURE----- Merge tag 'loongarch-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Make -mstrict-align configurable - Add kernel relocation and KASLR support - Add single kernel image implementation for kdump - Add hardware breakpoints/watchpoints support - Add kprobes/kretprobes/kprobes_on_ftrace support - Add LoongArch support for some selftests. * tag 'loongarch-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (23 commits) selftests/ftrace: Add LoongArch kprobe args string tests support selftests/seccomp: Add LoongArch selftesting support tools: Add LoongArch build infrastructure samples/kprobes: Add LoongArch support LoongArch: Mark some assembler symbols as non-kprobe-able LoongArch: Add kprobes on ftrace support LoongArch: Add kretprobes support LoongArch: Add kprobes support LoongArch: Simulate branch and PC* instructions LoongArch: ptrace: Add hardware single step support LoongArch: ptrace: Add function argument access API LoongArch: ptrace: Expose hardware breakpoints to debuggers LoongArch: Add hardware breakpoints/watchpoints support LoongArch: kdump: Add crashkernel=YM handling LoongArch: kdump: Add single kernel image implementation LoongArch: Add support for kernel address space layout randomization (KASLR) LoongArch: Add support for kernel relocation LoongArch: Add la_abs macro implementation LoongArch: Add JUMP_VIRT_ADDR macro implementation to avoid using la.abs LoongArch: Use la.pcrel instead of la.abs when it's trivially possible ... |
||
Tiezhu Yang
|
fcf77d0162 |
LoongArch: Mark some assembler symbols as non-kprobe-able
Some assembler symbols are not kprobe safe, such as handle_syscall (used as syscall exception handler), *memset*/*memcpy*/*memmove* (may cause recursive exceptions), they can not be instrumented, just blacklist them for kprobing. Here is a related problem and discussion: Link: https://lore.kernel.org/lkml/20230114143859.7ccc45c1c5d9ce302113ab0a@kernel.org/ Tested-by: Jeff Xie <xiehuan09@gmail.com> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Tiezhu Yang
|
09e679c28a |
LoongArch: Add kprobes on ftrace support
Add kprobe_ftrace_handler() and arch_prepare_kprobe_ftrace() to support kprobes on ftrace, the code is similar with x86 and riscv. Here is a simple example: # echo 'p:myprobe kernel_clone' > /sys/kernel/debug/tracing/kprobe_events # echo 'r:myretprobe kernel_clone $retval' >> /sys/kernel/debug/tracing/kprobe_events # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable # echo 1 > /sys/kernel/debug/tracing/tracing_on # cat /sys/kernel/debug/tracing/trace # tracer: nop # # entries-in-buffer/entries-written: 2/2 #P:4 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | bash-488 [002] ..... 2041.190681: myprobe: (kernel_clone+0x0/0x40c) bash-488 [002] ..... 2041.190788: myretprobe: (__do_sys_clone+0x84/0xb8 <- kernel_clone) arg1=0x200 Tested-by: Jeff Xie <xiehuan09@gmail.com> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Tiezhu Yang
|
3f55368600 |
LoongArch: Add kretprobes support
Use the generic kretprobe trampoline handler to add kretprobes support for LoongArch. Tested-by: Jeff Xie <xiehuan09@gmail.com> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |