linux

iv/linux

Author	SHA1	Message	Date
Palmer Dabbelt	52b77c2806	Merge patch series "riscv: Reduce ARCH_KMALLOC_MINALIGN to 8" Jisheng Zhang <jszhang@kernel.org> says: Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E 64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus it brings some bad effects to coherent platforms: Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and kmalloc-8 slab caches don't exist any more, they are replaced with either kmalloc-128 or kmalloc-64. Secondly, larger than necessary kmalloc aligned allocations results in unnecessary cache/TLB pressure. This issue also exists on arm64 platforms. From last year, Catalin tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage in various drivers with ARCH_DMA_MINALIGN etc.[1] One fact we can make use of for riscv: if the CPU doesn't support ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on Catalin's work and above fact, we can easily solve the kmalloc align issue for riscv: we can override dma_get_cache_alignment(), then let it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know the underlying HW neither supports ZICBOM nor supports T-HEAD CMO. So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the kmalloc minimum alignment, nothing changed in this case. This case can be improved in the future once we see such platforms in mainline. After this patch, a simple test of booting to a small buildroot rootfs on qemu shows: kmalloc-96 5041 5041 96 ... kmalloc-64 9606 9606 64 ... kmalloc-32 5128 5128 32 ... kmalloc-16 7682 7682 16 ... kmalloc-8 10246 10246 8 ... So we save about 1268KB memory. The saving will be much larger in normal OS env on real HW platforms. patch1 allows kmalloc() caches aligned to the smallest value. patch2 enables DMA_BOUNCE_UNALIGNED_KMALLOC. After this series: As for coherent platforms, kmalloc-{8,16,32,96} caches come back on coherent both RV32 and RV64 platforms, I.E !ZICBOM and !THEAD_CMO. As for noncoherent RV32 platforms, nothing changed. As for noncoherent RV64 platforms, I.E either ZICBOM or THEAD_CMO, the above kmalloc caches also come back if > 4GB memory or users pass "swiotlb=mmnn,force" to force swiotlb creation if <= 4GB memory. How much mmnn should be depends on the specific platform, it needs to be tried and tested all possible usage case on the specific hardware. For example, I can use the minimal I/O TLB slabs on Sipeed M1S Dock. * b4-shazam-merge: riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent riscv: allow kmalloc() caches aligned to the smallest value Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1] Link: https://lore.kernel.org/r/20230718152214.2907-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-31 00:18:35 -07:00
Palmer Dabbelt	7f7d3ea6eb	Merge patch series "riscv: KCFI support" Sami Tolvanen <samitolvanen@google.com> says: This series adds KCFI support for RISC-V. KCFI is a fine-grained forward-edge control-flow integrity scheme supported in Clang >=16, which ensures indirect calls in instrumented code can only branch to functions whose type matches the function pointer type, thus making code reuse attacks more difficult. Patch 1 implements a pt_regs based syscall wrapper to address function pointer type mismatches in syscall handling. Patches 2 and 3 annotate indirectly called assembly functions with CFI types. Patch 4 implements error handling for indirect call checks. Patch 5 disables CFI for arch/riscv/purgatory. Patch 6 finally allows CONFIG_CFI_CLANG to be enabled for RISC-V. Note that Clang 16 has a generic architecture-agnostic KCFI implementation, which does work with the kernel, but doesn't produce a stable code sequence for indirect call checks, which means potential failures just trap and won't result in informative error messages. Clang 17 includes a RISC-V specific back-end implementation for KCFI, which emits a predictable code sequence for the checks and a .kcfi_traps section with locations of the traps, which patch 5 uses to produce more useful errors. The type mismatch fixes and annotations in the first three patches also become necessary in future if the kernel decides to support fine-grained CFI implemented using the hardware landing pad feature proposed in the in-progress Zicfisslp extension. Once the specification is ratified and hardware support emerges, implementing runtime patching support that replaces KCFI instrumentation with Zicfisslp landing pads might also be feasible (similarly to KCFI to FineIBT patching on x86_64), allowing distributions to ship a unified kernel binary for all devices. * b4-shazam-merge: riscv: Allow CONFIG_CFI_CLANG to be selected riscv/purgatory: Disable CFI riscv: Add CFI error handling riscv: Add ftrace_stub_graph riscv: Add types to indirectly called assembly functions riscv: Implement syscall wrappers Link: https://lore.kernel.org/r/20230710183544.999540-8-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-31 00:18:32 -07:00
Palmer Dabbelt	9389e6715f	Merge patch series "support allocating crashkernel above 4G explicitly on riscv" Chen Jiahao <chenjiahao16@huawei.com> says: On riscv, the current crash kernel allocation logic is trying to allocate within 32bit addressible memory region by default, if failed, try to allocate without 4G restriction. In need of saving DMA zone memory while allocating a relatively large crash kernel region, allocating the reserved memory top down in high memory, without overlapping the DMA zone, is a mature solution. Hence this patchset introduces the parameter option crashkernel=X,[high,low]. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Besides, there are few rules need to take notice: 1. "crashkernel=X,[high,low]" will be ignored if "crashkernel=size" is specified. 2. "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed and there is enough memory to be allocated under 4G. 3. When allocating crashkernel above 4G and no "crashkernel=X,low" is specified, a 128M low memory will be allocated automatically for swiotlb bounce buffer. See Documentation/admin-guide/kernel-parameters.txt for more information. To verify loading the crashkernel, adapted kexec-tools is attached below: https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2 Following test cases have been performed as expected: 1) crashkernel=256M //low=256M 2) crashkernel=1G //low=1G 3) crashkernel=4G //high=4G, low=128M(default) 4) crashkernel=4G crashkernel=256M,high //high=4G, low=128M(default), high is ignored 5) crashkernel=4G crashkernel=256M,low //high=4G, low=128M(default), low is ignored 6) crashkernel=4G,high //high=4G, low=128M(default) 7) crashkernel=256M,low //low=0M, invalid 8) crashkernel=4G,high crashkernel=256M,low //high=4G, low=256M 9) crashkernel=4G,high crashkernel=4G,low //high=0M, low=0M, invalid 10) crashkernel=512M@0xd0000000 //low=512M 11) crashkernel=1G,high crashkernel=0M,low //high=1G, low=0M * b4-shazam-merge: docs: kdump: Update the crashkernel description for riscv riscv: kdump: Implement crashkernel=X,[high,low] Link: https://lore.kernel.org/r/20230726175000.2536220-1-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-31 00:18:28 -07:00
Palmer Dabbelt	82dfb5fde6	Merge patch series "riscv: kprobes: simulate some instructions" Nam Cao <namcaov@gmail.com> says: Simulate some currently rejected instructions. Still to be simulated are: - c.jal - c.ebreak * b4-shazam-merge: riscv: kprobes: simulate c.beqz and c.bnez riscv: kprobes: simulate c.jr and c.jalr instructions riscv: kprobes: simulate c.j instruction Link: https://lore.kernel.org/r/cover.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-31 00:18:27 -07:00
Nam Cao	6b289a3ffa	riscv: remove redundant mv instructions Some mv instructions were useful when first introduced to preserve a0 and a1 before function calls. However the code has changed and they are now redundant. Remove them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230725053835.138910-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-31 00:18:25 -07:00
Jisheng Zhang	2926715163	riscv: allow kmalloc() caches aligned to the smallest value Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E 64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus it brings some bad effects to coherent platforms: Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and kmalloc-8 slab caches don't exist any more, they are replaced with either kmalloc-128 or kmalloc-64. Secondly, larger than necessary kmalloc aligned allocations results in unnecessary cache/TLB pressure. This issue also exists on arm64 platforms. From last year, Catalin tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage in various drivers with ARCH_DMA_MINALIGN etc.[1] One fact we can make use of for riscv: if the CPU doesn't support ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on Catalin's work and above fact, we can easily solve the kmalloc align issue for riscv: we can override dma_get_cache_alignment(), then let it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know the underlying HW neither supports ZICBOM nor supports T-HEAD CMO. So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the kmalloc minimum alignment, nothing changed in this case. This case can be improved in the future. After this patch, a simple test of booting to a small buildroot rootfs on qemu shows: kmalloc-96 5041 5041 96 ... kmalloc-64 9606 9606 64 ... kmalloc-32 5128 5128 32 ... kmalloc-16 7682 7682 16 ... kmalloc-8 10246 10246 8 ... So we save about 1268KB memory. The saving will be much larger in normal OS env on real HW platforms. Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1] Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230718152214.2907-2-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-23 14:22:00 -07:00
Sami Tolvanen	af0ead42f6	riscv: Add CFI error handling With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately before each function and a check to validate the target function type before indirect calls: ; type preamble .word <id> function: ... ; indirect call check lw t1, -4(a0) lui t2, <hi20> addiw t2, t2, <lo12> beq t1, t2, .Ltmp0 ebreak .Ltmp0: jarl a0 Implement error handling code for the ebreak traps emitted for the checks. This produces the following oops on a CFI failure (generated using lkdtm): [ 21.177245] CFI failure at lkdtm_indirect_call+0x22/0x32 [lkdtm] (target: lkdtm_increment_int+0x0/0x18 [lkdtm]; expected type: 0x3ad55aca) [ 21.178483] Kernel BUG [#1] [ 21.178671] Modules linked in: lkdtm [ 21.179037] CPU: 1 PID: 104 Comm: sh Not tainted 6.3.0-rc6-00037-g37d5ec6297ab #1 [ 21.179511] Hardware name: riscv-virtio,qemu (DT) [ 21.179818] epc : lkdtm_indirect_call+0x22/0x32 [lkdtm] [ 21.180106] ra : lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm] [ 21.180426] epc : ffffffff01387092 ra : ffffffff01386f14 sp : ff20000000453cf0 [ 21.180792] gp : ffffffff81308c38 tp : ff6000000243f080 t0 : ff20000000453b78 [ 21.181157] t1 : 000000003ad55aca t2 : 000000007e0c52a5 s0 : ff20000000453d00 [ 21.181506] s1 : 0000000000000001 a0 : ffffffff0138d170 a1 : ffffffff013870bc [ 21.181819] a2 : b5fea48dd89aa700 a3 : 0000000000000001 a4 : 0000000000000fff [ 21.182169] a5 : 0000000000000004 a6 : 00000000000000b7 a7 : 0000000000000000 [ 21.182591] s2 : ff20000000453e78 s3 : ffffffffffffffea s4 : 0000000000000012 [ 21.183001] s5 : ff600000023c7000 s6 : 0000000000000006 s7 : ffffffff013882a0 [ 21.183653] s8 : 0000000000000008 s9 : 0000000000000002 s10: ffffffff0138d878 [ 21.184245] s11: ffffffff0138d878 t3 : 0000000000000003 t4 : 0000000000000000 [ 21.184591] t5 : ffffffff8133df08 t6 : ffffffff8133df07 [ 21.184858] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003 [ 21.185415] [<ffffffff01387092>] lkdtm_indirect_call+0x22/0x32 [lkdtm] [ 21.185772] [<ffffffff01386f14>] lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm] [ 21.186093] [<ffffffff01383552>] lkdtm_do_action+0x22/0x34 [lkdtm] [ 21.186445] [<ffffffff0138350c>] direct_entry+0x128/0x13a [lkdtm] [ 21.186817] [<ffffffff8033ed8c>] full_proxy_write+0x58/0xb2 [ 21.187352] [<ffffffff801d4fe8>] vfs_write+0x14c/0x33a [ 21.187644] [<ffffffff801d5328>] ksys_write+0x64/0xd4 [ 21.187832] [<ffffffff801d53a6>] sys_write+0xe/0x1a [ 21.188171] [<ffffffff80003996>] ret_from_syscall+0x0/0x2 [ 21.188595] Code: 0513 0f65 a303 ffc5 53b7 7e0c 839b 2a53 0363 0073 (9002) 9582 [ 21.189178] ---[ end trace 0000000000000000 ]--- [ 21.189590] Kernel panic - not syncing: Fatal exception Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> # ISA bits Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-12-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-23 14:16:39 -07:00
Sami Tolvanen	f3a0c23f25	riscv: Add ftrace_stub_graph Commit 883bbbffa5a4 ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()") added a separate ftrace_stub_graph function for CFI_CLANG. Add the stub to fix FUNCTION_GRAPH_TRACER compatibility with CFI. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-11-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-23 14:16:38 -07:00
Sami Tolvanen	5f59c6855b	riscv: Add types to indirectly called assembly functions With CONFIG_CFI_CLANG, assembly functions indirectly called from C code must be annotated with type identifiers to pass CFI checking. Use the SYM_TYPED_START macro to add types to the relevant functions. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-10-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-23 14:16:37 -07:00
Sami Tolvanen	08d0ce30e0	riscv: Implement syscall wrappers Commit f0bddf50586d ("riscv: entry: Convert to generic entry") moved syscall handling to C code, which exposed function pointer type mismatches that trip fine-grained forward-edge Control-Flow Integrity (CFI) checks as syscall handlers are all called through the same syscall_t pointer type. To fix the type mismatches, implement pt_regs based syscall wrappers similarly to x86 and arm64. This patch is based on arm64 syscall wrappers added in commit 4378a7d4be30 ("arm64: implement syscall wrappers"), where the main goal was to minimize the risk of userspace-controlled values being used under speculation. This may be a concern for riscv in future as well. Following other architectures, the syscall wrappers generate three functions for each syscall; __riscv_<compat_>sys_<name> takes a pt_regs pointer and extracts arguments from registers, __se_<compat_>sys_<name> is a sign-extension wrapper that casts the long arguments to the correct types for the real syscall implementation, which is named __do_<compat_>sys_<name>. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-9-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-23 14:16:36 -07:00
Chen Jiahao	5882e5acf1	riscv: kdump: Implement crashkernel=X,[high,low] On riscv, the current crash kernel allocation logic is trying to allocate within 32bit addressible memory region by default, if failed, try to allocate without 4G restriction. In need of saving DMA zone memory while allocating a relatively large crash kernel region, allocating the reserved memory top down in high memory, without overlapping the DMA zone, is a mature solution. Here introduce the parameter option crashkernel=X,[high,low]. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20230726175000.2536220-2-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-16 07:51:48 -07:00
Nam Cao	d943705fba	riscv: kprobes: simulate c.beqz and c.bnez kprobes currently rejects instruction c.beqz and c.bnez. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/1d879dba4e4ee9a82e27625d6483b5c9cfed684f.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-16 07:48:40 -07:00
Nam Cao	b18256d9b7	riscv: kprobes: simulate c.jr and c.jalr instructions kprobes currently rejects c.jr and c.jalr instructions. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/db8b7787e9208654cca50484f68334f412be2ea9.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-16 07:48:39 -07:00
Nam Cao	a93892974f	riscv: kprobes: simulate c.j instruction kprobes currently rejects c.j instruction. Implement it. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/6ef76cd9984b8015826649d13f870f8ac45a2d0d.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-16 07:48:38 -07:00
Justin Stitt	12d61a1bc2	RISC-V: cpu: refactor deprecated strncpy `strncpy` is deprecated for use on NUL-terminated destination strings [1]. Favor not copying strings onto stack and instead use strings directly. This avoids hard-coding sizes and buffer lengths all together. Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230802-arch-riscv-kernel-v2-1-24266e85bc96@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-08-02 13:45:28 -07:00
Conor Dooley	496ea826d1	RISC-V: provide Kconfig & commandline options to control parsing "riscv,isa" As it says on the tin, provide Kconfig option to control parsing the "riscv,isa" devicetree property. If either option is used, the kernel will fall back to parsing "riscv,isa", where "riscv,isa-base" and "riscv,isa-extensions" are not present. The Kconfig options are set up so that the default kernel configuration will enable the fallback path, without needing the commandline option. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-aviator-plausibly-a35662485c2c@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:25 -07:00
Conor Dooley	c98f136aed	RISC-V: try new extension properties in of_early_processor_hartid() To fully deprecate the kernel's use of "riscv,isa", of_early_processor_hartid() needs to first try using the new properties, before falling back to "riscv,isa". Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-tablet-jimmy-987fea0eb2e1@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:24 -07:00
Conor Dooley	90700a4fbf	RISC-V: enable extension detection from dedicated properties Add support for parsing the new riscv,isa-extensions property in riscv_fill_hwcap(), by means of a new "property" member of the riscv_isa_ext_data struct. For now, this shadows the name of the extension for all users, however this may not be the case for all extensions, based on how the dt-binding is written. For the sake of backwards compatibility, fall back to the old scheme if the new properties are not detected. For now, just inform, rather than warn, when that happens. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-vocation-profane-39a74b3c2649@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:23 -07:00
Conor Dooley	4265b0ec5e	RISC-V: split riscv_fill_hwcap() in 3 Before adding more complexity to it, split riscv_fill_hwcap() into 3 distinct sections: - riscv_fill_hwcap() still is the top level function, into which the additional complexity will be added. - riscv_fill_hwcap_from_isa_string() handles getting the information from the riscv,isa/ACPI equivalent across harts & the various quirks there - riscv_parse_isa_string() does what it says on the tin. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-daylight-puritan-37aeb41a4d9b@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:22 -07:00
Conor Dooley	effc122ad1	RISC-V: add single letter extensions to riscv_isa_ext So that riscv_fill_hwcap() can use riscv_isa_ext to probe for single letter extensions, add them to it. As a result, what gets spat out in /proc/cpuinfo will become borked, as single letter extensions will be printed as part of the base extensions and while printing from riscv_isa_arr. Take the opportunity to unify the printing of the isa string, using the new member of riscv_isa_ext_data in the process. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-despite-bright-de00ac888cc7@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:21 -07:00
Conor Dooley	37f988dcec	RISC-V: repurpose riscv_isa_ext array in riscv_fill_hwcap() In riscv_fill_hwcap() riscv_isa_ext array can be looped over, rather than duplicating the list of extensions with individual SET_ISA_EXT_MAP() usage. While at it, drop the statement-of-the-obvious comments from the struct, rename uprop to something more suitable for its new use & constify the members. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-dastardly-affiliate-4cf819dccde2@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:19 -07:00
Conor Dooley	8135ade32c	RISC-V: shunt isa_ext_arr to cpufeature.c To facilitate using one struct to define extensions, rather than having several, shunt isa_ext_arr to cpufeature.c, where it will be used for probing extension presence also. As that scope of the array as widened, prefix it with riscv & drop the type from the variable name. Since the new array is const, print_isa() needs a wee bit of cleanup to avoid complaints about losing the const qualifier. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Evan Green <evan@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-spirits-upside-a2c61c65fd5a@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:18 -07:00
Conor Dooley	131033689d	RISC-V: drop a needless check in print_isa_ext() isa_ext_arr cannot be empty, as some of the extensions within it are always built into the kernel. When this code was first added, back in commit a9b202606c69 ("RISC-V: Improve /proc/cpuinfo output for ISA extensions"), the array was empty and needed a dummy item & thus there could be no extensions present. When the first multi-letter ones did get added, it was Sscofpmf - which didn't have a Kconfig symbol to disable it. Remove this check, as it has been redundant since Sscofpmf was added. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-veggie-mug-3d3bf6787ae2@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:17 -07:00
Heiko Stuebner	67270fb388	RISC-V: don't parse dt/acpi isa string to get rv32/rv64 When filling hwcap the kernel already expects the isa string to start with rv32 if CONFIG_32BIT and rv64 if CONFIG_64BIT. So when recreating the runtime isa-string we can also just go the other way to get the correct starting point for it. Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Evan Green <evan@rivosinc.com> Co-developed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-masculine-saddlebag-67a94966b091@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:16 -07:00
Palmer Dabbelt	2305989396	RISC-V: Provide a more helpful error message on invalid ISA strings Right now we provide a somewhat unhelpful error message on systems with invalid error messages, something along the lines of CPU with hartid=0 is not available ------------[ cut here ]------------ kernel BUG at arch/riscv/kernel/smpboot.c:174! Kernel BUG [#1] Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 6.4.0-rc1-00096-ge0097d2c62d5-dirty #1 Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) epc : of_parse_and_init_cpus+0x16c/0x16e ra : of_parse_and_init_cpus+0x9a/0x16e epc : ffffffff80c04e0a ra : ffffffff80c04d38 sp : ffffffff81603e20 gp : ffffffff8182d658 tp : ffffffff81613f80 t0 : 000000000000006e t1 : 0000000000000064 t2 : 0000000000000000 s0 : ffffffff81603e80 s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000000 a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000001fff a6 : 0000000000001fff a7 : ffffffff816148b0 s2 : 0000000000000001 s3 : ffffffff81492a4c s4 : ffffffff81a4b090 s5 : ffffffff81506030 s6 : 0000000000000040 s7 : 0000000000000000 s8 : 00000000bfb6f046 s9 : 0000000000000001 s10: 0000000000000000 s11: 00000000bf389700 t3 : 0000000000000000 t4 : 0000000000000000 t5 : ffffffff824dd188 t6 : ffffffff824dd187 status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [<ffffffff80c04e0a>] of_parse_and_init_cpus+0x16c/0x16e [<ffffffff80c04c96>] setup_smp+0x1e/0x26 [<ffffffff80c03ffe>] setup_arch+0x6e/0xb2 [<ffffffff80c00384>] start_kernel+0x72/0x400 Code: 80e7 4a00 a603 0009 b795 1097 ffe5 80e7 92c0 9002 (9002) 715d ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception in interrupt Add a warning for the cases where the ISA string isn't valid. It's still above the BUG_ON cut, but hopefully it's at least a bit easier for users. Reviewed-by: Evan Green <evan@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-endless-spearhead-62a5a4b149bd@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-25 16:26:15 -07:00
Linus Torvalds	4f6b6c2b2f	RISC-V Patches for the 6.5 Merge Window, Part 2 * A bunch of fixes/cleanups from the first part of the merge window, mostly related to ACPI and vector as those were large. * Some documentation improvements, mostly related to the new code. * The "riscv,isa" DT key is deprecated. * Support for link-time dead code elimination. * Support for minor fault registration in userfaultd. * A handful of cleanups around CMO alternatives. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmSoLx8THHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiSlbD/9SVAxWKL/9oGh/qDtf7As24ngAKmsy YfC1LgDwvFOjVz8+YUD7HgUG1Sath2D5e5h2QpVBa16WezIzJUbDvvnYElB28i0J cZ1sCuI/S62kQbqrP3ITqSt0yj3A1OFVyuF3x+5m6pNqjjhkx5HxYs+omFGJYf4e K9JE1Rzi1QXNf+uZeuHhK6FqQYdNIsCXmMRnjZTF5FwwzYk1zVkUR4jntZMJV0sf aP1DfXXgPUEG0LzqTdMLSyT2qnQ2hux5/9ayknt45G0Bm4IYZfGd4Twtab8LOPY9 6nJq9UHFne8xFAeUp+GGY3vQLR7Y892vXHDprblhiAP2FzH3E1HOC1g24xd1lID5 80rgTB8ttY8LgOamr2HxeRKLQkWxDeng9IcAwSwe4T0QVIvqA1hjFTezXYWrD30e GA0gqvz11ERb7KKS4aJhEljS+ux81PXKPdKIeqp6KnM2N3Ch+LBRIY2v7JZQ0rcT eAb7uU2MRLwNDevoWkB7iFTkfd+frJGotRDFQZE9atXrx3j3UUNlnFGz8aKtSLX7 b0PFP2iqxYgVPVejqxw03VlEzgV19kJrT/o8Hh7mCGjFQPSbZKIBQb7yHYXKlWWT eTM8d+ETOlV+yRpWnJSnOX18scsriUmfQj9GhcImwCFsfh9XPLw8CHj82xZiUxFf 645zqiuRJi6yJw== =jBYf -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.5-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - A bunch of fixes/cleanups from the first part of the merge window, mostly related to ACPI and vector as those were large - Some documentation improvements, mostly related to the new code - The "riscv,isa" DT key is deprecated - Support for link-time dead code elimination - Support for minor fault registration in userfaultd - A handful of cleanups around CMO alternatives * tag 'riscv-for-linus-6.5-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (23 commits) riscv: mm: mark noncoherent_supported as __ro_after_init riscv: mm: mark CBO relate initialization funcs as __init riscv: errata: thead: only set cbom size & noncoherent during boot riscv: Select HAVE_ARCH_USERFAULTFD_MINOR RISC-V: Document the ISA string parsing rules for ACPI risc-v: Fix order of IPI enablement vs RCU startup mm: riscv: fix an unsafe pte read in huge_pte_alloc() dt-bindings: riscv: deprecate riscv,isa RISC-V: drop error print from riscv_hartid_to_cpuid() riscv: Discard vector state on syscalls riscv: move memblock_allow_resize() after linear mapping is ready riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle riscv: vdso: include vdso/vsyscall.h for vdso_data selftests: Test RISC-V Vector's first-use handler riscv: vector: clear V-reg in the first-use trap riscv: vector: only enable interrupts in the first-use trap RISC-V: Fix up some vector state related build failures RISC-V: Document that V registers are clobbered on syscalls riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION ...	2023-07-07 10:07:19 -07:00
Marc Zyngier	6259f3443c	risc-v: Fix order of IPI enablement vs RCU startup Conor reports that risc-v tries to enable IPIs before telling the core code to enable RCU. With the introduction of the mapple tree as a backing store for the irq descriptors, this results in a very shouty boot sequence, as RCU is legitimately upset. Restore some sanity by moving the risc_ipi_enable() call after notify_cpu_starting(), which explicitly enables RCU on the calling CPU. Fixes: 832f15f42646 ("RISC-V: Treat IPIs as normal Linux IRQs") Reported-by: Conor Dooley <conor@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230703-dupe-frying-79ae2ccf94eb@spud Cc: Anup Patel <apatel@ventanamicro.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230703183126.1567625-1-maz@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-05 07:24:38 -07:00
Conor Dooley	52909f1768	RISC-V: drop error print from riscv_hartid_to_cpuid() As of commit 2ac874343749 ("RISC-V: split early & late of_node to hartid mapping") my CI complains about newly added pr_err() messages during boot, for example: [ 0.000000] Couldn't find cpu id for hartid [0] [ 0.000000] riscv-intc: unable to find hart id for /cpus/cpu@0/interrupt-controller Before the split, riscv_of_processor_hartid() contained a check for whether the cpu was "available", before calling riscv_hartid_to_cpuid(), but after the split riscv_of_processor_hartid() can be called for cpus that are disabled. Most callers of riscv_hartid_to_cpuid() already report custom errors where it falls, making this print superfluous in those case. In other places, the print adds nothing - see riscv_intc_init() for example. Fixes: 2ac874343749 ("RISC-V: split early & late of_node to hartid mapping") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230629-paternity-grafted-b901b76d04a0@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-04 09:04:12 -07:00
Björn Töpel	9657e9b7d2	riscv: Discard vector state on syscalls The RISC-V vector specification states: Executing a system call causes all caller-saved vector registers (v0-v31, vl, vtype) and vstart to become unspecified. The vector registers are set to all 1s, vill is set (invalid), and the vector status is set to Dirty. That way we can prevent userspace from accidentally relying on the stated save. Rémi pointed out [1] that writing to the registers might be superfluous, and setting vill is sufficient. Link: https://lore.kernel.org/linux-riscv/12784326.9UPPK3MAeB@basile.remlab.net/ # [1] Suggested-by: Darius Rad <darius@bluespec.com> Suggested-by: Palmer Dabbelt <palmer@rivosinc.com> Suggested-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230629142228.1125715-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-04 08:59:24 -07:00
Ben Dooks	54cdede08f	riscv: vdso: include vdso/vsyscall.h for vdso_data Add include of <vdso/vsyscall.h> to pull in the defition of vdso_data to remove the following sparse warning: arch/riscv/kernel/vdso.c:39:18: warning: symbol 'vdso_data' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://lore.kernel.org/r/20230616114357.159601-1-ben.dooks@codethink.co.uk Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-04 07:54:41 -07:00
Andy Chiu	75b59f2a90	riscv: vector: clear V-reg in the first-use trap If there is no context switch happens after we enable V for a process, then we return to user space with whatever left on the CPU's V registers accessible to the process. The leaked data could belong to another process's V-context saved from last context switch, impacting process's confidentiality on the system. To prevent this from happening, we clear V registers by restoring zero'd V context after turining on V. Fixes: cd054837243b ("riscv: Allocate user's vector context in the first-use trap") Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230627015556.12329-2-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-01 07:38:21 -07:00
Andy Chiu	26c38cd802	riscv: vector: only enable interrupts in the first-use trap The function irqentry_exit_to_user_mode() must be called with interrupt disabled. The caller of do_trap_insn_illegal() also assumes running without interrupts. So, we should turn off interrupts after riscv_v_first_use_handler() returns. Fixes: cd054837243b ("riscv: Allocate user's vector context in the first-use trap") Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230625155416.18629-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-01 07:38:20 -07:00
Palmer Dabbelt	782aefb177	Merge patch series "riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION" Jisheng Zhang <jszhang@kernel.org> says: When trying to run linux with various opensource riscv core on resource limited FPGA platforms, for example, those FPGAs with less than 16MB SDRAM, I want to save mem as much as possible. One of the major technologies is kernel size optimizations, I found that riscv does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which passes -fdata-sections, -ffunction-sections to CFLAGS and passes the --gc-sections flag to the linker. This not only benefits my case on FPGA but also benefits defconfigs. Here are some notable improvements from enabling this with defconfigs: nommu_k210_defconfig: text data bss dec hex 1112009 410288 59837 1582134 182436 before 962838 376656 51285 1390779 1538bb after rv32_defconfig: text data bss dec hex 8804455 2816544 290577 11911576 b5c198 before 8692295 2779872 288977 11761144 b375f8 after defconfig: text data bss dec hex 9438267 3391332 485333 13314932 cb2b74 before 9285914 3350052 483349 13119315 c82f53 after patch1 and patch2 are clean ups. patch3 fixes a typo. patch4 finally enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for riscv. * b4-shazam-merge: riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION vmlinux.lds.h: use correct .init.data.* section name riscv: vmlinux-xip.lds.S: remove .alternative section riscv: move options to keep entries sorted riscv: Fix orphan section warnings caused by kernel/pi Link: https://lore.kernel.org/r/20230523165502.2592-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-07-01 07:38:19 -07:00
Linus Torvalds	cccf0c2ee5	Tracing updates for 6.5: - Add new feature to have function graph tracer record the return value. Adds a new option: funcgraph-retval ; when set, will show the return value of a function in the function graph tracer. - Also add the option: funcgraph-retval-hex where if it is not set, and the return value is an error code, then it will return the decimal of the error code, otherwise it still reports the hex value. - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd That when a application opens it, it becomes the task that the timer lat tracer traces. The application can also read this file to find out how it's being interrupted. - Add the file /sys/kernel/tracing/available_filter_functions_addrs that works just the same as available_filter_functions but also shows the addresses of the functions like kallsyms, except that it gives the address of where the fentry/mcount jump/nop is. This is used by BPF to make it easier to attach BPF programs to ftrace hooks. - Replace strlcpy with strscpy in the tracing boot code. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZJy6ixQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qnzRAPsEI2YgjaJSHnuPoGRHbrNil6pq66wY LYaLizGI4Jv9BwEAqdSdcYcMiWo1SFBAO8QxEDM++BX3zrRyVgW8ahaTNgs= =TF0C -----END PGP SIGNATURE----- Merge tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Add new feature to have function graph tracer record the return value. Adds a new option: funcgraph-retval ; when set, will show the return value of a function in the function graph tracer. - Also add the option: funcgraph-retval-hex where if it is not set, and the return value is an error code, then it will return the decimal of the error code, otherwise it still reports the hex value. - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd That when a application opens it, it becomes the task that the timer lat tracer traces. The application can also read this file to find out how it's being interrupted. - Add the file /sys/kernel/tracing/available_filter_functions_addrs that works just the same as available_filter_functions but also shows the addresses of the functions like kallsyms, except that it gives the address of where the fentry/mcount jump/nop is. This is used by BPF to make it easier to attach BPF programs to ftrace hooks. - Replace strlcpy with strscpy in the tracing boot code. * tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix warnings when building htmldocs for function graph retval riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL tracing/boot: Replace strlcpy with strscpy tracing/timerlat: Add user-space interface tracing/osnoise: Skip running osnoise if all instances are off tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable ftrace: Show all functions with addresses in available_filter_functions_addrs selftests/ftrace: Add funcgraph-retval test case LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex function_graph: Support recording and printing the return value of function fgraph: Add declaration of "struct fgraph_ret_regs"	2023-06-30 10:33:17 -07:00
Linus Torvalds	533925cb76	RISC-V Patches for the 6.5 Merge Window, Part 1 * Support for ACPI. * Various cleanups to the ISA string parsing, including making them case-insensitive * Support for the vector extension. * Support for independent irq/softirq stacks. * Our CPU DT binding now has "unevaluatedProperties: false" -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmSe70ATHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiWNPD/0ZfSdQ0A/gMVOzAD4zFKPEqQ6ffW2V Zy6Jo7UDNqKsiai7QA4XB1uyYIv/y1yUKJ0oeBVcA9Nzyq+TW9QDcApDBTabxAUI agY19YKw6VVZ+p7I9sMsf6EbdJdkNfSAzcQACPxb4ScEoaf9X+oAK5qgXuRuWluh qQuVkkJlgWc/t1cuUkrRdJmHQYvjP3zL7z4o344q2IVpXJkNNu0GeP+HbF8BYKcA +I/TTA5JY3kCIaxkpF2rU6pE6T5T9xrPmRYZ7bZoPUPnbL+M8As/jx3ym52Y4WGp kf8pgkxixOjU64kVJOH66CA8GaOiaAH/ptjQb0ZmCaGrHhr7aOT9HrkX4rU1lS8T stPphfM4gGPcCoPgRqSl+mEhBzjII8maOBLtbricAoQi6efRq8fzoOGaif/QpCbc 6n0LGS4nQPGVyD3rAPfHxxfrlGJR+SsgyDvjZoDhqauFglims14GnK+eBeO8zrui Aj/uuAS63VIYprJWC1NOBJlU2WKZiOGhCANpZ6W6SH21PYn2WjsVILqaGh+WN8ZO KOHxZNaN8fQag0Yg7oNAUb7l6S0DHYtJIksFnFW2Rf2+VT58RAMYRQbpbhr7Tqr+ jLgIR8PkFrBERHE49IqLGhAxGDnNzAUysMRw9pIk7WIre2Jt4wPqUdl+ee+5ErIX jiYfSFZw9q28UA== =Fpq8 -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for ACPI - Various cleanups to the ISA string parsing, including making them case-insensitive - Support for the vector extension - Support for independent irq/softirq stacks - Our CPU DT binding now has "unevaluatedProperties: false" * tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits) riscv: hibernate: remove WARN_ON in save_processor_state dt-bindings: riscv: cpus: switch to unevaluatedProperties: false dt-bindings: riscv: cpus: add a ref the common cpu schema riscv: stack: Add config of thread stack size riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK RISC-V: always report presence of extensions formerly part of the base ISA dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support RISC-V: remove decrement/increment dance in ISA string parser RISC-V: rework comments in ISA string parser RISC-V: validate riscv,isa at boot, not during ISA string parsing RISC-V: split early & late of_node to hartid mapping RISC-V: simplify register width check in ISA string parsing perf: RISC-V: Limit the number of counters returned from SBI riscv: replace deprecated scall with ecall riscv: uprobes: Restore thread.bad_cause riscv: mm: try VMA lock-based page fault handling first riscv: mm: Pre-allocate PGD entries for vmalloc/modules area RISC-V: hwprobe: Expose Zba, Zbb, and Zbs RISC-V: Track ISA extensions per hart ...	2023-06-30 09:37:26 -07:00
Linus Torvalds	9244724fbf	A large update for SMP management: - Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZb/YTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRoOD/9vAiGI3IhGyZcX/RjXxauSHf8Pmqll 05jUubFi5Vi3tKI1ubMOsnMmJTw2yy5xDyS/iGj7AcbRLq9uQd3iMtsXXHNBzo/X FNxnuWTXYUj0vcOYJ+j4puBumFzzpRCprqccMInH0kUnSWzbnaQCeelicZORAf+w zUYrswK4HpBXHDOnvPw6Z7MYQe+zyDQSwjSftstLyROzu+lCEw/9KUaysY2epShJ wHClxS2XqMnpY4rJ/CmJAlRhD0Plb89zXyo6k9YZYVDWoAcmBZy6vaTO4qoR171L 37ApqrgsksMkjFycCMnmrFIlkeb7bkrYDQ5y+xqC3JPTlYDKOYmITV5fZ83HD77o K7FAhl/CgkPq2Ec+d82GFLVBKR1rijbwHf7a0nhfUy0yMeaJCxGp4uQ45uQ09asi a/VG2T38EgxVdseC92HRhcdd3pipwCb5wqjCH/XdhdlQrk9NfeIeP+TxF4QhADhg dApp3ifhHSnuEul7+HNUkC6U+Zc8UeDPdu5lvxSTp2ooQ0JwaGgC5PJq3nI9RUi2 Vv826NHOknEjFInOQcwvp6SJPfcuSTF75Yx6xKz8EZ3HHxpvlolxZLq+3ohSfOKn 2efOuZO5bEu4S/G2tRDYcy+CBvNVSrtZmCVqSOS039c8quBWQV7cj0334cjzf+5T TRiSzvssbYYmaw== =Y8if -----END PGP SIGNATURE----- Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Thomas Gleixner: "A large update for SMP management: - Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely" * tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) trace,smp: Add tracepoints for scheduling remotelly called functions trace,smp: Add tracepoints around remotelly called functions MAINTAINERS: Add CPU HOTPLUG entry x86/smpboot: Fix the parallel bringup decision x86/realmode: Make stack lock work in trampoline_compat() x86/smp: Initialize cpu_primary_thread_mask late cpu/hotplug: Fix off by one in cpuhp_bringup_mask() x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it x86/smpboot: Support parallel startup of secondary CPUs x86/smpboot: Implement a bit spinlock to protect the realmode stack x86/apic: Save the APIC virtual base address cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE x86/apic: Provide cpu_primary_thread mask x86/smpboot: Enable split CPU startup cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism cpu/hotplug: Reset task stack state in _cpu_up() cpu/hotplug: Remove unused state functions riscv: Switch to hotplug core state synchronization parisc: Switch to hotplug core state synchronization ...	2023-06-26 13:59:56 -07:00
Zhangjin Wu	c828856b51	riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V, allowing the user to enable dead code elimination. In order for this to work, ensure that we keep the alternative table by annotating them with KEEP. This boots well on qemu with both rv32_defconfig & rv64 defconfig, but it only shrinks their builds by ~1%, a smaller config is thereforce customized to test this feature: \| rv32 \| rv64 --------\|------------------------\|--------------------- No DCE \| 4460684 \| 4893488 DCE \| 3986716 \| 4376400 Shrink \| 473968 (~10.6%) \| 517088 (~10.5%) The config used above only reserves necessary options to boot on qemu with serial console, more like the size-critical embedded scenes: - rv64 config: https://pastebin.com/crz82T0s - rv32 config: rv64 config + 32-bit.config Here is Jisheng's original commit-msg: When trying to run linux with various opensource riscv core on resource limited FPGA platforms, for example, those FPGAs with less than 16MB SDRAM, I want to save mem as much as possible. One of the major technologies is kernel size optimizations, I found that riscv does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which passes -fdata-sections, -ffunction-sections to CFLAGS and passes the --gc-sections flag to the linker. This not only benefits my case on FPGA but also benefits defconfigs. Here are some notable improvements from enabling this with defconfigs: nommu_k210_defconfig: text data bss dec hex 1112009 410288 59837 1582134 182436 before 962838 376656 51285 1390779 1538bb after rv32_defconfig: text data bss dec hex 8804455 2816544 290577 11911576 b5c198 before 8692295 2779872 288977 11761144 b375f8 after defconfig: text data bss dec hex 9438267 3391332 485333 13314932 cb2b74 before 9285914 3350052 483349 13119315 c82f53 after Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Co-developed-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Tested-by: Bin Meng <bmeng@tinylab.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build Link: https://lore.kernel.org/r/20230523165502.2592-5-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-25 16:24:05 -07:00
Jisheng Zhang	cead443a30	riscv: vmlinux-xip.lds.S: remove .alternative section ALTERNATIVE mechanism can't work on XIP, and this is also reflected by below Kconfig dependency: RISCV_ALTERNATIVE ... depends on !XIP_KERNEL ... So there's no .alternative section at all for XIP case, remove it. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build Link: https://lore.kernel.org/r/20230523165502.2592-3-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-25 16:24:03 -07:00
Song Shuai	91afbaafd6	riscv: hibernate: remove WARN_ON in save_processor_state During hibernation or restoration, freeze_secondary_cpus checks num_online_cpus via BUG_ON, and the subsequent save_processor_state also does the checking with WARN_ON. In the case of CONFIG_PM_SLEEP_SMP=n, freeze_secondary_cpus is not defined, but the sole possible condition to disable CONFIG_PM_SLEEP_SMP is !SMP where num_online_cpus is always 1. We also don't have to check it in save_processor_state. So remove the unnecessary checking in save_processor_state. Fixes: c0317210012e ("RISC-V: Add arch functions to support hibernation/suspend-to-disk") Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230609075049.2651723-4-songshuaishuai@tinylab.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-23 10:06:22 -07:00
Palmer Dabbelt	b5e13f3ace	Merge patch series "riscv: Add independent irq/softirq stacks support" guoren@kernel.org <guoren@kernel.org> says: From: Guo Ren <guoren@linux.alibaba.com> This patch series adds independent irq/softirq stacks to decrease the press of the thread stack. Also, add a thread STACK_SIZE config for users to adjust the proper size during compile time. * b4-shazam-merge: riscv: stack: Add config of thread stack size riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK Link: https://lore.kernel.org/r/20230614013018.2168426-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-23 10:06:21 -07:00
Palmer Dabbelt	42b89447b6	Merge patch series "ISA string parser cleanups" Conor Dooley <conor@kernel.org> says: From: Conor Dooley <conor.dooley@microchip.com> Here are some bits that were discussed with Drew on the "should we allow caps" threads that I have now created patches for: - splitting of riscv_of_processor_hartid() into two distinct functions, one for use purely during early boot, prior to the establishment of the possible-cpus mask & another to fit the other current use-cases - that then allows us to then completely skip some validation of the hartid in the parser - the biggest diff in the series is a rework of the comments in the parser, as I have mostly found the existing (sparse) ones to not be all that helpful whenever I have to go back and look at it - from writing the comments, I found a conditional doing a bit of a dance that I found counter-intuitive, so I've had a go at making that match what I would expect a little better - `i` implies 4 other extensions, so add them as extensions and set them for the craic. Sure why not like... * b4-shazam-merge: RISC-V: always report presence of extensions formerly part of the base ISA dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support RISC-V: remove decrement/increment dance in ISA string parser RISC-V: rework comments in ISA string parser RISC-V: validate riscv,isa at boot, not during ISA string parsing RISC-V: split early & late of_node to hartid mapping RISC-V: simplify register width check in ISA string parsing Link: https://lore.kernel.org/r/20230607-audacity-overhaul-82bb867a825f@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-23 10:06:20 -07:00
Guo Ren	dd69d07a5a	riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK Add the HAVE_SOFTIRQ_ON_OWN_STACK feature for the IRQ_STACKS config, and the irq and softirq use the same irq_stack of percpu. Tested-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230614013018.2168426-3-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-22 10:38:36 -07:00
Guo Ren	163e76cc6e	riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK Add independent irq stacks for percpu to prevent kernel stack overflows. It is also compatible with VMAP_STACK by arch_alloc_vmap_stack. Tested-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20230614013018.2168426-2-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-22 10:38:35 -07:00
Donglin Peng	b97aec082b	riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL The previous patch ("function_graph: Support recording and printing the return value of function") has laid the groundwork for the for the funcgraph-retval, and this modification makes it available on the RISC-V platform. We introduce a new structure called fgraph_ret_regs for the RISC-V platform to hold return registers and the frame pointer. We then fill its content in the return_to_handler and pass its address to the function ftrace_return_to_handler to record the return value. Link: https://lore.kernel.org/linux-trace-kernel/a8d71b12259f90e7e63d0ea654fcac95b0232bbc.1680954589.git.pengdonglin@sangfor.com.cn Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-06-22 10:39:56 -04:00
Conor Dooley	07edc32779	RISC-V: always report presence of extensions formerly part of the base ISA Of these four extensions, two were part of the base ISA when the port was written and are required by the kernel. The other two are implied when `i` is in riscv,isa on DT systems. There's not much that userspace can do with this extra information, but there is no harm in reporting an ISA string that closer resembles the current versions of the specifications either. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230607-nest-collision-5796b6be8be6@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-21 07:45:19 -07:00
Conor Dooley	7816ebc1dd	RISC-V: remove decrement/increment dance in ISA string parser While expanding on the comments in the ISA string parsing code, I noticed that the conditional decrement of `isa` at the end of the loop was a bit odd. The parsing code expects that at the start of the for loop, `isa` will point to the first character of the next unparsed extension. However, depending on what the next extension is, this may not be true. Unless the next extension is a multi-letter extension preceded by an underscore, `isa` will either point to the string's null-terminator or to the first character of the next extension, once the switch statement has been evaluated. Obviously incrementing `isa` at the end of the loop could cause it to increment past the null terminator or miss a single letter extension, so `isa` is conditionally decremented, just so that the loop can increment it again. It's easier to understand the code if, instead of this decrement + increment dance, we instead use a while loop & rely on the handling of individual extension types to leave `isa` pointing to the first character of the next extension. As already mentioned, this won't be the case where the following extension is multi-letter & preceded by an underscore. To handle that, invert the check and increment rather than decrement. Hopefully this eliminates a "huh?!?" moment the next time somebody tries to understand this code. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20230607-estate-left-f20faabefb89@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-21 07:45:17 -07:00
Conor Dooley	6b913e3da8	RISC-V: rework comments in ISA string parser I have found these comments to not be at all helpful whenever I look at the parser. Further, the comments in the default case (single letter parser) are not quite right either. Group the comments into a larger one at the start of each case, that attempts to explain things at a higher level. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230607-headpiece-tannery-83ed5cc4856a@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-21 07:45:16 -07:00
Conor Dooley	069b0d5170	RISC-V: validate riscv,isa at boot, not during ISA string parsing Since riscv_fill_hwcap() now only iterates over possible cpus, the basic validation of whether riscv,isa contains "rv<width>" can be moved to riscv_early_of_processor_hartid(). Further, "ima" support is required by the kernel, so reject any CPU not fitting the bill. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20230607-guts-blurry-67e711acf328@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-21 07:45:15 -07:00
Conor Dooley	2ac8743437	RISC-V: split early & late of_node to hartid mapping Some back and forth with Drew [1] about riscv_fill_hwcap() resulted in the realisation that it is not very useful to parse the DT & perform validation of riscv,isa every time we would like to get the id for a hart. Although it is no longer called in riscv_fill_hwcap(), riscv_of_processor_hartid() is called in several other places. Notably in setup_smp() it forms part of the logic for filling the mask of possible CPUs. Since a possible CPU must have passed this basic validation of riscv,isa, a repeat validation is not required. Rename riscv_of_processor_id() to riscv_early_of_processor_id(), which will be called from setup_smp() & introduce a new riscv_of_processor_id() which makes use of the pre-populated mask of possible cpus. Link: https://lore.kernel.org/linux-riscv/xvdswl3iyikwvamny7ikrxo2ncuixshtg3f6uucjahpe3xpc5c@ud4cz4fkg5dj/ [1] Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20230607-glade-pastel-d8cbd9d9f3c6@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-21 07:45:14 -07:00
Conor Dooley	fed14be476	RISC-V: simplify register width check in ISA string parsing Saving off the `isa` pointer to a temp variable, followed by checking if it has been incremented is a bit of an odd pattern. Perhaps it was done to avoid a funky looking if statement mixed with the ifdeffery. Now that we use IS_ENABLED() here just return from the parser as soon as we detect a mismatch between the string and the currently running kernel. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20230607-splatter-bacterium-a75bb9f0d0b7@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2023-06-21 07:45:13 -07:00

1 2 3 4 5 ...

996 Commits