linux

iv/linux

Author	SHA1	Message	Date
Nam Cao	ab4458bc32	riscv: fix overlap of allocated page and PTR_ERR [ Upstream commit 994af1825a2aa286f4903ff64a1c7378b52defe6 ] On riscv32, it is possible for the last page in virtual address space (0xfffff000) to be allocated. This page overlaps with PTR_ERR, so that shouldn't happen. There is already some code to ensure memblock won't allocate the last page. However, buddy allocator is left unchecked. Fix this by reserving physical memory that would be mapped at virtual addresses greater than 0xfffff000. Reported-by: Björn Töpel <bjorn@kernel.org> Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong.to.us Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Link: https://lore.kernel.org/r/20240425115201.3044202-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:38 +02:00
Jisheng Zhang	c1cb08c5a1	riscv: mm: init: try best to use IS_ENABLED(CONFIG_64BIT) instead of #ifdef [ Upstream commit 07aabe8fb6d1ac3163cc74c856521f2ee746270b ] Try our best to replace the conditional compilation using "#ifdef CONFIG_64BIT" by a check for "IS_ENABLED(CONFIG_64BIT)", to simplify the code and to increase compile coverage. Now we can also remove the __maybe_unused used in max_mapped_addr declaration. We also remove the BUG_ON check of mapping the last 4K bytes of the addressable memory since this is always true for every kernel actually. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Stable-dep-of: 994af1825a2a ("riscv: fix overlap of allocated page and PTR_ERR") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:38 +02:00
Matthew Bystrin	b8d78a7573	riscv: stacktrace: fixed walk_stackframe() [ Upstream commit a2a4d4a6a0bf5eba66f8b0b32502cc20d82715a0 ] If the load access fault occures in a leaf function (with CONFIG_FRAME_POINTER=y), when wrong stack trace will be displayed: [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c ---[ end trace 0000000000000000 ]--- Registers dump: ra 0xffffffff80485758 <regmap_mmio_read+36> sp 0xffffffc80200b9a0 fp 0xffffffc80200b9b0 pc 0xffffffff804853ba <regmap_mmio_read32le+6> Stack dump: 0xffffffc80200b9a0: 0xffffffc80200b9e0 0xffffffc80200b9e0 0xffffffc80200b9b0: 0xffffffff8116d7e8 0x0000000000000100 0xffffffc80200b9c0: 0xffffffd8055b9400 0xffffffd8055b9400 0xffffffc80200b9d0: 0xffffffc80200b9f0 0xffffffff8047c526 0xffffffc80200b9e0: 0xffffffc80200ba30 0xffffffff8047fe9a The assembler dump of the function preambula: add sp,sp,-16 sd s0,8(sp) add s0,sp,16 In the fist stack frame, where ra is not stored on the stack we can observe: 0(sp) 8(sp) .---------------------------------------------. sp->\| frame->fp \| frame->ra (saved fp) \| \|---------------------------------------------\| fp->\| .... \| .... \| \|---------------------------------------------\| \| \| \| and in the code check is performed: if (regs && (regs->epc == pc) && (frame->fp & 0x7)) I see no reason to check frame->fp value at all, because it is can be uninitialized value on the stack. A better way is to check frame->ra to be an address on the stack. After the stacktrace shows as expect: [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c [<ffffffff80485758>] regmap_mmio_read+0x24/0x52 [<ffffffff8047c526>] _regmap_bus_reg_read+0x1a/0x22 [<ffffffff8047fe9a>] _regmap_read+0x5c/0xea [<ffffffff80480376>] _regmap_update_bits+0x76/0xc0 ... ---[ end trace 0000000000000000 ]--- As pointed by Samuel Holland it is incorrect to remove check of the stackframe entirely. Changes since v2 [2]: - Add accidentally forgotten curly brace Changes since v1 [1]: - Instead of just dropping frame->fp check, replace it with validation of frame->ra, which should be a stack address. - Move frame pointer validation into the separate function. [1] https://lore.kernel.org/linux-riscv/20240426072701.6463-1-dev.mbstr@gmail.com/ [2] https://lore.kernel.org/linux-riscv/20240521131314.48895-1-dev.mbstr@gmail.com/ Fixes: f766f77a74f5 ("riscv/stacktrace: Fix stack output without ra on the stack top") Signed-off-by: Matthew Bystrin <dev.mbstr@gmail.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240521191727.62012-1-dev.mbstr@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:48 +02:00
Guo Ren	aae5f57c43	riscv: stacktrace: Make walk_stackframe cross pt_regs frame [ Upstream commit 7ecdadf7f8c659524f6b2aebf6be7bf619764d90 ] The current walk_stackframe with FRAME_POINTER would stop unwinding at ret_from_exception: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192 Call Trace: [<ffffffe0002038c8>] walk_stackframe+0x0/0xee [<ffffffe000aecf48>] show_stack+0x32/0x4a [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e [<ffffffe000af1648>] dump_stack+0x14/0x1c [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138 [<ffffffe000239aec>] __might_sleep+0x10/0x18 [<ffffffe000afe3fe>] down_read+0x22/0xa4 [<ffffffe000207588>] do_page_fault+0xb0/0x2fe [<ffffffe000201b80>] ret_from_exception+0x0/0xc The optimization would help walk_stackframe cross the pt_regs frame and get more backtrace of debug info: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192 Call Trace: [<ffffffe0002038c8>] walk_stackframe+0x0/0xee [<ffffffe000aecf48>] show_stack+0x32/0x4a [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e [<ffffffe000af1648>] dump_stack+0x14/0x1c [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138 [<ffffffe000239aec>] __might_sleep+0x10/0x18 [<ffffffe000afe3fe>] down_read+0x22/0xa4 [<ffffffe000207588>] do_page_fault+0xb0/0x2fe [<ffffffe000201b80>] ret_from_exception+0x0/0xc [<ffffffe000613c06>] riscv_intc_irq+0x1a/0x72 [<ffffffe000201b80>] ret_from_exception+0x0/0xc [<ffffffe00033f44a>] vma_link+0x54/0x160 [<ffffffe000341d7a>] mmap_region+0x2cc/0x4d0 [<ffffffe000342256>] do_mmap+0x2d8/0x3ac [<ffffffe000326318>] vm_mmap_pgoff+0x70/0xb8 [<ffffffe00032638a>] vm_mmap+0x2a/0x36 [<ffffffe0003cfdde>] elf_map+0x72/0x84 [<ffffffe0003d05f8>] load_elf_binary+0x69a/0xec8 [<ffffffe000376240>] bprm_execve+0x246/0x53a [<ffffffe00037786c>] kernel_execve+0xe8/0x124 [<ffffffe000aecdf2>] run_init_process+0xfa/0x10c [<ffffffe000aece16>] try_to_run_init_process+0x12/0x3c [<ffffffe000afa920>] kernel_init+0xb4/0xf8 [<ffffffe000201b80>] ret_from_exception+0x0/0xc Here is the error injection test code for the above output: drivers/irqchip/irq-riscv-intc.c: static asmlinkage void riscv_intc_irq(struct pt_regs regs) { unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG; + u32 tmp; __get_user(tmp, (u32 )0); Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20221109064937.3643993-3-guoren@kernel.org [Palmer: use SYM_CODE_*] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Stable-dep-of: a2a4d4a6a0bf ("riscv: stacktrace: fixed walk_stackframe()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:47 +02:00
Samuel Holland	52e8a42b11	riscv: Fix TASK_SIZE on 64-bit NOMMU [ Upstream commit 6065e736f82c817c9a597a31ee67f0ce4628e948 ] On NOMMU, userspace memory can come from anywhere in physical RAM. The current definition of TASK_SIZE is wrong if any RAM exists above 4G, causing spurious failures in the userspace access routines. Fixes: 6bd33e1ece52 ("riscv: add nommu support") Fixes: c3f896dcf1e4 ("mm: switch the test_vmalloc module to use __vmalloc_node") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Bo Gan <ganboing@gmail.com> Link: https://lore.kernel.org/r/20240227003630.3634533-2-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-05-02 16:24:50 +02:00
Baoquan He	83c5c0e3cd	riscv: fix VMALLOC_START definition [ Upstream commit ac88ff6b9d7dea9f0907c86bdae204dde7d5c0e6 ] When below config items are set, compiler complained: -------------------- CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y ...... ----------------------- ------------------------------------------------------------------- arch/riscv/kernel/crash_core.c: In function 'arch_crash_save_vmcoreinfo': arch/riscv/kernel/crash_core.c:11:58: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'int' [-Wformat=] 11 \| vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START); \| ~~^ \| \| \| long unsigned int \| %x ---------------------------------------------------------------------- This is because on riscv macro VMALLOC_START has different type when CONFIG_MMU is set or unset. arch/riscv/include/asm/pgtable.h: -------------------------------------------------- Changing it to _AC(0, UL) in case CONFIG_MMU=n can fix the warning. Link: https://lkml.kernel.org/r/ZW7OsX4zQRA3mO4+@MiWiFi-R3L-srv Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Cc: Eric DeVolder <eric_devolder@yahoo.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 6065e736f82c ("riscv: Fix TASK_SIZE on 64-bit NOMMU") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-05-02 16:24:50 +02:00
Stefan O'Rear	dff6072124	riscv: process: Fix kernel gp leakage commit d14fa1fcf69db9d070e75f1c4425211fa619dfc8 upstream. childregs represents the registers which are active for the new thread in user context. For a kernel thread, childregs->gp is never used since the kernel gp is not touched by switch_to. For a user mode helper, the gp value can be observed in user space after execve or possibly by other means. [From the email thread] The /* Kernel thread / comment is somewhat inaccurate in that it is also used for user_mode_helper threads, which exec a user process, e.g. /sbin/init or when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have PF_KTHREAD set and are valid targets for ptrace etc. even before they exec. childregs is the user* context during syscall execution and it is observable from userspace in at least five ways: 1. kernel_execve does not currently clear integer registers, so the starting register state for PID 1 and other user processes started by the kernel has sp = user stack, gp = kernel __global_pointer$, all other integer registers zeroed by the memset in the patch comment. This is a bug in its own right, but I'm unwilling to bet that it is the only way to exploit the issue addressed by this patch. 2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread before it execs, but ptrace requires SIGSTOP to be delivered which can only happen at user/kernel boundaries. 3. /proc//task//syscall: this is perfectly happy to read pt_regs for user_mode_helpers before the exec completes, but gp is not one of the registers it returns. 4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses are also exposed via PERF_SAMPLE_REGS_USER which is permitted under LOCKDOWN_PERF. I have not attempted to write exploit code. 5. Much of the tracing infrastructure allows access to user registers. I have not attempted to determine which forms of tracing allow access to user registers without already allowing access to kernel registers. Fixes: 7db91e57a0ac ("RISC-V: Task implementation") Cc: stable@vger.kernel.org Signed-off-by: Stefan O'Rear <sorear@fastmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-04-10 16:19:42 +02:00
Samuel Holland	fd9662109d	riscv: Fix spurious errors from __get/put_kernel_nofault commit d080a08b06b6266cc3e0e86c5acfd80db937cb6b upstream. These macros did not initialize __kr_err, so they could fail even if the access did not fault. Cc: stable@vger.kernel.org Fixes: d464118cdc41 ("riscv: implement __get_kernel_nofault and __put_user_nofault") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240312022030.320789-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-04-10 16:19:42 +02:00
Conor Dooley	7e13a78e2b	riscv: dts: sifive: add missing #interrupt-cells to pmic [ Upstream commit ce6b6d1513965f500a05f3facf223fa01fd74920 ] At W=2 dtc complains: hifive-unmatched-a00.dts:120.10-238.4: Warning (interrupt_provider): /soc/i2c@10030000/pmic@58: Missing '#interrupt-cells' in interrupt provider Add the missing property. Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-26 18:21:12 -04:00
Dimitris Vlachos	5941a90c55	riscv: Sparse-Memory/vmemmap out-of-bounds fix [ Upstream commit a11dd49dcb9376776193e15641f84fcc1e5980c9 ] Offset vmemmap so that the first page of vmemmap will be mapped to the first page of physical memory in order to ensure that vmemmap’s bounds will be respected during pfn_to_page()/page_to_pfn() operations. The conversion macros will produce correct SV39/48/57 addresses for every possible/valid DRAM_BASE inside the physical memory limits. v2:Address Alex's comments Suggested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Dimitris Vlachos <dvlachos@ics.forth.gr> Reported-by: Dimitris Vlachos <dvlachos@ics.forth.gr> Closes: https://lore.kernel.org/linux-riscv/20240202135030.42265-1-csd4492@csd.uoc.gr Fixes: d95f1a542c3d ("RISC-V: Implement sparsemem") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240229191723.32779-1-dvlachos@ics.forth.gr Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-06 14:38:48 +00:00
Heiko Stuebner	d93c68661a	RISC-V: fix funct4 definition for c.jalr in parse_asm.h [ Upstream commit a3775634f6da23f5511d0282d7e792cf606e5f3b ] The opcode definition for c.jalr is c.jalr c_rs1_n0 1..0=2 15..13=4 12=1 6..2=0 This means funct4 consisting of bit [15:12] is 1001b, so the value is 0x9. Fixes: edde5584c7ab ("riscv: Add SW single-step support for KDB") Reported-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20221223221332.4127602-2-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:21:50 +01:00
Arnd Bergmann	df81cbcd26	arch: consolidate arch_irq_work_raise prototypes [ Upstream commit 64bac5ea17d527872121adddfee869c7a0618f8f ] The prototype was hidden in an #ifdef on x86, which causes a warning: kernel/irq_work.c:72:13: error: no previous prototype for 'arch_irq_work_raise' [-Werror=missing-prototypes] Some architectures have a working prototype, while others don't. Fix this by providing it in only one place that is always visible. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Guo Ren <guoren@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-02-23 08:54:39 +01:00
Alexandre Ghiti	c1669b54c3	riscv: Fix module_alloc() that did not reset the linear mapping permissions [ Upstream commit 749b94b08005929bbc636df21a23322733166e35 ] After unloading a module, we must reset the linear mapping permissions, see the example below: Before unloading a module: 0xffffaf809d65d000-0xffffaf809d6dc000 0x000000011d65d000 508K PTE . .. .. D A G . . W R V 0xffffaf809d6dc000-0xffffaf809d6dd000 0x000000011d6dc000 4K PTE . .. .. D A G . . . R V 0xffffaf809d6dd000-0xffffaf809d6e1000 0x000000011d6dd000 16K PTE . .. .. D A G . . W R V 0xffffaf809d6e1000-0xffffaf809d6e7000 0x000000011d6e1000 24K PTE . .. .. D A G . X . R V After unloading a module: 0xffffaf809d65d000-0xffffaf809d6e1000 0x000000011d65d000 528K PTE . .. .. D A G . . W R V 0xffffaf809d6e1000-0xffffaf809d6e7000 0x000000011d6e1000 24K PTE . .. .. D A G . X W R V The last mapping is not reset and we end up with WX mappings in the linear mapping. So add VM_FLUSH_RESET_PERMS to our module_alloc() definition. Fixes: 0cff8bff7af8 ("riscv: avoid the PIC offset of static percpu data in module beyond 2G limits") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213134027.155327-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-01-25 14:52:50 -08:00
Alexandre Ghiti	938f70d146	riscv: Check if the code to patch lies in the exit section [ Upstream commit 420370f3ae3d3b883813fd3051a38805160b2b9f ] Otherwise we fall through to vmalloc_to_page() which panics since the address does not lie in the vmalloc region. Fixes: 043cb41a85de ("riscv: introduce interfaces to patch kernel code") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20231214091926.203439-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-01-25 14:52:50 -08:00
Clément Léger	940a7bcd4f	riscv: fix misaligned access handling of C.SWSP and C.SDSP [ Upstream commit 22e0eb04837a63af111fae35a92f7577676b9bc8 ] This is a backport of a fix that was done in OpenSBI: ec0559eb315b ("lib: sbi_misaligned_ldst: Fix handling of C.SWSP and C.SDSP"). Unlike C.LWSP/C.LDSP, these encodings can be used with the zero register, so checking that the rs2 field is non-zero is unnecessary. Additionally, the previous check was incorrect since it was checking the immediate field of the instruction instead of the rs2 field. Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE") Signed-off-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20231103090223.702340-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:36:41 +01:00
Nam Cao	070b3ccb9b	riscv: kprobes: allow writing to x0 commit 8cb22bec142624d21bc85ff96b7bad10b6220e6a upstream. Instructions can write to x0, so we should simulate these instructions normally. Currently, the kernel hangs if an instruction who writes to x0 is simulated. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Cc: stable@vger.kernel.org Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230829182500.61875-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 16:56:35 +00:00
Björn Töpel	72ef708865	riscv, bpf: Sign-extend return values [ Upstream commit 2f1b0d3d733169eb11680bfa97c266ae5e757148 ] The RISC-V architecture does not expose sub-registers, and hold all 32-bit values in a sign-extended format [1] [2]: \| The compiler and calling convention maintain an invariant that all \| 32-bit values are held in a sign-extended format in 64-bit \| registers. Even 32-bit unsigned integers extend bit 31 into bits \| 63 through 32. Consequently, conversion between unsigned and \| signed 32-bit integers is a no-op, as is conversion from a signed \| 32-bit integer to a signed 64-bit integer. While BPF, on the other hand, exposes sub-registers, and use zero-extension (similar to arm64/x86). This has led to some subtle bugs, where a BPF JITted program has not sign-extended the a0 register (return value in RISC-V land), passed the return value up the kernel, e.g.: \| int from_bpf(void); \| \| long foo(void) \| { \| return from_bpf(); \| } Here, a0 would be 0xffff_ffff, instead of the expected 0xffff_ffff_ffff_ffff. Internally, the RISC-V JIT uses a5 as a dedicated register for BPF return values. Keep a5 zero-extended, but explicitly sign-extend a0 (which is used outside BPF land). Now that a0 (RISC-V ABI) and a5 (BPF ABI) differs, a0 is only moved to a5 for non-BPF native calls (BPF_PSEUDO_CALL). Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-056b6ff-2023-10-02/unpriv-isa-asciidoc.pdf # [2] Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/draft-20230929-e5c800e661a53efe3c2678d71a306323b60eb13b/riscv-abi.pdf # [2] Link: https://lore.kernel.org/bpf/20231004120706.52848-2-bjorn@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-10-19 23:05:34 +02:00
Pu Lehui	7795592e08	riscv, bpf: Factor out emit_call for kernel and bpf context [ Upstream commit 0fd1fd0104954380477353aea29c347e85dff16d ] The current emit_call function is not suitable for kernel function call as it store return value to bpf R0 register. We can separate it out for common use. Meanwhile, simplify judgment logic, that is, fixed function address can use jal or auipc+jalr, while the unfixed can use only auipc+jalr. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/bpf/20230215135205.1411105-3-pulehui@huaweicloud.com Stable-dep-of: 2f1b0d3d7331 ("riscv, bpf: Sign-extend return values") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-10-19 23:05:34 +02:00
Alexandre Ghiti	e9b8ee715d	riscv: uaccess: Return the number of bytes effectively not copied [ Upstream commit 4b05b993900dd3eba0fc83ef5c5ddc7d65d786c6 ] It was reported that the riscv kernel hangs while executing the test in [1]. Indeed, the test hangs when trying to write a buffer to a file. The problem is that the riscv implementation of raw_copy_from_user() does not return the correct number of bytes not written when an exception happens and is fixed up, instead it always returns the initial size to copy, even if some bytes were actually copied. generic_perform_write() pre-faults the user pages and bails out if nothing can be written, otherwise it will access the userspace buffer: here the riscv implementation keeps returning it was not able to copy any byte though the pre-faulting indicates otherwise. So generic_perform_write() keeps retrying to access the user memory and ends up in an infinite loop. Note that before the commit mentioned in [1] that introduced this regression, it worked because generic_perform_write() would bail out if only one byte could not be written. So fix this by returning the number of bytes effectively not written in __asm_copy_[to\|from]_user() and __clear_user(), as it is expected. Link: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/ [1] Fixes: ebcbd75e3962 ("riscv: Fix the bug in memory access fixup code") Reported-by: Bo YU <tsu.yubo@gmail.com> Closes: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/#t Reported-by: Aurelien Jarno <aurelien@aurel32.net> Closes: https://lore.kernel.org/linux-riscv/ZNOnCakhwIeue3yr@aurel32.net/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Link: https://lore.kernel.org/r/20230811150604.1621784-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:36 +02:00
Andrea Parri	bcd9eeb3a3	riscv,mmio: Fix readX()-to-delay() ordering commit 4eb2eb1b4c0eb07793c240744843498564a67b83 upstream. Section 2.1 of the Platform Specification [1] states: Unless otherwise specified by a given I/O device, I/O devices are on ordering channel 0 (i.e., they are point-to-point strongly ordered). which is not sufficient to guarantee that a readX() by a hart completes before a subsequent delay() on the same hart (cf. memory-barriers.txt, "Kernel I/O barrier effects"). Set the I(nput) bit in __io_ar() to restore the ordering, align inline comments. [1] https://github.com/riscv/riscv-platform-specs Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20230803042738.5937-1-parri.andrea@gmail.com Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:21:58 +02:00
Jisheng Zhang	8a128e601f	riscv: mm: fix truncation warning on RV32 [ Upstream commit b690e266dae2f85f4dfea21fa6a05e3500a51054 ] lkp reports below sparse warning when building for RV32: arch/riscv/mm/init.c:1204:48: sparse: warning: cast truncates bits from constant value (100000000 becomes 0) IMO, the reason we didn't see this truncates bug in real world is "0" means MEMBLOCK_ALLOC_ACCESSIBLE in memblock and there's no RV32 HW with more than 4GB memory. Fix it anyway to make sparse happy. Fixes: decf89f86ecd ("riscv: try to allocate crashkern region from 32bit addressible memory") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306080034.SLiCiOMn-lkp@intel.com/ Link: https://lore.kernel.org/r/20230709171036.1906-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-23 13:47:45 +02:00
Björn Töpel	567397dd8e	riscv, bpf: Fix inconsistent JIT image generation [ Upstream commit c56fb2aab23505bb7160d06097c8de100b82b851 ] In order to generate the prologue and epilogue, the BPF JIT needs to know which registers that are clobbered. Therefore, the during pre-final passes, the prologue is generated after the body of the program body-prologue-epilogue. Then, in the final pass, a proper prologue-body-epilogue JITted image is generated. This scheme has worked most of the time. However, for some large programs with many jumps, e.g. the test_kmod.sh BPF selftest with hardening enabled (blinding constants), this has shown to be incorrect. For the final pass, when the proper prologue-body-epilogue is generated, the image has not converged. This will lead to that the final image will have incorrect jump offsets. The following is an excerpt from an incorrect image: \| ... \| 3b8: 00c50663 beq a0,a2,3c4 <.text+0x3c4> \| 3bc: 0020e317 auipc t1,0x20e \| 3c0: 49630067 jalr zero,1174(t1) # 20e852 <.text+0x20e852> \| ... \| 20e84c: 8796 c.mv a5,t0 \| 20e84e: 6422 c.ldsp s0,8(sp) # Epilogue start \| 20e850: 6141 c.addi16sp sp,16 \| 20e852: 853e c.mv a0,a5 # Incorrect jump target \| 20e854: 8082 c.jr ra The image has shrunk, and the epilogue offset is incorrect in the final pass. Correct the problem by always generating proper prologue-body-epilogue outputs, which means that the first pass will only generate the body to track what registers that are touched. Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230710074131.19596-1-bjorn@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-23 13:47:44 +02:00
Pu Lehui	4e4e1f99bb	bpf, riscv: Support riscv jit to provide bpf_line_info [ Upstream commit 3cb70413041fdf028fa1ba3986fd0c6aec9e3dcb ] Add support for riscv jit to provide bpf_line_info. We need to consider the prologue offset in ctx->offset, but unlike x86 and arm64, ctx->offset of riscv does not provide an extra slot for the prologue, so here we just calculate the len of prologue and add it to ctx->offset at the end. Both RV64 and RV32 have been tested. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220530092815.1112406-3-pulehui@huawei.com Stable-dep-of: c56fb2aab235 ("riscv, bpf: Fix inconsistent JIT image generation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-23 13:47:43 +02:00
Woody Zhang	a4284246fc	riscv: move memblock_allow_resize() after linear mapping is ready [ Upstream commit 85fadc0d04119c2fe4a20287767ab904c6d21ba1 ] The initial memblock metadata is accessed from kernel image mapping. The regions arrays need to "reallocated" from memblock and accessed through linear mapping to cover more memblock regions. So the resizing should not be allowed until linear mapping is ready. Note that there are memblock allocations when building linear mapping. This patch is similar to 24cc61d8cb5a ("arm64: memblock: don't permit memblock resizing until linear mapping is up"). In following log, many memblock regions are reserved before create_linear_mapping_page_table(). And then it triggered reallocation of memblock.reserved.regions and memcpy the old array in kernel image mapping to the new array in linear mapping which caused a page fault. [ 0.000000] memblock_reserve: [0x00000000bf01f000-0x00000000bf01ffff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf021000-0x00000000bf021fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf023000-0x00000000bf023fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf025000-0x00000000bf025fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf027000-0x00000000bf027fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf029000-0x00000000bf029fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf02b000-0x00000000bf02bfff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf02d000-0x00000000bf02dfff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf02f000-0x00000000bf02ffff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf030000-0x00000000bf030fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008007ffff (512 KiB) map non-reusable mmode_resv0@80000000 [ 0.000000] memblock_reserve: [0x00000000bf000000-0x00000000bf001fed] paging_init+0x19a/0x5ae [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128 [ 0.000000] memblock: reserved is doubled to 256 at [0x000000017fffd000-0x000000017fffe7ff] [ 0.000000] Unable to handle kernel paging request at virtual address ff600000ffffd000 [ 0.000000] Oops [#1] [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.4.0-rc1-00011-g99a670b2069c #66 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] epc : __memcpy+0x60/0xf8 [ 0.000000] ra : memblock_double_array+0x192/0x248 [ 0.000000] epc : ffffffff8081d214 ra : ffffffff80a3dfc0 sp : ffffffff81403bd0 [ 0.000000] gp : ffffffff814fbb38 tp : ffffffff8140dac0 t0 : 0000000001600000 [ 0.000000] t1 : 0000000000000000 t2 : 000000008f001000 s0 : ffffffff81403c60 [ 0.000000] s1 : ffffffff80c0bc98 a0 : ff600000ffffd000 a1 : ffffffff80c0bcd8 [ 0.000000] a2 : 0000000000000c00 a3 : ffffffff80c0c8d8 a4 : 0000000080000000 [ 0.000000] a5 : 0000000000080000 a6 : 0000000000000000 a7 : 0000000080200000 [ 0.000000] s2 : ff600000ffffd000 s3 : 0000000000002000 s4 : 0000000000000c00 [ 0.000000] s5 : ffffffff80c0bc60 s6 : ffffffff80c0bcc8 s7 : 0000000000000000 [ 0.000000] s8 : ffffffff814fd0a8 s9 : 000000017fffe7ff s10: 0000000000000000 [ 0.000000] s11: 0000000000001000 t3 : 0000000000001000 t4 : 0000000000000000 [ 0.000000] t5 : 000000008f003000 t6 : ff600000ffffd000 [ 0.000000] status: 0000000200000100 badaddr: ff600000ffffd000 cause: 000000000000000f [ 0.000000] [<ffffffff8081d214>] __memcpy+0x60/0xf8 [ 0.000000] [<ffffffff80a3e1a2>] memblock_add_range.isra.14+0x12c/0x162 [ 0.000000] [<ffffffff80a3e36a>] memblock_reserve+0x6e/0x8c [ 0.000000] [<ffffffff80a123fc>] memblock_alloc_range_nid+0xb8/0x128 [ 0.000000] [<ffffffff80a1256a>] memblock_phys_alloc_range+0x5e/0x6a [ 0.000000] [<ffffffff80a04732>] alloc_pmd_fixmap+0x14/0x1c [ 0.000000] [<ffffffff80a0475a>] alloc_p4d_fixmap+0xc/0x14 [ 0.000000] [<ffffffff80a04a36>] create_pgd_mapping+0x98/0x17c [ 0.000000] [<ffffffff80a04e9e>] create_linear_mapping_range.constprop.10+0xe4/0x112 [ 0.000000] [<ffffffff80a05bb8>] paging_init+0x3ec/0x5ae [ 0.000000] [<ffffffff80a03354>] setup_arch+0xb2/0x576 [ 0.000000] [<ffffffff80a00726>] start_kernel+0x72/0x57e [ 0.000000] Code: b303 0285 b383 0305 be03 0385 be83 0405 bf03 0485 (b023) 00ef [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- Fixes: 671f9a3e2e24 ("RISC-V: Setup initial page tables in two stages") Signed-off-by: Woody Zhang <woodylab@foxmail.com> Tested-by: Song Shuai <songshuaishuai@tinylab.org> Link: https://lore.kernel.org/r/tencent_FBB94CE615C5CCE7701CD39C15CCE0EE9706@qq.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-23 13:47:30 +02:00
Tiezhu Yang	fafeeb398d	riscv: uprobes: Restore thread.bad_cause [ Upstream commit 58b1294dd1d65bb62f08dddbf418f954210c2057 ] thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation is meaningless, this change is similar with x86 and powerpc. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Fixes: 74784081aac8 ("riscv: Add uprobes supported") Link: https://lore.kernel.org/r/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-23 13:47:13 +02:00
Ruan Jinjie	952d1e4cbc	riscv: fix kprobe __user string arg print fault issue [ Upstream commit 99a670b2069c725a7b50318aa681d9cae8f89325 ] On riscv qemu platform, when add kprobe event on do_sys_open() to show filename string arg, it just print fault as follow: echo 'p:myprobe do_sys_open dfd=$arg1 filename=+0($arg2):string flags=$arg3 mode=$arg4' > kprobe_events bash-166 [000] ...1. 360.195367: myprobe: (do_sys_open+0x0/0x84) dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6 bash-166 [000] ...1. 360.219369: myprobe: (do_sys_open+0x0/0x84) dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6 bash-191 [000] ...1. 360.378827: myprobe: (do_sys_open+0x0/0x84) dfd=0xffffffffffffff9c filename=(fault) flags=0x98800 mode=0x0 As riscv do not select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE, the +0($arg2) addr is processed as a kernel address though it is a userspace address, cause the above filename=(fault) print. So select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE to avoid the issue, after that the kprobe trace is ok as below: bash-166 [000] ...1. 96.767641: myprobe: (do_sys_open+0x0/0x84) dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6 bash-166 [000] ...1. 96.793751: myprobe: (do_sys_open+0x0/0x84) dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6 bash-177 [000] ...1. 96.962354: myprobe: (do_sys_open+0x0/0x84) dfd=0xffffffffffffff9c filename="/sys/kernel/debug/tracing/events/kprobes/" flags=0x98800 mode=0x0 Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Acked-by: Björn Töpel <bjorn@rivosinc.com> Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") Link: https://lore.kernel.org/r/20230504072910.3742842-1-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-14 11:13:09 +02:00
Alexandre Ghiti	3a7793ae69	riscv: Fix unused variable warning when BUILTIN_DTB is set [ Upstream commit 33d418da6f476b15e4510e0a590062583f63cd36 ] commit ef69d2559fe9 ("riscv: Move early dtb mapping into the fixmap region") wrongly moved the #ifndef CONFIG_BUILTIN_DTB surrounding the pa variable definition in create_fdt_early_page_table(), so move it back to its right place to quiet the following warning: ../arch/riscv/mm/init.c: In function ‘create_fdt_early_page_table’: ../arch/riscv/mm/init.c:925:12: warning: unused variable ‘pa’ [-Wunused-variable] 925 \| uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1); Fixes: ef69d2559fe9 ("riscv: Move early dtb mapping into the fixmap region") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230519131311.391960-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-09 10:32:15 +02:00
Alexandre Ghiti	de9a3ed423	RISC-V: Fix up a cherry-pick warning in setup_vm_final() This triggers a -Wdeclaration-after-statement as the code has changed a bit since upstream. It might be better to hoist the whole block up, but this is a smaller change so I went with it. arch/riscv/mm/init.c:755:16: warning: mixing declarations and code is a C99 extension [-Wdeclaration-after-statement] unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT)); ^ 1 warning generated. Fixes: bbf94b042155 ("riscv: Move early dtb mapping into the fixmap region") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304300429.SXZOA5up-lkp@intel.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-17 11:50:31 +02:00
Sia Jee Heng	59bf62f0ed	RISC-V: mm: Enable huge page support to kernel_page_present() function [ Upstream commit a15c90b67a662c75f469822a7f95c7aaa049e28f ] Currently kernel_page_present() function doesn't support huge page detection causes the function to mistakenly return false to the hibernation core. Add huge page detection to the function to solve the problem. Fixes: 9e953cda5cdf ("riscv: Introduce huge page support for 32/64bit kernel") Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230330064321.1008373-4-jeeheng.sia@starfivetech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-17 11:50:16 +02:00
Song Shuai	b91a5aa1e7	riscv: mm: remove redundant parameter of create_fdt_early_page_table commit e4ef93edd4e0b022529303db1915766ff9de450e upstream. create_fdt_early_page_table() explicitly uses early_pg_dir for 32-bit fdt mapping and the pgdir parameter is redundant here. So remove it and its caller. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Song Shuai <suagrfillet@gmail.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Fixes: ef69d2559fe9 ("riscv: Move early dtb mapping into the fixmap region") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230426100009.685435-1-suagrfillet@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-11 23:00:18 +09:00
Alexandre Ghiti	cab0f98503	riscv: No need to relocate the dtb as it lies in the fixmap region commit 1b50f956c8fe9082bdee4a9cfd798149c52f7043 upstream. We used to access the dtb via its linear mapping address but now that the dtb early mapping was moved in the fixmap region, we can keep using this address since it is present in swapper_pg_dir, and remove the dtb relocation. Note that the relocation was wrong anyway since early_memremap() is restricted to 256K whereas the maximum fdt size is 2MB. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230329081932.79831-4-alexghiti@rivosinc.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-01 08:23:24 +09:00
Alexandre Ghiti	1f09c9bab7	riscv: Do not set initial_boot_params to the linear address of the dtb commit f1581626071c8e37c58c5e8f0b4126b17172a211 upstream. early_init_dt_verify() is already called in parse_dtb() and since the dtb address does not change anymore (it is now in the fixmap region), no need to reset initial_boot_params by calling early_init_dt_verify() again. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230329081932.79831-3-alexghiti@rivosinc.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-01 08:23:24 +09:00
Alexandre Ghiti	bbf94b0421	riscv: Move early dtb mapping into the fixmap region commit ef69d2559fe91f23d27a3d6fd640b5641787d22e upstream. riscv establishes 2 virtual mappings: - early_pg_dir maps the kernel which allows to discover the system memory - swapper_pg_dir installs the final mapping (linear mapping included) We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this mapping was not carried over in swapper_pg_dir. It happens that early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is setup otherwise we could allocate reserved memory defined in the dtb. And this function initializes reserved_mem variable with addresses that lie in the early_pg_dir dtb mapping: when those addresses are reused with swapper_pg_dir, this mapping does not exist and then we trap. The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is set up otherwise we could allocate in reserved memory defined in the dtb. So move the dtb mapping in the fixmap region which is established in early_pg_dir and handed over to swapper_pg_dir. This patch had to be backported because: - the documentation for sv57 is not present here (as sv48/57 are not present) - handling of sv48/57 is not needed (as not present) Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob") Fixes: 8f3a2b4a96dc ("RISC-V: Move DT mapping outof fixmap") Fixes: 50e63dd8ed92 ("riscv: fix reserved memory setup") Reported-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230329081932.79831-2-alexghiti@rivosinc.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-01 08:23:24 +09:00
Mathis Salmen	a42f565c0e	riscv: add icache flush for nommu sigreturn trampoline commit 8d736482749f6d350892ef83a7a11d43cd49981e upstream. In a NOMMU kernel, sigreturn trampolines are generated on the user stack by setup_rt_frame. Currently, these trampolines are not instruction fenced, thus their visibility to ifetch is not guaranteed. This patch adds a flush_icache_range in setup_rt_frame to fix this problem. Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de> Fixes: 6bd33e1ece52 ("riscv: add nommu support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-20 12:13:55 +02:00
Nathan Chancellor	9c7ee94715	riscv: Handle zicsr/zifencei issues between clang and binutils commit e89c2e815e76471cb507bd95728bf26da7976430 upstream. There are two related issues that appear in certain combinations with clang and GNU binutils. The first occurs when a version of clang that supports zicsr or zifencei via '-march=' [1] (i.e, >= 17.x) is used in combination with a version of GNU binutils that do not recognize zicsr and zifencei in the '-march=' value (i.e., < 2.36): riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o The second occurs when a version of clang that does not support zicsr or zifencei via '-march=' (i.e., <= 16.x) is used in combination with a version of GNU as that defaults to a newer ISA base spec, which requires specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >= 2.38): ../arch/riscv/kernel/kexec_relocate.S: Assembler messages: ../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required clang-12: error: assembler command failed with exit code 1 (use -v to see invocation) This is the same issue addressed by commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38") (see [2] for additional information) but older versions of clang miss out on it because the cc-option check fails: clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr' clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr' To resolve the first issue, only attempt to add zicsr and zifencei to the march string when using the GNU assembler 2.38 or newer, which is when the default ISA spec was updated, requiring these extensions to be specified explicitly. LLVM implements an older version of the base specification for all currently released versions, so these instructions are available as part of the 'i' extension. If LLVM's implementation is updated in the future, a CONFIG_AS_IS_LLVM condition can be added to CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. To resolve the second issue, use version 2.2 of the base ISA spec when using an older version of clang that does not support zicsr or zifencei via '-march=', as that is the spec version most compatible with the one clang/LLVM implements and avoids the need to specify zicsr and zifencei explicitly due to still being a part of 'i'. [1]: `22e199e6af` [2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/ Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1808 Co-developed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-30 12:47:59 +02:00
Dylan Jhong	c100236820	riscv: mm: Fix incorrect ASID argument when flushing TLB commit 9a801afd3eb95e1a89aba17321062df06fb49d98 upstream. Currently, we pass the CONTEXTID instead of the ASID to the TLB flush function. We should only take the ASID field to prevent from touching the reserved bit field. Fixes: 3f1e782998cd ("riscv: add ASID-based tlbflushing methods") Signed-off-by: Dylan Jhong <dylan@andestech.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Link: https://lore.kernel.org/r/20230313034906.2401730-1-dylan@andestech.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-30 12:47:59 +02:00
Alexandre Ghiti	a4c639012a	riscv: Bump COMMAND_LINE_SIZE value to 1024 [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] Increase COMMAND_LINE_SIZE as the current default value is too low for syzbot kernel command line. There has been considerable discussion on this patch that has led to a larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all ports. That's not quite done yet, but it's gotten far enough we're confident this is not a uABI change so this is safe. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr [Palmer: it's not uabi] Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:47:52 +02:00
Guo Ren	d1d8269544	riscv: asid: Fixup stale TLB entry cause application crash commit 82dd33fde0268cc622d3d1ac64971f3f61634142 upstream. After use_asid_allocator is enabled, the userspace application will crash by stale TLB entries. Because only using cpumask_clear_cpu without local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh. Then set_mm_asid would cause the user space application to get a stale value by stale TLB entry, but set_mm_noasid is okay. Here is the symptom of the bug: unhandled signal 11 code 0x1 (coredump) 0x0000003fd6d22524 <+4>: auipc s0,0x70 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 => 0x0000003fd6d2252c <+12>: ld a5,0(s0) (gdb) i r s0 s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 (gdb) x /2x 0x3fd6d92490 0x3fd6d92490: 0xd80ac8a8 0x0000003f The core dump file shows that register s0 is wrong, but the value in memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry in TLB and got a wrong result from an incorrect physical address. When the task ran on CPU0, which loaded/speculative-loaded the value of address(0x3fd6d92490), then the first version of the mapping entry was PTWed into CPU0's TLB. When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by asid), it happened to write a value on the address (0x3fd6d92490). It caused do_page_fault -> wp_page_copy -> ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous switch_mm. So we only flushed the CPU1 TLB and set the second version mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0 still used a stale TLB mapping entry which contained a wrong target physical address. It raised a bug when the task happened to read that value. CPU0 CPU1 - switch 'task' in - read addr (Fill stale mapping entry into TLB) - switch 'task' out (no tlb_flush) - switch 'task' in (no tlb_flush) - write addr cause pagefault do_page_fault() (change to new addr mapping) wp_page_copy() ptep_clear_flush() ptep_get_and_clear() & flush_tlb_page() write new value into addr - switch 'task' out (no tlb_flush) - switch 'task' in (no tlb_flush) - read addr again (Use stale mapping entry in TLB) get wrong value from old phyical addr, BUG! The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm, which could guarantee to invalidate all stale TLB entries during TLB flush. Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Zong Li <zong.li@sifive.com> Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Cc: Anup Patel <apatel@ventanamicro.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: stable@vger.kernel.org Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230226150137.1919750-3-geomatsi@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-22 13:31:34 +01:00
Sergey Matyukevich	aeefcfc579	Revert "riscv: mm: notify remote harts about mmu cache updates" commit e921050022f1f12d5029d1487a7dfc46cde15523 upstream. This reverts the remaining bits of commit 4bd1d80efb5a ("riscv: mm: notify remote harts harts about mmu cache updates"). According to bug reports, suggested approach to fix stale TLB entries is not sufficient. It needs to be replaced by a more robust solution. Fixes: 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache updates") Reported-by: Zong Li <zong.li@sifive.com> Reported-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Cc: stable@vger.kernel.org Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230226150137.1919750-2-geomatsi@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-22 13:31:33 +01:00
Conor Dooley	07b0aba4ad	RISC-V: Don't check text_mutex during stop_machine [ Upstream commit 2a8db5ec4a28a0fce822d10224db9471a44b6925 ] We're currently using stop_machine() to update ftrace & kprobes, which means that the thread that takes text_mutex during may not be the same as the thread that eventually patches the code. This isn't actually a race because the lock is still held (preventing any other concurrent accesses) and there is only one thread running during stop_machine(), but it does trigger a lockdep failure. This patch just elides the lockdep check during stop_machine. Fixes: c15ac4fd60d5 ("riscv/ftrace: Add dynamic function tracer support") Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230303143754.4005217-1-conor.dooley@microchip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:57 +01:00
Alexandre Ghiti	3a9418d2c9	riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode [ Upstream commit 76950340cf03b149412fe0d5f0810e52ac1df8cb ] When CONFIG_FRAME_POINTER is unset, the stack unwinding function walk_stackframe randomly reads the stack and then, when KASAN is enabled, it can lead to the following backtrace: [ 0.000000] ================================================================== [ 0.000000] BUG: KASAN: stack-out-of-bounds in walk_stackframe+0xa6/0x11a [ 0.000000] Read of size 8 at addr ffffffff81807c40 by task swapper/0 [ 0.000000] [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-12919-g24203e6db61f #43 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] Call Trace: [ 0.000000] [<ffffffff80007ba8>] walk_stackframe+0x0/0x11a [ 0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff80c49c80>] dump_stack_lvl+0x22/0x36 [ 0.000000] [<ffffffff80c3783e>] print_report+0x198/0x4a8 [ 0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff8015f68a>] kasan_report+0x9a/0xc8 [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff8006e99c>] desc_make_final+0x80/0x84 [ 0.000000] [<ffffffff8009a04e>] stack_trace_save+0x88/0xa6 [ 0.000000] [<ffffffff80099fc2>] filter_irq_stacks+0x72/0x76 [ 0.000000] [<ffffffff8006b95e>] devkmsg_read+0x32a/0x32e [ 0.000000] [<ffffffff8015ec16>] kasan_save_stack+0x28/0x52 [ 0.000000] [<ffffffff8006e998>] desc_make_final+0x7c/0x84 [ 0.000000] [<ffffffff8009a04a>] stack_trace_save+0x84/0xa6 [ 0.000000] [<ffffffff8015ec52>] kasan_set_track+0x12/0x20 [ 0.000000] [<ffffffff8015f22e>] __kasan_slab_alloc+0x58/0x5e [ 0.000000] [<ffffffff8015e7ea>] __kmem_cache_create+0x21e/0x39a [ 0.000000] [<ffffffff80e133ac>] create_boot_cache+0x70/0x9c [ 0.000000] [<ffffffff80e17ab2>] kmem_cache_init+0x6c/0x11e [ 0.000000] [<ffffffff80e00fd6>] mm_init+0xd8/0xfe [ 0.000000] [<ffffffff80e011d8>] start_kernel+0x190/0x3ca [ 0.000000] [ 0.000000] The buggy address belongs to stack of task swapper/0 [ 0.000000] and is located at offset 0 in frame: [ 0.000000] stack_trace_save+0x0/0xa6 [ 0.000000] [ 0.000000] This frame has 1 object: [ 0.000000] [32, 56) 'c' [ 0.000000] [ 0.000000] The buggy address belongs to the physical page: [ 0.000000] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x81a07 [ 0.000000] flags: 0x1000(reserved\|zone=0) [ 0.000000] raw: 0000000000001000 ff600003f1e3d150 ff600003f1e3d150 0000000000000000 [ 0.000000] raw: 0000000000000000 0000000000000000 00000001ffffffff [ 0.000000] page dumped because: kasan: bad access detected [ 0.000000] [ 0.000000] Memory state around the buggy address: [ 0.000000] ffffffff81807b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] ffffffff81807b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] >ffffffff81807c00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f3 [ 0.000000] ^ [ 0.000000] ffffffff81807c80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] ffffffff81807d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] ================================================================== Fix that by using READ_ONCE_NOCHECK when reading the stack in imprecise mode. Fixes: 5d8544e2d007 ("RISC-V: Generic library routines and assembly") Reported-by: Chathura Rajapaksha <chathura.abeyrathne.lk@gmail.com> Link: https://lore.kernel.org/all/CAD7mqryDQCYyJ1gAmtMm8SASMWAQ4i103ptTb0f6Oda=tPY2=A@mail.gmail.com/ Suggested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230308091639.602024-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:57 +01:00
Liao Chang	463ae58d7c	riscv: Add header include guards to insn.h [ Upstream commit 8ac6e619d9d51b3eb5bae817db8aa94e780a0db4 ] Add header include guards to insn.h to prevent repeating declaration of any identifiers in insn.h. Fixes: edde5584c7ab ("riscv: Add SW single-step support for KDB") Signed-off-by: Liao Chang <liaochang1@huawei.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Fixes: c9c1af3f186a ("RISC-V: rename parse_asm.h to insn.h") Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230129094242.282620-1-liaochang1@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:51 +01:00
Mattias Nissler	4dd43ee784	riscv: Avoid enabling interrupts in die() [ Upstream commit 130aee3fd9981297ff9354e5d5609cd59aafbbea ] While working on something else, I noticed that the kernel would start accepting interrupts again after crashing in an interrupt handler. Since the kernel is already in inconsistent state, enabling interrupts is dangerous and opens up risk of kernel state deteriorating further. Interrupts do get enabled via what looks like an unintended side effect of spin_unlock_irq, so switch to the more cautious spin_lock_irqsave/spin_unlock_irqrestore instead. Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Signed-off-by: Mattias Nissler <mnissler@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20230215144828.3370316-1-mnissler@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:51 +01:00
Palmer Dabbelt	5ab1d0528b	RISC-V: Avoid dereferening NULL regs in die() [ Upstream commit f2913d006fcdb61719635e093d1b5dd0dafecac7 ] I don't think we can actually die() without a regs pointer, but the compiler was warning about a NULL check after a dereference. It seems prudent to just avoid the possibly-NULL dereference, given that when die()ing the system is already toast so who knows how we got there. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20220920200037.6727-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Stable-dep-of: 130aee3fd998 ("riscv: Avoid enabling interrupts in die()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:51 +01:00
Guo Ren	71f81b6842	riscv: ftrace: Reduce the detour code size to half commit 6724a76cff85ee271bbbff42ac527e4643b2ec52 upstream. Use a temporary register to reduce the size of detour code from 16 bytes to 8 bytes. The previous implementation is from 'commit afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'. Before the patch: <func_prolog>: 0: REG_S ra, -SZREG(sp) 4: auipc ra, ? 8: jalr ?(ra) 12: REG_L ra, -SZREG(sp) (func_boddy) After the patch: <func_prolog>: 0: auipc t0, ? 4: jalr t0, ?(t0) (func_boddy) This patch not just reduces the size of detour code, but also fixes an important issue: An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can actually change the instruction pointer, e.g. to "replace" the given kernel function with a new one, which is needed for livepatching, etc. In this case, the trampoline (ftrace_regs_caller) would not return to <func_prolog+12> but would rather jump to the new function. So, "REG_L ra, -SZREG(sp)" would not run and the original return address would not be restored. The kernel is likely to hang or crash as a result. This can be easily demonstrated if one tries to "replace", say, cmdline_proc_show() with a new function with the same signature using instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace callback. Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/ Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/ Co-developed-by: Song Shuai <suagrfillet@gmail.com> Signed-off-by: Song Shuai <suagrfillet@gmail.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Evgenii Shatokhin <e.shatokhin@yadro.com> Reviewed-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Link: https://lore.kernel.org/r/20230112090603.1295340-4-guoren@kernel.org Cc: stable@vger.kernel.org Fixes: 10626c32e382 ("riscv/ftrace: Add basic support") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:12 +01:00
Guo Ren	4accfc428f	riscv: ftrace: Remove wasted nops for !RISCV_ISA_C commit 409c8fb20c66df7150e592747412438c04aeb11f upstream. When CONFIG_RISCV_ISA_C=n, -fpatchable-function-entry=8 would generate more nops than we expect. Because it treat nop opcode as 0x00000013 instead of 0x0001. Dump of assembler code for function dw_pcie_free_msi: 0xffffffff806fce94 <+0>: sd ra,-8(sp) 0xffffffff806fce98 <+4>: auipc ra,0xff90f 0xffffffff806fce9c <+8>: jalr -684(ra) # 0xffffffff8000bbec <ftrace_caller> 0xffffffff806fcea0 <+12>: ld ra,-8(sp) 0xffffffff806fcea4 <+16>: nop /* wasted / 0xffffffff806fcea8 <+20>: nop / wasted / 0xffffffff806fceac <+24>: nop / wasted / 0xffffffff806fceb0 <+28>: nop / wasted */ 0xffffffff806fceb4 <+0>: addi sp,sp,-48 0xffffffff806fceb8 <+4>: sd s0,32(sp) 0xffffffff806fcebc <+8>: sd s1,24(sp) 0xffffffff806fcec0 <+12>: sd s2,16(sp) 0xffffffff806fcec4 <+16>: sd s3,8(sp) 0xffffffff806fcec8 <+20>: sd ra,40(sp) 0xffffffff806fcecc <+24>: addi s0,sp,48 Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230112090603.1295340-3-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:12 +01:00
Björn Töpel	f6b5db68b2	riscv, mm: Perform BPF exhandler fixup on page fault commit 416721ff05fddc58ca531b6f069de250301de6e5 upstream. Commit 21855cac82d3 ("riscv/mm: Prevent kernel module to access user memory without uaccess routines") added early exits/deaths for page faults stemming from accesses to user-space without using proper uaccess routines (where sstatus.SUM is set). Unfortunatly, this is too strict for some BPF programs, which relies on BPF exhandler fixups. These BPF programs loads "BTF pointers". A BTF pointers could either be a valid kernel pointer or NULL, but not a userspace address. Resolve the problem by calling the fixup handler in the early exit path. Fixes: 21855cac82d3 ("riscv/mm: Prevent kernel module to access user memory without uaccess routines") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230214162515.184827-1-bjorn@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:12 +01:00
Andy Chiu	043d1657cc	riscv: jump_label: Fixup unaligned arch_static_branch function commit 9ddfc3cd806081ce1f6c9c2f988cbb031f35d28f upstream. Runtime code patching must be done at a naturally aligned address, or we may execute on a partial instruction. We have encountered problems traced back to static jump functions during the test. We switched the tracer randomly for every 1~5 seconds on a dual-core QEMU setup and found the kernel sucking at a static branch where it jumps to itself. The reason is that the static branch was 2-byte but not 4-byte aligned. Then, the kernel would patch the instruction, either J or NOP, with two half-word stores if the machine does not have efficient unaligned accesses. Thus, moments exist where half of the NOP mixes with the other half of the J when transitioning the branch. In our particular case, on a little-endian machine, the upper half of the NOP was mixed with the lower part of the J when enabling the branch, resulting in a jump that jumped to itself. Conversely, it would result in a HINT instruction when disabling the branch, but it might not be observable. ARM64 does not have this problem since all instructions must be 4-byte aligned. Fixes: ebc00dde8a97 ("riscv: Add jump-label implementation") Link: https://lore.kernel.org/linux-riscv/20220913094252.3555240-6-andy.chiu@sifive.com/ Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230206090440.1255001-1-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:12 +01:00
Sergey Matyukevich	ac5ff022d9	riscv: mm: fix regression due to update_mmu_cache change commit b49f700668fff7565b945dce823def79bff59bb0 upstream. This is a partial revert of the commit 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache updates"). Original commit included two loosely related changes serving the same purpose of fixing stale TLB entries causing user-space application crash: - introduce deferred per-ASID TLB flush for CPUs not running the task - switch to per-ASID TLB flush on all CPUs running the task in update_mmu_cache According to report and discussion in [1], the second part caused a regression on Renesas RZ/Five SoC. For now restore the old behavior of the update_mmu_cache. [1] https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/ Fixes: 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache updates") Reported-by: "Lad, Prabhakar" <prabhakar.csengg@gmail.com> Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Link: trailer, so that it can be parsed with git's trailer functionality? Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230129211818.686557-1-geomatsi@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:12 +01:00
Conor Dooley	59b83f7b05	RISC-V: add a spin_shadow_stack declaration commit eb9be8310c58c166f9fae3b71c0ad9d6741b4897 upstream. The patchwork automation reported a sparse complaint that spin_shadow_stack was not declared and should be static: ../arch/riscv/kernel/traps.c:335:15: warning: symbol 'spin_shadow_stack' was not declared. Should it be static? However, this is used in entry.S and therefore shouldn't be static. The same applies to the shadow_stack that this pseudo spinlock is trying to protect, so do like its charge and add a declaration to thread_info.h Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Fixes: 7e1864332fbc ("riscv: fix race when vmap stack overflow") Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230210185945.915806-1-conor@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:12 +01:00

1 2 3 4 5 ...

1454 Commits