1044 Commits

Author SHA1 Message Date
Alexandre Ghiti
8633e74002 riscv: Fix module_alloc() that did not reset the linear mapping permissions
[ Upstream commit 749b94b08005929bbc636df21a23322733166e35 ]

After unloading a module, we must reset the linear mapping permissions,
see the example below:

Before unloading a module:

0xffffaf809d65d000-0xffffaf809d6dc000    0x000000011d65d000       508K PTE .   ..     ..   D A G . . W R V
0xffffaf809d6dc000-0xffffaf809d6dd000    0x000000011d6dc000         4K PTE .   ..     ..   D A G . . . R V
0xffffaf809d6dd000-0xffffaf809d6e1000    0x000000011d6dd000        16K PTE .   ..     ..   D A G . . W R V
0xffffaf809d6e1000-0xffffaf809d6e7000    0x000000011d6e1000        24K PTE .   ..     ..   D A G . X . R V

After unloading a module:

0xffffaf809d65d000-0xffffaf809d6e1000    0x000000011d65d000       528K PTE .   ..     ..   D A G . . W R V
0xffffaf809d6e1000-0xffffaf809d6e7000    0x000000011d6e1000        24K PTE .   ..     ..   D A G . X W R V

The last mapping is not reset and we end up with WX mappings in the linear
mapping.

So add VM_FLUSH_RESET_PERMS to our module_alloc() definition.

Fixes: 0cff8bff7af8 ("riscv: avoid the PIC offset of static percpu data in module beyond 2G limits")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231213134027.155327-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:35:50 -08:00
Alexandre Ghiti
8db56df4a9 riscv: Check if the code to patch lies in the exit section
[ Upstream commit 420370f3ae3d3b883813fd3051a38805160b2b9f ]

Otherwise we fall through to vmalloc_to_page() which panics since the
address does not lie in the vmalloc region.

Fixes: 043cb41a85de ("riscv: introduce interfaces to patch kernel code")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231214091926.203439-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:35:50 -08:00
Jisheng Zhang
b649a7fe9a riscv: don't probe unaligned access speed if already done
[ Upstream commit c20d36cc2a2073d4cdcda92bd7a1bb9b3b3b7c79 ]

If misaligned_access_speed percpu var isn't so called "HWPROBE
MISALIGNED UNKNOWN", it means the probe has happened(this is possible
for example, hotplug off then hotplug on one cpu), and the percpu var
has been set, don't probe again in this case.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230912154040.3306-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10 17:16:56 +01:00
Ignat Korchagin
37b561d559 kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
commit c41bd2514184d75db087fe4c1221237fb7922875 upstream.

In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for
CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP
required CONFIG_KEXEC.

However, it was not enough at least for arm64 platforms.  While further
testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP
is unavailable in menuconfig.  This is because CONFIG_CRASH_DUMP still
depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit
91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on
arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in
turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of
which are set in our config.

Given that we already established that CONFIG_KEXEC (which is a switch for
kexec system call itself) is not required for CONFIG_CRASH_DUMP drop
CONFIG_ARCH_SUPPORTS_KEXEC dependency as well.  The arm64 kernel builds
just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and
CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select
of KEXEC for CRASH_DUMP") and this patch are applied given that the
necessary shared bits are included via CONFIG_KEXEC_CORE dependency.

[bhe@redhat.com: don't export some symbols when CONFIG_MMU=n]
  Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv
[bhe@redhat.com: riscv, kexec: fix dependency of two items]
  Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com
Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
Cc: <stable@vger.kernel.org> # 6.6+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-20 17:02:03 +01:00
Clément Léger
b12ccda0d4 riscv: fix misaligned access handling of C.SWSP and C.SDSP
[ Upstream commit 22e0eb04837a63af111fae35a92f7577676b9bc8 ]

This is a backport of a fix that was done in OpenSBI: ec0559eb315b
("lib: sbi_misaligned_ldst: Fix handling of C.SWSP and C.SDSP").

Unlike C.LWSP/C.LDSP, these encodings can be used with the zero
register, so checking that the rs2 field is non-zero is unnecessary.

Additionally, the previous check was incorrect since it was checking
the immediate field of the instruction instead of the rs2 field.

Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE")
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231103090223.702340-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13 18:45:19 +01:00
Nam Cao
b59f21292e riscv: kprobes: allow writing to x0
commit 8cb22bec142624d21bc85ff96b7bad10b6220e6a upstream.

Instructions can write to x0, so we should simulate these instructions
normally.

Currently, the kernel hangs if an instruction who writes to x0 is
simulated.

Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported")
Cc: stable@vger.kernel.org
Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230829182500.61875-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28 17:20:13 +00:00
Nam Cao
a08b414d1e riscv: put interrupt entries into .irqentry.text
commit 87615e95f6f9ccd36d4a3905a2d87f91967ea9d2 upstream.

The interrupt entries are expected to be in the .irqentry.text section.
For example, for kprobes to work properly, exception code cannot be
probed; this is ensured by blacklisting addresses in the .irqentry.text
section.

Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Signed-off-by: Nam Cao <namcaov@gmail.com>
Link: https://lore.kernel.org/r/20230821145708.21270-1-namcaov@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28 17:20:13 +00:00
Nam Cao
f9e1d36a66 riscv: provide riscv-specific is_trap_insn()
[ Upstream commit b701f9e726f0a30a94ea6af596b74c1f07b95b6b ]

uprobes expects is_trap_insn() to return true for any trap instructions,
not just the one used for installing uprobe. The current default
implementation only returns true for 16-bit c.ebreak if C extension is
enabled. This can confuse uprobes if a 32-bit ebreak generates a trap
exception from userspace: uprobes asks is_trap_insn() who says there is no
trap, so uprobes assume a probe was there before but has been removed, and
return to the trap instruction. This causes an infinite loop of entering
and exiting trap handler.

Instead of using the default implementation, implement this function
speficially for riscv with checks for both ebreak and c.ebreak.

Fixes: 74784081aac8 ("riscv: Add uprobes supported")
Signed-off-by: Nam Cao <namcaov@gmail.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230829083614.117748-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28 17:19:49 +00:00
Andrew Jones
81e25896cc RISC-V: hwprobe: Fix vDSO SIGSEGV
[ Upstream commit e1c05b3bf80f829ced464bdca90f1dfa96e8d251 ]

A hwprobe pair key is signed, but the hwprobe vDSO function was
only checking that the upper bound was valid. In order to help
avoid this type of problem in the future, and in anticipation of
this check becoming more complicated with sparse keys, introduce
and use a "key is valid" predicate function for the check.

Fixes: aa5af0aa90ba ("RISC-V: Add hwprobe vDSO function and data")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231010165101.14942-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28 17:19:49 +00:00
Deepak Gupta
eff53aea38 riscv: VMAP_STACK overflow detection thread-safe
[ Upstream commit be97d0db5f44c0674480cb79ac6f5b0529b84c76 ]

commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added
support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to
`shadow_stack` temporarily before switching finally to per-cpu
`overflow_stack`.

If two CPUs/harts are racing and end up in over flowing kernel stack, one
or both will end up corrupting each other state because `shadow_stack` is
not per-cpu. This patch optimizes per-cpu overflow stack switch by
directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`.

Following are the changes in this patch

 - Defines an asm macro to obtain per-cpu symbols in destination
   register.
 - In entry.S, when overflow is detected, per-cpu overflow stack is
   located using per-cpu asm macro. Computing per-cpu symbol requires
   a temporary register. x31 is saved away into CSR_SCRATCH
   (CSR_SCRATCH is anyways zero since we're in kernel).

Please see Links for additional relevant disccussion and alternative
solution.

Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT`
Kernel crash log below

 Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT
 Task stack:     [0xff20000010a98000..0xff20000010a9c000]
 Overflow stack: [0xff600001f7d98370..0xff600001f7d99370]
 CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
 Hardware name: riscv-virtio,qemu (DT)
 epc : __memset+0x60/0xfc
  ra : recursive_loop+0x48/0xc6 [lkdtm]
 epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80
  gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88
  t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0
  s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000
  a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000
  a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff
  s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90
  s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684
  s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10
  s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4
  t5 : ffffffff815dbab8 t6 : ff20000010a9bb48
 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f
 Kernel panic - not syncing: Kernel stack overflow
 CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
 Hardware name: riscv-virtio,qemu (DT)
 Call Trace:
 [<ffffffff80006754>] dump_backtrace+0x30/0x38
 [<ffffffff808de798>] show_stack+0x40/0x4c
 [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c
 [<ffffffff808ea2d8>] dump_stack+0x18/0x20
 [<ffffffff808dec06>] panic+0x126/0x2fe
 [<ffffffff800065ea>] walk_stackframe+0x0/0xf0
 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm]
 SMP: stopping secondary CPUs
 ---[ end Kernel panic - not syncing: Kernel stack overflow ]---

Cc: Guo Ren <guoren@kernel.org>
Cc: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t
Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Guo Ren <guoren@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-9-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28 17:19:47 +00:00
Anup Patel
625046cd1d RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs
[ Upstream commit c4676f8dc1e12e68d6511f9ed89707fdad4c962c ]

The riscv_of_processor_hartid() used by riscv_of_parent_hartid() fails
for HARTs disabled in the DT. This results in the following warning
thrown by the RISC-V INTC driver for the E-core on SiFive boards:

[    0.000000] riscv-intc: unable to find hart id for /cpus/cpu@0/interrupt-controller

The riscv_of_parent_hartid() is only expected to read the hartid
from the DT so we directly call of_get_cpu_hwid() instead of calling
riscv_of_processor_hartid().

Fixes: ad635e723e17 ("riscv: cpu: Add 64bit hartid support on RV64")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20231027154254.355853-2-apatel@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20 11:59:37 +01:00
Jiexun Wang
07a2766575
RISC-V: Fix wrong use of CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK
If configuration options SOFTIRQ_ON_OWN_STACK and PREEMPT_RT
are enabled simultaneously under RISC-V architecture,
it will result in a compilation failure:

arch/riscv/kernel/irq.c:64:6: error: redefinition of 'do_softirq_own_stack'
   64 | void do_softirq_own_stack(void)
      |      ^~~~~~~~~~~~~~~~~~~~
In file included from ./arch/riscv/include/generated/asm/softirq_stack.h:1,
                 from arch/riscv/kernel/irq.c:15:
./include/asm-generic/softirq_stack.h:8:20: note: previous definition of 'do_softirq_own_stack' was here
    8 | static inline void do_softirq_own_stack(void)
      |                    ^~~~~~~~~~~~~~~~~~~~

After changing CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK to CONFIG_SOFTIRQ_ON_OWN_STACK,
compilation can be successful.

Fixes: dd69d07a5a6c ("riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jiexun Wang <wangjiexun@tinylab.org>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20230913052940.374686-1-wangjiexun@tinylab.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12 11:34:16 -07:00
Chen Jiahao
1d6cd2146c
riscv: kdump: fix crashkernel reserving problem on RISC-V
When testing on risc-v QEMU environment with "crashkernel="
parameter enabled, a problem occurred with the following
message:

[    0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB)
[    0.000000] crashkernel reserved: 0x0000000177e00000 - 0x0000000277e00000 (4096 MB)
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:779 __insert_resource+0x8e/0xd0
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc2-next-20230920 #1
[    0.000000] Hardware name: riscv-virtio,qemu (DT)
[    0.000000] epc : __insert_resource+0x8e/0xd0
[    0.000000]  ra : insert_resource+0x28/0x4e
[    0.000000] epc : ffffffff80017344 ra : ffffffff8001742e sp : ffffffff81203db0
[    0.000000]  gp : ffffffff812ece98 tp : ffffffff8120dac0 t0 : ff600001f7ff2b00
[    0.000000]  t1 : 0000000000000000 t2 : 3428203030303030 s0 : ffffffff81203dc0
[    0.000000]  s1 : ffffffff81211e18 a0 : ffffffff81211e18 a1 : ffffffff81289380
[    0.000000]  a2 : 0000000277dfffff a3 : 0000000177e00000 a4 : 0000000177e00000
[    0.000000]  a5 : ffffffff81289380 a6 : 0000000277dfffff a7 : 0000000000000078
[    0.000000]  s2 : ffffffff81289380 s3 : ffffffff80a0bac8 s4 : ff600001f7ff2880
[    0.000000]  s5 : 0000000000000280 s6 : 8000000a00006800 s7 : 000000000000007f
[    0.000000]  s8 : 0000000080017038 s9 : 0000000080038ea0 s10: 0000000000000000
[    0.000000]  s11: 0000000000000000 t3 : ffffffff80a0bc00 t4 : ffffffff80a0bc00
[    0.000000]  t5 : ffffffff80a0bbd0 t6 : ffffffff80a0bc00
[    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[    0.000000] [<ffffffff80017344>] __insert_resource+0x8e/0xd0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Failed to add a Crash kernel resource at 177e00000

The crashkernel memory has been allocated successfully, whereas
it failed to insert into iomem_resource. This is due to the
unique reserving logic in risc-v arch specific code, i.e.
crashk_res/crashk_low_res will be added into iomem_resource
later in init_resources(), which is not aligned with current
unified reserving logic in reserve_crashkernel_{generic,low}()
and therefore leads to the failure of crashkernel reservation.

Removing the arch specific code within #ifdef CONFIG_KEXEC_CORE
in init_resources() to fix above problem.

Fixes: 31549153088e ("riscv: kdump: use generic interface to simplify crashkernel reservation")
Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
Link: https://lore.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12 11:34:09 -07:00
Andy Chiu
14a270bfab
riscv: signal: fix sigaltstack frame size checking
The alternative stack checking in get_sigframe introduced by the Vector
support is not needed and has a problem. It is not needed as we have
already validate it at the beginning of the function if we are already
on an altstack. If not, the size of an altstack is always validated at
its allocation stage with sigaltstack_size_valid().

Besides, we must only regard the size of an altstack if the handler of a
signal is registered with SA_ONSTACK. So, blindly checking overflow of
an altstack if sas_ss_size not equals to zero will check against wrong
signal handlers if only a subset of signals are registered with
SA_ONSTACK.

Fixes: 8ee0b41898fa ("riscv: signal: Add sigcontext save/restore for vector")
Reported-by: Prashanth Swaminathan <prashanthsw@google.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230822164904.21660-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12 05:38:40 -07:00
Björn Töpel
9f564b92cf
riscv: Only consider swbp/ss handlers for correct privileged mode
RISC-V software breakpoint trap handlers are used for {k,u}probes.

When trapping from kernelmode, only the kernelmode handlers should be
considered. Vice versa, only usermode handlers for usermode
traps. This is not the case on RISC-V, which can trigger a bug if a
userspace process uses uprobes, and a WARN() is triggered from
kernelmode (which is implemented via {c.,}ebreak).

The kernel will trap on the kernelmode {c.,}ebreak, look for uprobes
handlers, realize incorrectly that uprobes need to be handled, and
exit the trap handler early. The trap returns to re-executing the
{c.,}ebreak, and enter an infinite trap-loop.

The issue was found running the BPF selftest [1].

Fix this issue by only considering the swbp/ss handlers for
kernel/usermode respectively. Also, move CONFIG ifdeffery from traps.c
to the asm/{k,u}probes.h headers.

Note that linux/uprobes.h only include asm/uprobes.h if CONFIG_UPROBES
is defined, which is why asm/uprobes.h needs to be unconditionally
included in traps.c

Link: https://lore.kernel.org/linux-riscv/87v8d19aun.fsf@all.your.base.are.belong.to.us/ # [1]
Fixes: 74784081aac8 ("riscv: Add uprobes supported")
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Nam Cao <namcaov@gmail.com>
Tested-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230912065619.62020-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-20 06:45:27 -07:00
Song Shuai
1bfb2b618d
riscv: kexec: Align the kexeced kernel entry
The current riscv boot protocol requires 2MB alignment for RV64
and 4MB alignment for RV32.

In KEXEC_FILE path, the elf_find_pbase() function should align
the kexeced kernel entry according to the requirement, otherwise
the kexeced kernel would silently BUG at the setup_vm().

Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230906095817.364390-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-12 13:58:39 -07:00
Palmer Dabbelt
77eea559ba
Merge patch series "bpf, riscv: use BPF prog pack allocator in BPF JIT"
Puranjay Mohan <puranjay12@gmail.com> says:

Here is some data to prove the V2 fixes the problem:

Without this series:
root@rv-selftester:~/src/kselftest/bpf# time ./test_tag
test_tag: OK (40945 tests)

real    7m47.562s
user    0m24.145s
sys     6m37.064s

With this series applied:
root@rv-selftester:~/src/selftest/bpf# time ./test_tag
test_tag: OK (40945 tests)

real    7m29.472s
user    0m25.865s
sys     6m18.401s

BPF programs currently consume a page each on RISCV. For systems with many BPF
programs, this adds significant pressure to instruction TLB. High iTLB pressure
usually causes slow down for the whole system.

Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue.
It packs multiple BPF programs into a single huge page. It is currently only
enabled for the x86_64 BPF JIT.

I enabled this allocator on the ARM64 BPF JIT[2]. It is being reviewed now.

This patch series enables the BPF prog pack allocator for the RISCV BPF JIT.

======================================================
Performance Analysis of prog pack allocator on RISCV64
======================================================

Test setup:
===========

Host machine: Debian GNU/Linux 11 (bullseye)
Qemu Version: QEMU emulator version 8.0.3 (Debian 1:8.0.3+dfsg-1)
u-boot-qemu Version: 2023.07+dfsg-1
opensbi Version: 1.3-1

To test the performance of the BPF prog pack allocator on RV, a stresser
tool[4] linked below was built. This tool loads 8 BPF programs on the system and
triggers 5 of them in an infinite loop by doing system calls.

The runner script starts 20 instances of the above which loads 8*20=160 BPF
programs on the system, 5*20=100 of which are being constantly triggered.
The script is passed a command which would be run in the above environment.

The script was run with following perf command:
./run.sh "perf stat -a \
        -e iTLB-load-misses \
        -e dTLB-load-misses  \
        -e dTLB-store-misses \
        -e instructions \
        --timeout 60000"

The output of the above command is discussed below before and after enabling the
BPF prog pack allocator.

The tests were run on qemu-system-riscv64 with 8 cpus, 16G memory. The rootfs
was created using Bjorn's riscv-cross-builder[5] docker container linked below.

Results
=======

Before enabling prog pack allocator:
------------------------------------

Performance counter stats for 'system wide':

           4939048      iTLB-load-misses
           5468689      dTLB-load-misses
            465234      dTLB-store-misses
     1441082097998      instructions

      60.045791200 seconds time elapsed

After enabling prog pack allocator:
-----------------------------------

Performance counter stats for 'system wide':

           3430035      iTLB-load-misses
           5008745      dTLB-load-misses
            409944      dTLB-store-misses
     1441535637988      instructions

      60.046296600 seconds time elapsed

Improvements in metrics
=======================

It was expected that the iTLB-load-misses would decrease as now a single huge
page is used to keep all the BPF programs compared to a single page for each
program earlier.

--------------------------------------------
The improvement in iTLB-load-misses: -30.5 %
--------------------------------------------

I repeated this expriment more than 100 times in different setups and the
improvement was always greater than 30%.

This patch series is boot tested on the Starfive VisionFive 2 board[6].
The performance analysis was not done on the board because it doesn't
expose iTLB-load-misses, etc. The stresser program was run on the board to test
the loading and unloading of BPF programs

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/all/20230626085811.3192402-1-puranjay12@gmail.com/
[3] https://lore.kernel.org/all/20230626085811.3192402-2-puranjay12@gmail.com/
[4] https://github.com/puranjaymohan/BPF-Allocator-Bench
[5] https://github.com/bjoto/riscv-cross-builder
[6] https://www.starfivetech.com/en/site/boards

* b4-shazam-merge:
  bpf, riscv: use prog pack allocator in the BPF JIT
  riscv: implement a memset like function for text
  riscv: extend patch_text_nosync() for multiple pages
  bpf: make bpf_prog_pack allocator portable

Link: https://lore.kernel.org/r/20230831131229.497941-1-puranjay12@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08 11:25:25 -07:00
Palmer Dabbelt
f578055558
Merge patch series "riscv: Introduce KASLR"
Alexandre Ghiti <alexghiti@rivosinc.com> says:

The following KASLR implementation allows to randomize the kernel mapping:

- virtually: we expect the bootloader to provide a seed in the device-tree
- physically: only implemented in the EFI stub, it relies on the firmware to
  provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation
  hence the patch 3 factorizes KASLR related functions for riscv to take
  advantage.

The new virtual kernel location is limited by the early page table that only
has one PUD and with the PMD alignment constraint, the kernel can only take
< 512 positions.

* b4-shazam-merge:
  riscv: libstub: Implement KASLR by using generic functions
  libstub: Fix compilation warning for rv32
  arm64: libstub: Move KASLR handling functions to kaslr.c
  riscv: Dump out kernel offset information on panic
  riscv: Introduce virtual kernel mapping KASLR

Link: https://lore.kernel.org/r/20230722123850.634544-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08 11:25:13 -07:00
Palmer Dabbelt
f093636354
Merge patch "RISC-V: Add ptrace support for vectors"
This resurrects the vector ptrace() support that was removed for 6.5 due
to some bugs cropping up as part of the GDB review process.

* b4-shazam-merge:
  RISC-V: Add ptrace support for vectors

Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08 11:24:38 -07:00
Palmer Dabbelt
c23be918c5
Merge patch series "Add non-coherent DMA support for AX45MP"
Prabhakar <prabhakar.csengg@gmail.com> says:

From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

non-coherent DMA support for AX45MP
====================================

On the Andes AX45MP core, cache coherency is a specification option so it
may not be supported. In this case DMA will fail. To get around with this
issue this patch series does the below:

1] Andes alternative ports is implemented as errata which checks if the
IOCP is missing and only then applies to CMO errata. One vendor specific
SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of
errata.

Below are the configs which Andes port provides (and are selected by
RZ/Five):
      - ERRATA_ANDES
      - ERRATA_ANDES_CMO

OpenSBI patch supporting ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI is now
part v1.3 release.

2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
block that allows dynamic adjustment of memory attributes in the runtime.
It contains a configurable amount of PMA entries implemented as CSR
registers to control the attributes of memory locations in interest.
OpenSBI configures the PMA regions as required and creates a reserve memory
node and propagates it to the higher boot stack.

Currently OpenSBI (upstream) configures the required PMA region and passes
this a shared DMA pool to Linux.

    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        pma_resv0@58000000 {
            compatible = "shared-dma-pool";
            reg = <0x0 0x58000000 0x0 0x08000000>;
            no-map;
            linux,dma-default;
        };
    };

The above shared DMA pool gets appended to Linux DTB so the DMA memory
requests go through this region.

3] We provide callbacks to synchronize specific content between memory and
cache.

4] RZ/Five SoC selects the below configs
        - AX45MP_L2_CACHE
        - DMA_GLOBAL_POOL
        - ERRATA_ANDES
        - ERRATA_ANDES_CMO

----------x---------------------x--------------------x---------------x----

* b4-shazam-merge:
  soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
  cache: Add L2 cache management for Andes AX45MP RISC-V core
  dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
  riscv: mm: dma-noncoherent: nonstandard cache operations support
  riscv: errata: Add Andes alternative ports
  riscv: asm: vendorid_list: Add Andes Technology to the vendors list

Link: https://lore.kernel.org/r/20230818135723.80612-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08 11:24:34 -07:00
Palmer Dabbelt
580253b518
Merge patch series "RISC-V: Probe for misaligned access speed"
Evan Green <evan@rivosinc.com> says:

The current setting for the hwprobe bit indicating misaligned access
speed is controlled by a vendor-specific feature probe function. This is
essentially a per-SoC table we have to maintain on behalf of each vendor
going forward. Let's convert that instead to something we detect at
runtime.

We have two assembly routines at the heart of our probe: one that
does a bunch of word-sized accesses (without aligning its input buffer),
and the other that does byte accesses. If we can move a larger number of
bytes using misaligned word accesses than we can with the same amount of
time doing byte accesses, then we can declare misaligned accesses as
"fast".

The tradeoff of reducing this maintenance burden is boot time. We spend
4-6 jiffies per core doing this measurement (0-2 on jiffie edge
alignment, and 4 on measurement). The timing loop was based on
raid6_choose_gen(), which uses (16+1)*N jiffies (where N is the number
of algorithms). By taking only the fastest iteration out of all
attempts for use in the comparison, variance between runs is very low.
On my THead C906, it looks like this:

[    0.047563] cpu0: Ratio of byte access time to unaligned word access is 4.34, unaligned accesses are fast

Several others have chimed in with results on slow machines with the
older algorithm, which took all runs into account, including noise like
interrupts. Even with this variation, results indicate that in all cases
(fast, slow, and emulated) the measured numbers are nowhere near each
other (always multiple factors away).

* b4-shazam-merge:
  RISC-V: alternative: Remove feature_probe_func
  RISC-V: Probe for unaligned access speed

Link: https://lore.kernel.org/r/20230818194136.4084400-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08 11:24:12 -07:00
Puranjay Mohan
cad539baa4
riscv: implement a memset like function for text
The BPF JIT needs to write invalid instructions to RX regions of memory to
invalidate removed BPF programs. This needs a function like memset() that
can work with RX memory.

Implement patch_text_set_nosync() which is similar to text_poke_set() of
x86.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230831131229.497941-4-puranjay12@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06 06:26:06 -07:00
Puranjay Mohan
9721873c3c
riscv: extend patch_text_nosync() for multiple pages
The patch_insn_write() function currently doesn't work for multiple pages
of instructions, therefore patch_text_nosync() will fail with a page fault
if called with lengths spanning multiple pages.

This commit extends the patch_insn_write() function to support multiple
pages by copying at max 2 pages at a time in a loop. This implementation
is similar to text_poke_copy() function of x86.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230831131229.497941-3-puranjay12@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06 06:26:05 -07:00
Alexandre Ghiti
b7ac4b8ee7
riscv: libstub: Implement KASLR by using generic functions
We can now use arm64 functions to handle the move of the kernel physical
mapping: if KASLR is enabled, we will try to get a random seed from the
firmware, if not possible, the kernel will be moved to a location that
suits its alignment constraints.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230722123850.634544-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-05 19:49:31 -07:00
Alexandre Ghiti
54a519e6af
riscv: Dump out kernel offset information on panic
Dump out the KASLR virtual kernel offset when panic to help debug kernel.

Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230722123850.634544-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-05 19:49:28 -07:00
Alexandre Ghiti
84fe419dc7
riscv: Introduce virtual kernel mapping KASLR
KASLR implementation relies on a relocatable kernel so that we can move
the kernel mapping.

The seed needed to virtually move the kernel is taken from the device tree,
so we rely on the bootloader to provide a correct seed. Zkr could be used
unconditionnally instead if implemented, but that's for another patch.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230722123850.634544-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-05 19:49:27 -07:00
Andy Chiu
9300f00439
RISC-V: Add ptrace support for vectors
This patch add back the ptrace support with the following fix:
 - Define NT_RISCV_CSR and re-number NT_RISCV_VECTOR to prevent
   conflicting with gdb's NT_RISCV_CSR.
 - Use struct __riscv_v_regset_state to handle ptrace requests

Since gdb does not directly include the note description header in
Linux and has already defined NT_RISCV_CSR as 0x900, we decide to
sync with gdb and renumber NT_RISCV_VECTOR to solve and prevent future
conflicts.

Fixes: 0c59922c769a ("riscv: Add ptrace vector support")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com
[Palmer: Drop the unused "size" variable in riscv_vr_set().]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01 13:05:38 -07:00
Lad Prabhakar
e021ae7f51
riscv: errata: Add Andes alternative ports
Add required ports of the Alternative scheme for Andes CPU cores.

I/O Coherence Port (IOCP) provides an AXI interface for connecting external
non-caching masters, such as DMA controllers. IOCP is a specification
option and is disabled on the Renesas RZ/Five SoC due to this reason cache
management needs a software workaround.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com> # tyre-kicking on a d1
Link: https://lore.kernel.org/r/20230818135723.80612-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01 09:08:56 -07:00
Evan Green
f2d14bc4e4
RISC-V: alternative: Remove feature_probe_func
Now that we're testing unaligned memory copy and making that
determination generically, there are no more users of the vendor
feature_probe_func(). While I think it's probably going to need to come
back, there are no users right now, so let's remove it until it's
needed.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230818194136.4084400-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01 09:06:26 -07:00
Evan Green
584ea6564b
RISC-V: Probe for unaligned access speed
Rather than deferring unaligned access speed determinations to a vendor
function, let's probe them and find out how fast they are. If we
determine that an unaligned word access is faster than N byte accesses,
mark the hardware's unaligned access as "fast". Otherwise, we mark
accesses as slow.

The algorithm itself runs for a fixed amount of jiffies. Within each
iteration it attempts to time a single loop, and then keeps only the best
(fastest) loop it saw. This algorithm was found to have lower variance from
run to run than my first attempt, which counted the total number of
iterations that could be done in that fixed amount of jiffies. By taking
only the best iteration in the loop, assuming at least one loop wasn't
perturbed by an interrupt, we eliminate the effects of interrupts and
other "warm up" factors like branch prediction. The only downside is it
depends on having an rdtime granular and accurate enough to measure a
single copy. If we ever manage to complete a loop in 0 rdtime ticks, we
leave the unaligned setting at UNKNOWN.

There is a slight change in user-visible behavior here. Previously, all
boards except the THead C906 reported misaligned access speed of
UNKNOWN. C906 reported FAST. With this change, since we're now measuring
misaligned access speed on each hart, all RISC-V systems will have this
key set as either FAST or SLOW.

Currently, we don't have a way to confidently measure the difference between
SLOW and EMULATED, so we label anything not fast as SLOW. This will
mislabel some systems that are actually EMULATED as SLOW. When we get
support for delegating misaligned access traps to the kernel (as opposed
to the firmware quietly handling it), we can explicitly test in Linux to
see if unaligned accesses trap. Those systems will start to report
EMULATED, though older (today's) systems without that new SBI mechanism
will continue to report SLOW.

I've updated the documentation for those hwprobe values to reflect
this, specifically: SLOW may or may not be emulated by software, and FAST
represents means being faster than equivalent byte accesses. The change
in documentation is accurate with respect to both the former and current
behavior.

Signed-off-by: Evan Green <evan@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230818194136.4084400-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01 09:06:25 -07:00
Linus Torvalds
e0152e7481 RISC-V Patches for the 6.6 Merge Window, Part 1
* Support for the new "riscv,isa-extensions" and "riscv,isa-base" device
   tree interfaces for probing extensions.
 * Support for userspace access to the performance counters.
 * Support for more instructions in kprobes.
 * Crash kernels can be allocated above 4GiB.
 * Support for KCFI.
 * Support for ELFs in !MMU configurations.
 * ARCH_KMALLOC_MINALIGN has been reduced to 8.
 * mmap() defaults to sv48-sized addresses, with longer addresses hidden
   behind a hint (similar to Arm and Intel).
 * Also various fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmTx96kTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiVjRD/9DYVLlkQ/OEDJjPaEcYCP49xgIVUUU
 lhs3XbSs2VNHBaiG114f6Q0AaT/uNi+uqSej3CeTmEot2kZkBk/f2yu+UNIriPZ9
 GQiZsdyXhu921C+5VFtiI47KDWOVZ+Jpy3M1ll61IWt3yPSQHr1xOP0AOiyHHqe3
 cmqpNnzjajlfVDoXPc2mGGzUJt/7ar4thcwnMNi98raXR5Qh7SP6rrHjoQhE1oFk
 LMP3CHqEAcHE2tE4CxZVpc6HOQ5m0LpQIOK7ypufGMyoIYESm5dt/JOT4MlhTtDw
 6JzyVKtiM7lartUnUaW3ZoX4trQYT5gbXxWrJ2gCnUGy3VulikoXr1Rpz0qfdeOR
 XN8OLkVAqHfTGFI7oKk24f9Adw96R5NPZcdCay90h4J/kMfCiC7ZyUUI1XIa5iy1
 np5pZCkf8HNcdywML7qcFd5n2O0wchyFnRLFZo6kJP9Ls5cEi6kBx/1jSdTcNgx/
 fUKXyoEcriGoQiiwn29+4RZnU69gJV3zqQNLPpuwDQ5F/Q1zHTlrr+dqzezKkzcO
 dRTV2d2Q4A5vIDXPptzNNLlRQdrc8qxPJ1lxQVkPIU4/mtqczmZBwlyY2u9zwPyS
 sehJgJZnoAf+jm71NgQAKLck4MUBsMnMogOWunhXkVRCoZlbbkUWX4ECZYwPKsVk
 W7zVPmLvSM0l5g==
 =/tXb
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the new "riscv,isa-extensions" and "riscv,isa-base"
   device tree interfaces for probing extensions

 - Support for userspace access to the performance counters

 - Support for more instructions in kprobes

 - Crash kernels can be allocated above 4GiB

 - Support for KCFI

 - Support for ELFs in !MMU configurations

 - ARCH_KMALLOC_MINALIGN has been reduced to 8

 - mmap() defaults to sv48-sized addresses, with longer addresses hidden
   behind a hint (similar to Arm and Intel)

 - Also various fixes and cleanups

* tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V
  riscv: support PREEMPT_DYNAMIC with static keys
  riscv: Move create_tmp_mapping() to init sections
  riscv: Mark KASAN tmp* page tables variables as static
  riscv: mm: use bitmap_zero() API
  riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B
  riscv: remove redundant mv instructions
  RISC-V: mm: Document mmap changes
  RISC-V: mm: Update pgtable comment documentation
  RISC-V: mm: Add tests for RISC-V mm
  RISC-V: mm: Restrict address space for sv39,sv48,sv57
  riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
  riscv: allow kmalloc() caches aligned to the smallest value
  riscv: support the elf-fdpic binfmt loader
  binfmt_elf_fdpic: support 64-bit systems
  riscv: Allow CONFIG_CFI_CLANG to be selected
  riscv/purgatory: Disable CFI
  riscv: Add CFI error handling
  riscv: Add ftrace_stub_graph
  riscv: Add types to indirectly called assembly functions
  ...
2023-09-01 08:09:48 -07:00
Palmer Dabbelt
52b77c2806
Merge patch series "riscv: Reduce ARCH_KMALLOC_MINALIGN to 8"
Jisheng Zhang <jszhang@kernel.org> says:

Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E
64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel
Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus
it brings some bad effects to coherent platforms:

Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and
kmalloc-8 slab caches don't exist any more, they are replaced with
either kmalloc-128 or kmalloc-64.

Secondly, larger than necessary kmalloc aligned allocations results
in unnecessary cache/TLB pressure.

This issue also exists on arm64 platforms. From last year, Catalin
tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to
dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage
in various drivers with ARCH_DMA_MINALIGN etc.[1]

One fact we can make use of for riscv: if the CPU doesn't support
ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on
Catalin's work and above fact, we can easily solve the kmalloc align
issue for riscv: we can override dma_get_cache_alignment(), then let
it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know
the underlying HW neither supports ZICBOM nor supports T-HEAD CMO.

So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the
devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the
kmalloc minimum alignment, nothing changed in this case. This case
can be improved in the future once we see such platforms in mainline.

After this patch, a simple test of booting to a small buildroot rootfs
on qemu shows:

kmalloc-96           5041    5041     96  ...
kmalloc-64           9606    9606     64  ...
kmalloc-32           5128    5128     32  ...
kmalloc-16           7682    7682     16  ...
kmalloc-8           10246   10246      8  ...

So we save about 1268KB memory. The saving will be much larger in normal
OS env on real HW platforms.

patch1 allows kmalloc() caches aligned to the smallest value.
patch2 enables DMA_BOUNCE_UNALIGNED_KMALLOC.

After this series:

As for coherent platforms, kmalloc-{8,16,32,96} caches come back on
coherent both RV32 and RV64 platforms, I.E !ZICBOM and !THEAD_CMO.

As for noncoherent RV32 platforms, nothing changed.

As for noncoherent RV64 platforms, I.E either ZICBOM or THEAD_CMO, the
above kmalloc caches also come back if > 4GB memory or users pass
"swiotlb=mmnn,force" to force swiotlb creation if <= 4GB memory. How
much mmnn should be depends on the specific platform, it needs to be
tried and tested all possible usage case on the specific hardware. For
example, I can use the minimal I/O TLB slabs on Sipeed M1S Dock.

* b4-shazam-merge:
  riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
  riscv: allow kmalloc() caches aligned to the smallest value

Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]
Link: https://lore.kernel.org/r/20230718152214.2907-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:35 -07:00
Palmer Dabbelt
7f7d3ea6eb
Merge patch series "riscv: KCFI support"
Sami Tolvanen <samitolvanen@google.com> says:

This series adds KCFI support for RISC-V. KCFI is a fine-grained
forward-edge control-flow integrity scheme supported in Clang >=16,
which ensures indirect calls in instrumented code can only branch to
functions whose type matches the function pointer type, thus making
code reuse attacks more difficult.

Patch 1 implements a pt_regs based syscall wrapper to address
function pointer type mismatches in syscall handling. Patches 2 and 3
annotate indirectly called assembly functions with CFI types. Patch 4
implements error handling for indirect call checks. Patch 5 disables
CFI for arch/riscv/purgatory. Patch 6 finally allows CONFIG_CFI_CLANG
to be enabled for RISC-V.

Note that Clang 16 has a generic architecture-agnostic KCFI
implementation, which does work with the kernel, but doesn't produce
a stable code sequence for indirect call checks, which means
potential failures just trap and won't result in informative error
messages. Clang 17 includes a RISC-V specific back-end implementation
for KCFI, which emits a predictable code sequence for the checks and a
.kcfi_traps section with locations of the traps, which patch 5 uses to
produce more useful errors.

The type mismatch fixes and annotations in the first three patches
also become necessary in future if the kernel decides to support
fine-grained CFI implemented using the hardware landing pad
feature proposed in the in-progress Zicfisslp extension. Once the
specification is ratified and hardware support emerges, implementing
runtime patching support that replaces KCFI instrumentation with
Zicfisslp landing pads might also be feasible (similarly to KCFI to
FineIBT patching on x86_64), allowing distributions to ship a unified
kernel binary for all devices.

* b4-shazam-merge:
  riscv: Allow CONFIG_CFI_CLANG to be selected
  riscv/purgatory: Disable CFI
  riscv: Add CFI error handling
  riscv: Add ftrace_stub_graph
  riscv: Add types to indirectly called assembly functions
  riscv: Implement syscall wrappers

Link: https://lore.kernel.org/r/20230710183544.999540-8-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:32 -07:00
Palmer Dabbelt
9389e6715f
Merge patch series "support allocating crashkernel above 4G explicitly on riscv"
Chen Jiahao <chenjiahao16@huawei.com> says:

On riscv, the current crash kernel allocation logic is trying to
allocate within 32bit addressible memory region by default, if
failed, try to allocate without 4G restriction.

In need of saving DMA zone memory while allocating a relatively large
crash kernel region, allocating the reserved memory top down in
high memory, without overlapping the DMA zone, is a mature solution.
Hence this patchset introduces the parameter option crashkernel=X,[high,low].

One can reserve the crash kernel from high memory above DMA zone range
by explicitly passing "crashkernel=X,high"; or reserve a memory range
below 4G with "crashkernel=X,low". Besides, there are few rules need
to take notice:
1. "crashkernel=X,[high,low]" will be ignored if "crashkernel=size"
   is specified.
2. "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed
   and there is enough memory to be allocated under 4G.
3. When allocating crashkernel above 4G and no "crashkernel=X,low" is
   specified, a 128M low memory will be allocated automatically for
   swiotlb bounce buffer.
See Documentation/admin-guide/kernel-parameters.txt for more information.

To verify loading the crashkernel, adapted kexec-tools is attached below:
https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2

Following test cases have been performed as expected:
1) crashkernel=256M                          //low=256M
2) crashkernel=1G                            //low=1G
3) crashkernel=4G                            //high=4G, low=128M(default)
4) crashkernel=4G crashkernel=256M,high      //high=4G, low=128M(default), high is ignored
5) crashkernel=4G crashkernel=256M,low       //high=4G, low=128M(default), low is ignored
6) crashkernel=4G,high                       //high=4G, low=128M(default)
7) crashkernel=256M,low                      //low=0M, invalid
8) crashkernel=4G,high crashkernel=256M,low  //high=4G, low=256M
9) crashkernel=4G,high crashkernel=4G,low    //high=0M, low=0M, invalid
10) crashkernel=512M@0xd0000000              //low=512M
11) crashkernel=1G,high crashkernel=0M,low   //high=1G, low=0M

* b4-shazam-merge:
  docs: kdump: Update the crashkernel description for riscv
  riscv: kdump: Implement crashkernel=X,[high,low]

Link: https://lore.kernel.org/r/20230726175000.2536220-1-chenjiahao16@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:28 -07:00
Palmer Dabbelt
82dfb5fde6
Merge patch series "riscv: kprobes: simulate some instructions"
Nam Cao <namcaov@gmail.com> says:

Simulate some currently rejected instructions. Still to be simulated are:
    - c.jal
    - c.ebreak

* b4-shazam-merge:
  riscv: kprobes: simulate c.beqz and c.bnez
  riscv: kprobes: simulate c.jr and c.jalr instructions
  riscv: kprobes: simulate c.j instruction

Link: https://lore.kernel.org/r/cover.1690704360.git.namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:27 -07:00
Nam Cao
6b289a3ffa
riscv: remove redundant mv instructions
Some mv instructions were useful when first introduced to preserve a0 and
a1 before function calls. However the code has changed and they are now
redundant. Remove them.

Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230725053835.138910-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:25 -07:00
Linus Torvalds
ef2a0b7cdb Devicetree include cleanups for v6.6:
These are the remaining few clean-ups of DT related includes which
 didn't get applied to subsystem trees.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmTucUoACgkQ+vtdtY28
 YcOYoQ//RwIPeWc74PHQbOb6eQR95eTHDcDE1MR9Fw8amqxFaomGlSMpbyVyP4ag
 8p82c6qfJIZautyEikbKFO+iYjFMua0KuOTMVuDxHErQOl6ym4P4Uk3+1h5stVSj
 IdfK4CACtMKxKBOPAcyxJU6HKoWcUtMKsKV6OLdDh7M2Fy/G4RCjv4w1Xf3VAn59
 VOa0KF7FhHU3dhIB/tGsj0t13+3e3kF5+l4+pdoMoZWhR4gac5FJRxiR5dMZG6jr
 VY8i9FZb7DW2VtY78FVVOaYDDVf4vNrc+0kqnCbWUaKACHPgNXC375LvS7jFGXvc
 HYVN3teqhFxNOyoSehn2bdBVwJxjQFgy2gTt2vRWTa/CaUDES90cue2R9GT2Sz0b
 eBc3DQtNeT5m8mrLkuEfZrJjKjaEy2Pr6FjNDhNcmkJak7dkMMgkG/Y/SpNmpZOe
 2C3T6i4i6FUxni/2/rWHSVLnYBGfhPNdwWAZcQOi8rqtzp3tF46wVa345+Ev3VDG
 ECDndH8Qk3gtOmGyeTIvPc51yDP6Hpuh7+0jydtehkXHB+cUJtR+g0efIGf7BDgo
 sQpa1vRxkOolrCxyzKwcogEY7jjeccv/FM7BwaZQKXEibiKGkxeDuahdwbfvDuVq
 br16Uj9VzG8Jl6KK0gexV7kzZAAdw1y3JqPGUZf7hn4zmk099ow=
 =eLMf
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree include cleanups from Rob Herring:
 "These are the remaining few clean-ups of DT related includes which
  didn't get applied to subsystem trees"

* tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  ipmi: Explicitly include correct DT includes
  tpm: Explicitly include correct DT includes
  lib/genalloc: Explicitly include correct DT includes
  parport: Explicitly include correct DT includes
  sbus: Explicitly include correct DT includes
  mux: Explicitly include correct DT includes
  macintosh: Explicitly include correct DT includes
  hte: Explicitly include correct DT includes
  EDAC: Explicitly include correct DT includes
  clocksource: Explicitly include correct DT includes
  sparc: Explicitly include correct DT includes
  riscv: Explicitly include correct DT includes
2023-08-30 17:04:28 -07:00
Linus Torvalds
d68b4b6f30 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options").
 
 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h").
 
 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands").
 
 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions").
 
 - Switch the handling of kdump from a udev scheme to in-kernel handling,
   by Eric DeVolder ("crash: Kernel handling of CPU and memory hot
   un/plug").
 
 - Many singleton patches to various parts of the tree
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO2GpAAKCRDdBJ7gKXxA
 juW3AQD1moHzlSN6x9I3tjm5TWWNYFoFL8af7wXDJspp/DWH/AD/TO0XlWWhhbYy
 QHy7lL0Syha38kKLMXTM+bN6YQHi9AU=
 =WJQa
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
   ("refactor Kconfig to consolidate KEXEC and CRASH options")

 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h")

 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands")

 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions")

 - Switch the handling of kdump from a udev scheme to in-kernel
   handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
   hot un/plug")

 - Many singleton patches to various parts of the tree

* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
  document while_each_thread(), change first_tid() to use for_each_thread()
  drivers/char/mem.c: shrink character device's devlist[] array
  x86/crash: optimize CPU changes
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  crash: hotplug support for kexec_load()
  x86/crash: add x86 crash hotplug support
  crash: memory and CPU hotplug sysfs attributes
  kexec: exclude elfcorehdr from the segment digest
  crash: add generic infrastructure for crash hotplug support
  crash: move a few code bits to setup support of crash hotplug
  kstrtox: consistently use _tolower()
  kill do_each_thread()
  nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
  scripts/bloat-o-meter: count weak symbol sizes
  treewide: drop CONFIG_EMBEDDED
  lockdep: fix static memory detection even more
  lib/vsprintf: declare no_hash_pointers in sprintf.h
  lib/vsprintf: split out sprintf() and friends
  kernel/fork: stop playing lockless games for exe_file replacement
  adfs: delete unused "union adfs_dirtail" definition
  ...
2023-08-29 14:53:51 -07:00
Rob Herring
c893884691 riscv: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Link: https://lore.kernel.org/r/20230714174043.4040561-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2023-08-28 13:30:52 -05:00
Jisheng Zhang
2926715163
riscv: allow kmalloc() caches aligned to the smallest value
Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E
64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel
Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus
it brings some bad effects to coherent platforms:

Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and
kmalloc-8 slab caches don't exist any more, they are replaced with
either kmalloc-128 or kmalloc-64.

Secondly, larger than necessary kmalloc aligned allocations results
in unnecessary cache/TLB pressure.

This issue also exists on arm64 platforms. From last year, Catalin
tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to
dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage
in various drivers with ARCH_DMA_MINALIGN etc.[1]

One fact we can make use of for riscv: if the CPU doesn't support
ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on
Catalin's work and above fact, we can easily solve the kmalloc align
issue for riscv: we can override dma_get_cache_alignment(), then let
it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know
the underlying HW neither supports ZICBOM nor supports T-HEAD CMO.

So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the
devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the
kmalloc minimum alignment, nothing changed in this case. This case
can be improved in the future.

After this patch, a simple test of booting to a small buildroot rootfs
on qemu shows:

kmalloc-96           5041    5041     96  ...
kmalloc-64           9606    9606     64  ...
kmalloc-32           5128    5128     32  ...
kmalloc-16           7682    7682     16  ...
kmalloc-8           10246   10246      8  ...

So we save about 1268KB memory. The saving will be much larger in normal
OS env on real HW platforms.

Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230718152214.2907-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:22:00 -07:00
Sami Tolvanen
af0ead42f6
riscv: Add CFI error handling
With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately
before each function and a check to validate the target function type
before indirect calls:

  ; type preamble
    .word <id>
  function:
    ...
  ; indirect call check
    lw      t1, -4(a0)
    lui     t2, <hi20>
    addiw   t2, t2, <lo12>
    beq     t1, t2, .Ltmp0
    ebreak
  .Ltmp0:
    jarl    a0

Implement error handling code for the ebreak traps emitted for the
checks. This produces the following oops on a CFI failure (generated
using lkdtm):

[   21.177245] CFI failure at lkdtm_indirect_call+0x22/0x32 [lkdtm]
(target: lkdtm_increment_int+0x0/0x18 [lkdtm]; expected type: 0x3ad55aca)
[   21.178483] Kernel BUG [#1]
[   21.178671] Modules linked in: lkdtm
[   21.179037] CPU: 1 PID: 104 Comm: sh Not tainted
6.3.0-rc6-00037-g37d5ec6297ab #1
[   21.179511] Hardware name: riscv-virtio,qemu (DT)
[   21.179818] epc : lkdtm_indirect_call+0x22/0x32 [lkdtm]
[   21.180106]  ra : lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm]
[   21.180426] epc : ffffffff01387092 ra : ffffffff01386f14 sp : ff20000000453cf0
[   21.180792]  gp : ffffffff81308c38 tp : ff6000000243f080 t0 : ff20000000453b78
[   21.181157]  t1 : 000000003ad55aca t2 : 000000007e0c52a5 s0 : ff20000000453d00
[   21.181506]  s1 : 0000000000000001 a0 : ffffffff0138d170 a1 : ffffffff013870bc
[   21.181819]  a2 : b5fea48dd89aa700 a3 : 0000000000000001 a4 : 0000000000000fff
[   21.182169]  a5 : 0000000000000004 a6 : 00000000000000b7 a7 : 0000000000000000
[   21.182591]  s2 : ff20000000453e78 s3 : ffffffffffffffea s4 : 0000000000000012
[   21.183001]  s5 : ff600000023c7000 s6 : 0000000000000006 s7 : ffffffff013882a0
[   21.183653]  s8 : 0000000000000008 s9 : 0000000000000002 s10: ffffffff0138d878
[   21.184245]  s11: ffffffff0138d878 t3 : 0000000000000003 t4 : 0000000000000000
[   21.184591]  t5 : ffffffff8133df08 t6 : ffffffff8133df07
[   21.184858] status: 0000000000000120 badaddr: 0000000000000000
cause: 0000000000000003
[   21.185415] [<ffffffff01387092>] lkdtm_indirect_call+0x22/0x32 [lkdtm]
[   21.185772] [<ffffffff01386f14>] lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm]
[   21.186093] [<ffffffff01383552>] lkdtm_do_action+0x22/0x34 [lkdtm]
[   21.186445] [<ffffffff0138350c>] direct_entry+0x128/0x13a [lkdtm]
[   21.186817] [<ffffffff8033ed8c>] full_proxy_write+0x58/0xb2
[   21.187352] [<ffffffff801d4fe8>] vfs_write+0x14c/0x33a
[   21.187644] [<ffffffff801d5328>] ksys_write+0x64/0xd4
[   21.187832] [<ffffffff801d53a6>] sys_write+0xe/0x1a
[   21.188171] [<ffffffff80003996>] ret_from_syscall+0x0/0x2
[   21.188595] Code: 0513 0f65 a303 ffc5 53b7 7e0c 839b 2a53 0363 0073 (9002) 9582
[   21.189178] ---[ end trace 0000000000000000 ]---
[   21.189590] Kernel panic - not syncing: Fatal exception

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com> # ISA bits
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-12-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:39 -07:00
Sami Tolvanen
f3a0c23f25
riscv: Add ftrace_stub_graph
Commit 883bbbffa5a4 ("ftrace,kcfi: Separate ftrace_stub() and
ftrace_stub_graph()") added a separate ftrace_stub_graph function for
CFI_CLANG. Add the stub to fix FUNCTION_GRAPH_TRACER compatibility
with CFI.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-11-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:38 -07:00
Sami Tolvanen
5f59c6855b
riscv: Add types to indirectly called assembly functions
With CONFIG_CFI_CLANG, assembly functions indirectly called
from C code must be annotated with type identifiers to pass CFI
checking. Use the SYM_TYPED_START macro to add types to the
relevant functions.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-10-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:37 -07:00
Sami Tolvanen
08d0ce30e0
riscv: Implement syscall wrappers
Commit f0bddf50586d ("riscv: entry: Convert to generic entry") moved
syscall handling to C code, which exposed function pointer type
mismatches that trip fine-grained forward-edge Control-Flow Integrity
(CFI) checks as syscall handlers are all called through the same
syscall_t pointer type. To fix the type mismatches, implement pt_regs
based syscall wrappers similarly to x86 and arm64.

This patch is based on arm64 syscall wrappers added in commit
4378a7d4be30 ("arm64: implement syscall wrappers"), where the main goal
was to minimize the risk of userspace-controlled values being used
under speculation. This may be a concern for riscv in future as well.

Following other architectures, the syscall wrappers generate three
functions for each syscall; __riscv_<compat_>sys_<name> takes a pt_regs
pointer and extracts arguments from registers, __se_<compat_>sys_<name>
is a sign-extension wrapper that casts the long arguments to the
correct types for the real syscall implementation, which is named
__do_<compat_>sys_<name>.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-9-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:36 -07:00
Palmer Dabbelt
e2de1646f7
Merge patch series "riscv: fix ptrace and export VLENB"
Andy Chiu <andy.chiu@sifive.com> says:

We add a vlenb field in Vector context and save it with the
riscv_vstate_save() macro. It should not cause performance regression as
VLENB is a design-time constant and is frequently used by hardware.
Also, adding this field into the __sc_riscv_v_state may benifit us on a
future compatibility issue becuse a hardware may have writable VLENB.

Adding and saving VLENB have an immediate benifit as it gives ptrace a
better view of the Vector extension and makes it possible to reconstruct
Vector register files from the dump without doing an additional csr read.

This patchset also sync the number of note types between us and gdb for
riscv to solve a conflicting note.

This is not an ABI break given that 6.5 has not been released yet.

* b4-shazam-merge:
  RISC-V: vector: export VLENB csr in __sc_riscv_v_state
  RISC-V: Remove ptrace support for vectors

Link: https://lore.kernel.org/r/20230816155450.26200-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 12:35:02 -07:00
Palmer Dabbelt
e3f9324b23
RISC-V: Remove ptrace support for vectors
We've found two bugs here: NT_RISCV_VECTOR steps on NT_RISCV_CSR (which
is only for embedded), and we don't have vlenb in the core dumps.  Given
that we've have a pair of bugs croup up as part of the GDB review we've
probably got other issues, so let's just cut this for 6.5 and get it
right.

Fixes: 0c59922c769a ("riscv: Add ptrace vector support")
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230816155450.26200-2-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-22 13:54:59 -07:00
Eric DeVolder
e6265fe777 kexec: rename ARCH_HAS_KEXEC_PURGATORY
The Kconfig refactor to consolidate KEXEC and CRASH options utilized
option names of the form ARCH_SUPPORTS_<option>. Thus rename the
ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow
the same.

Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:18:54 -07:00
Chen Jiahao
5882e5acf1
riscv: kdump: Implement crashkernel=X,[high,low]
On riscv, the current crash kernel allocation logic is trying to
allocate within 32bit addressible memory region by default, if
failed, try to allocate without 4G restriction.

In need of saving DMA zone memory while allocating a relatively large
crash kernel region, allocating the reserved memory top down in
high memory, without overlapping the DMA zone, is a mature solution.
Here introduce the parameter option crashkernel=X,[high,low].

One can reserve the crash kernel from high memory above DMA zone range
by explicitly passing "crashkernel=X,high"; or reserve a memory range
below 4G with "crashkernel=X,low".

Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20230726175000.2536220-2-chenjiahao16@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16 07:51:48 -07:00
Nam Cao
d943705fba
riscv: kprobes: simulate c.beqz and c.bnez
kprobes currently rejects instruction c.beqz and c.bnez. Implement them.

Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/1d879dba4e4ee9a82e27625d6483b5c9cfed684f.1690704360.git.namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16 07:48:40 -07:00
Nam Cao
b18256d9b7
riscv: kprobes: simulate c.jr and c.jalr instructions
kprobes currently rejects c.jr and c.jalr instructions. Implement them.

Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/db8b7787e9208654cca50484f68334f412be2ea9.1690704360.git.namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16 07:48:39 -07:00