2019-06-03 07:44:50 +02:00
// SPDX-License-Identifier: GPL-2.0-only
2012-03-05 11:49:26 +00:00
/*
 * Based on arch/arm/kernel/asm-offsets.c
 *
 * Copyright (C) 1995-2003 Russell King
 *               2001-2002 Keith Owens
 * Copyright (C) 2012 ARM Ltd.
 */
arm64: kernel: Add arch-specific SDEI entry code and CPU masking
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.
Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code
(crucially, we don't implicitly know which exception level we interrupted).
Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.
Add sdei_mask_local_cpu() calls to the smp_send_stop() code. This covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.
Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel. In this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.
When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-08 15:38:12 +00:00
#include <linux/arm_sdei.h>
2012-03-05 11:49:26 +00:00
#include <linux/sched.h>
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The main idea is to place a pointer to the ftrace_ops as a literal at a
fixed offset from the function entry point, which can be recovered by
the common ftrace trampoline. Using a 64-bit literal avoids branch range
limitations, and permits the ops to be swapped atomically without
special considerations that apply to code-patching. In future this will
also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS
without branch range limitations by using additional fields in struct
ftrace_ops.
As noted in the core patch adding support for
DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking
ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or
part of a module), without going via ftrace_ops_list_func.
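The literal-at-a-fixed-offset idea can be sketched in plain user-space C. This is only an illustration under stated assumptions: `mock_ops`, `mock_callsite`, `count_hit` and `mock_trampoline` are hypothetical names, the "function body" is just a byte array, and the real arm64 trampoline is written in assembly rather than C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ftrace_ops; only the callback matters here. */
struct mock_ops {
	void (*func)(uintptr_t ip, struct mock_ops *ops);
	unsigned long hits;
};

/* Hypothetical callsite layout: a pointer-sized literal directly before the entry point. */
struct mock_callsite {
	struct mock_ops *ops_literal;	/* swapped with a single pointer store */
	unsigned char entry[16];	/* stands in for the function's instructions */
};

static_assert(offsetof(struct mock_callsite, entry) == sizeof(struct mock_ops *),
	      "the literal must sit at a fixed offset before the entry point");

/* Example tracer callback: counts how often it was invoked. */
static void count_hit(uintptr_t ip, struct mock_ops *ops)
{
	(void)ip;
	ops->hits++;
}

/* Common trampoline: recover the ops from the fixed offset and call ops->func directly. */
static void mock_trampoline(unsigned char *entry)
{
	struct mock_callsite *site =
		(struct mock_callsite *)(entry - offsetof(struct mock_callsite, entry));
	struct mock_ops *ops = site->ops_literal;

	ops->func((uintptr_t)entry, ops);
}
```

Calling `mock_trampoline(site.entry)` invokes whichever ops the literal currently points at, so retargeting a callsite is a single pointer store and no list walk is involved.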
Currently, this approach is not compatible with CLANG_CFI, as the
presence/absence of pre-function NOPs changes the offset of the
pre-function type hash, and there's no existing mechanism to ensure a
consistent offset for instrumented and uninstrumented functions. When
CLANG_CFI is enabled, the existing scheme with a global ops->func
pointer is used, and there should be no functional change. I am
currently working with others to allow the two to work together in
future (though this will likely require updated compiler support).
I've benchmarked this with the ftrace_ops sample module [1], which is
not currently upstream, but available at:
https://lore.kernel.org/lkml/20230103124912.2948963-1-mark.rutland@arm.com
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109
Using that module I measured the total time taken for 100,000 calls to a
trivial instrumented function, with a number of tracers enabled with
relevant filters (which would apply to the instrumented function) and a
number of tracers enabled with irrelevant filters (which would not apply
to the instrumented function). I tested on an M1 MacBook Pro, running
under an HVF-accelerated QEMU VM (i.e. on real hardware).
Before this patch:
Number of tracers     || Total time  | Per-call average time (ns)
Relevant | Irrelevant || (ns)        | Total        | Overhead
=========+============++=============+==============+============
       0 |          0 ||      94,583 |         0.95 |          -
       0 |          1 ||      93,709 |         0.94 |          -
       0 |          2 ||      93,666 |         0.94 |          -
       0 |         10 ||      93,709 |         0.94 |          -
       0 |        100 ||      93,792 |         0.94 |          -
---------+------------++-------------+--------------+------------
       1 |          1 ||   6,467,833 |        64.68 |      63.73
       1 |          2 ||   7,509,708 |        75.10 |      74.15
       1 |         10 ||  23,786,792 |       237.87 |     236.92
       1 |        100 || 106,432,500 |     1,064.43 |    1063.38
---------+------------++-------------+--------------+------------
       1 |          0 ||   1,431,875 |        14.32 |      13.37
       2 |          0 ||   6,456,334 |        64.56 |      63.62
      10 |          0 ||  22,717,000 |       227.17 |     226.22
     100 |          0 || 103,293,667 |      1032.94 |    1031.99
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
After this patch:
Number of tracers     || Total time  | Per-call average time (ns)
Relevant | Irrelevant || (ns)        | Total        | Overhead
=========+============++=============+==============+============
       0 |          0 ||      94,541 |         0.95 |          -
       0 |          1 ||      93,666 |         0.94 |          -
       0 |          2 ||      93,709 |         0.94 |          -
       0 |         10 ||      93,667 |         0.94 |          -
       0 |        100 ||      93,792 |         0.94 |          -
---------+------------++-------------+--------------+------------
       1 |          1 ||     281,000 |         2.81 |       1.86
       1 |          2 ||     281,042 |         2.81 |       1.87
       1 |         10 ||     280,958 |         2.81 |       1.86
       1 |        100 ||     281,250 |         2.81 |       1.87
---------+------------++-------------+--------------+------------
       1 |          0 ||     280,959 |         2.81 |       1.86
       2 |          0 ||   6,502,708 |        65.03 |      64.08
      10 |          0 ||  18,681,209 |       186.81 |     185.87
     100 |          0 || 103,550,458 |     1,035.50 |    1034.56
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/7 of the
overhead prior to this series (from 13.37ns to 1.86ns). This is
largely due to permitting calls to dynamically-allocated ftrace_ops
without going through ftrace_ops_list_func.
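The per-call figures in the tables follow from the totals by simple arithmetic, which the sketch below reproduces. The helper names `per_call_ns` and `overhead_ns` are hypothetical; the only inputs are the totals quoted above and the 100,000-call count.

```c
#include <assert.h>

/* Number of calls made in each measurement, as stated in the text. */
#define MOCK_CALLS 100000.0

/* Average time per call, in nanoseconds. */
static double per_call_ns(double total_ns)
{
	assert(total_ns >= 0.0);
	return total_ns / MOCK_CALLS;
}

/* Overhead relative to the baseline run (0 relevant, 0 irrelevant tracers). */
static double overhead_ns(double total_ns, double baseline_total_ns)
{
	return per_call_ns(total_ns) - per_call_ns(baseline_total_ns);
}
```

For the single relevant tracer case, `overhead_ns(1431875.0, 94583.0)` gives roughly 13.37 before the patch and `overhead_ns(280959.0, 94541.0)` gives roughly 1.86 after it, a ratio of about 7.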
I've run the ftrace selftests from v6.2-rc3, which reports:
| # of passed: 110
| # of failed: 0
| # of unresolved: 3
| # of untested: 0
| # of unsupported: 0
| # of xfailed: 1
| # of undefined(test bug): 0
... where the unresolved entries were the tests for DIRECT functions
(which are not supported), and the checkbashisms selftest (which is
irrelevant here):
| [8] Test ftrace direct functions against tracers [UNRESOLVED]
| [9] Test ftrace direct functions against kprobes [UNRESOLVED]
| [62] Meta-selftest: Checkbashisms [UNRESOLVED]
... with all other tests passing (or failing as expected).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:46:03 +00:00
#include <linux/ftrace.h>
2021-09-30 14:31:05 +00:00
#include <linux/kexec.h>
2012-03-05 11:49:26 +00:00
#include <linux/mm.h>
#include <linux/dma-mapping.h>
2013-07-04 13:34:32 +01:00
#include <linux/kvm_host.h>
2018-03-29 15:13:23 +02:00
#include <linux/preempt.h>
2016-04-27 17:47:12 +01:00
#include <linux/suspend.h>
2016-09-09 14:07:16 +01:00
#include <asm/cpufeature.h>
2017-11-14 14:14:17 +00:00
#include <asm/fixmap.h>
2012-03-05 11:49:26 +00:00
#include <asm/thread_info.h>
#include <asm/memory.h>
2019-06-21 10:52:35 +01:00
#include <asm/signal32.h>
arm64: kernel: cpu_{suspend/resume} implementation
Kernel subsystems like CPU idle and suspend to RAM require a generic
mechanism to suspend a processor, save its context and put it into
a quiescent state. The cpu_{suspend}/{resume} implementation provides
such a framework through a kernel interface that allows saving/restoring
registers, flushing the context to DRAM and suspending/resuming to/from
low-power states where processor context may be lost.
The CPU suspend implementation relies on the suspend protocol registered
in CPU operations to carry out a suspend request after context is
saved and flushed to DRAM. The cpu_suspend interface:
int cpu_suspend(unsigned long arg);
allows passing an opaque parameter that is handed over to the suspend CPU
operations back-end so that it can take action according to the
semantics attached to it. The arg parameter allows suspend to RAM and CPU
idle drivers to communicate with suspend protocol back-ends; it requires
standardization so that the interface can be reused seamlessly across
systems, paving the way for generic drivers.
Context memory is allocated on the stack, whose address is stashed in a
per-cpu variable to keep track of it and passed to core functions that
save/restore the registers required by the architecture.
Even though, upon successful execution, the cpu_suspend function shuts
down the suspending processor, the warm boot resume mechanism, based
on the cpu_resume function, makes the resume path operate as a
cpu_suspend function return, so that cpu_suspend can be treated as a C
function by the caller, which simplifies coding the PM drivers that rely
on the cpu_suspend API.
Upon context save, the minimal amount of memory is flushed to DRAM so
that it can be retrieved when the MMU is off and caches are not searched.
The suspend CPU operation, depending on the required operations (e.g. CPU vs
Cluster shutdown), is in charge of flushing the cache hierarchy either
implicitly (by calling firmware implementations like PSCI) or explicitly
by executing the required cache maintenance functions.
Debug exceptions are disabled during cpu_{suspend}/{resume} operations
so that debug registers can be saved and restored properly preventing
preemption from debug agents enabled in the kernel.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-07-22 12:22:13 +01:00
#include <asm/smp_plat.h>
#include <asm/suspend.h>
2012-03-05 11:49:26 +00:00
#include <linux/kbuild.h>
2016-01-04 15:44:32 +01:00
#include <linux/arm-smccc.h>
2012-03-05 11:49:26 +00:00
int main(void)
{
	DEFINE(TSK_ACTIVE_MM, offsetof(struct task_struct, active_mm));
	BLANK();
2021-09-14 14:10:29 +02:00
	DEFINE(TSK_TI_CPU, offsetof(struct task_struct, thread_info.cpu));
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
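The reliance on `offsetof(task_struct, thread_info) == 0` can be illustrated with mock types. This is a sketch under assumptions, not the kernel's definitions: `mock_thread_info`, `mock_task_struct`, `mock_task_thread_info` and the `MOCK_TSK_TI_*` constants are hypothetical and carry only a couple of fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mock_thread_info {
	unsigned long flags;
	int preempt_count;
};

struct mock_task_struct {
	struct mock_thread_info thread_info;	/* must remain the first member */
	void *stack;
};

/* The cast below is only valid while thread_info sits at offset 0. */
static_assert(offsetof(struct mock_task_struct, thread_info) == 0,
	      "thread_info must be the first member of the task structure");

/* Offsets in the TSK_TI_* style: measured from the task, not from thread_info. */
#define MOCK_TSK_TI_FLAGS	offsetof(struct mock_task_struct, thread_info.flags)
#define MOCK_TSK_TI_PREEMPT	offsetof(struct mock_task_struct, thread_info.preempt_count)

/* With thread_info at offset 0, a task pointer doubles as a thread_info pointer. */
static struct mock_thread_info *mock_task_thread_info(struct mock_task_struct *tsk)
{
	return (struct mock_thread_info *)tsk;
}
```

Because the offsets are taken relative to the task structure, assembly that holds a task pointer can reach the thread_info fields without a separate thread_info base register.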
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-03 20:23:13 +00:00
	DEFINE(TSK_TI_FLAGS, offsetof(struct task_struct, thread_info.flags));
	DEFINE(TSK_TI_PREEMPT, offsetof(struct task_struct, thread_info.preempt_count));
2016-07-01 16:53:00 +01:00
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
	DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
2020-04-27 09:00:16 -07:00
#endif
#ifdef CONFIG_SHADOW_CALL_STACK
	DEFINE(TSK_TI_SCS_BASE, offsetof(struct task_struct, thread_info.scs_base));
2020-05-15 14:11:05 +01:00
	DEFINE(TSK_TI_SCS_SP, offsetof(struct task_struct, thread_info.scs_sp));
2016-07-01 16:53:00 +01:00
#endif
2016-11-03 20:23:13 +00:00
	DEFINE(TSK_STACK, offsetof(struct task_struct, stack));
2018-12-12 13:08:44 +01:00
#ifdef CONFIG_STACKPROTECTOR
	DEFINE(TSK_STACK_CANARY, offsetof(struct task_struct, stack_canary));
#endif
2012-03-05 11:49:26 +00:00
	BLANK();
	DEFINE(THREAD_CPU_CONTEXT, offsetof(struct task_struct, thread.cpu_context));
2021-03-18 20:10:53 -07:00
	DEFINE(THREAD_SCTLR_USER, offsetof(struct task_struct, thread.sctlr_user));
2020-03-13 14:34:51 +05:30
#ifdef CONFIG_ARM64_PTR_AUTH
	DEFINE(THREAD_KEYS_USER, offsetof(struct task_struct, thread.keys_user));
2021-06-13 11:26:31 +02:00
#endif
#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
2020-03-13 14:34:56 +05:30
	DEFINE(THREAD_KEYS_KERNEL, offsetof(struct task_struct, thread.keys_kernel));
2020-12-22 12:01:45 -08:00
#endif
#ifdef CONFIG_ARM64_MTE
2021-07-27 13:52:55 -07:00
	DEFINE(THREAD_MTE_CTRL, offsetof(struct task_struct, thread.mte_ctrl));
2020-03-13 14:34:51 +05:30
#endif
2012-03-05 11:49:26 +00:00
	BLANK();
	DEFINE(S_X0, offsetof(struct pt_regs, regs[0]));
	DEFINE(S_X2, offsetof(struct pt_regs, regs[2]));
	DEFINE(S_X4, offsetof(struct pt_regs, regs[4]));
	DEFINE(S_X6, offsetof(struct pt_regs, regs[6]));
2016-07-08 12:35:52 -04:00
	DEFINE(S_X8, offsetof(struct pt_regs, regs[8]));
	DEFINE(S_X10, offsetof(struct pt_regs, regs[10]));
	DEFINE(S_X12, offsetof(struct pt_regs, regs[12]));
	DEFINE(S_X14, offsetof(struct pt_regs, regs[14]));
	DEFINE(S_X16, offsetof(struct pt_regs, regs[16]));
	DEFINE(S_X18, offsetof(struct pt_regs, regs[18]));
	DEFINE(S_X20, offsetof(struct pt_regs, regs[20]));
	DEFINE(S_X22, offsetof(struct pt_regs, regs[22]));
	DEFINE(S_X24, offsetof(struct pt_regs, regs[24]));
	DEFINE(S_X26, offsetof(struct pt_regs, regs[26]));
	DEFINE(S_X28, offsetof(struct pt_regs, regs[28]));
2019-10-18 16:37:47 +01:00
	DEFINE(S_FP, offsetof(struct pt_regs, regs[29]));
2012-03-05 11:49:26 +00:00
	DEFINE(S_LR, offsetof(struct pt_regs, regs[30]));
	DEFINE(S_SP, offsetof(struct pt_regs, sp));
	DEFINE(S_PSTATE, offsetof(struct pt_regs, pstate));
	DEFINE(S_PC, offsetof(struct pt_regs, pc));
	DEFINE(S_SYSCALLNO, offsetof(struct pt_regs, syscallno));
arm64: uaccess: remove set_fs()
Now that the uaccess primitives don't take addr_limit into account, we
have no need to manipulate this via set_fs() and get_fs(). Remove
support for these, along with some infrastructure this renders
redundant.
We no longer need to flip UAO to access kernel memory under KERNEL_DS,
and head.S unconditionally clears UAO for all kernel configurations via
an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO,
nor do we need to context-switch it. However, we still need to adjust
PAN during SDEI entry.
Masking of __user pointers no longer needs to use the dynamic value of
addr_limit, and can use a constant derived from the maximum possible
userspace task size. A new TASK_SIZE_MAX constant is introduced for
this, which is also used by core code. In configurations supporting
52-bit VAs, this may include a region of unusable VA space above a
48-bit TTBR0 limit, but never includes any portion of TTBR1.
Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and
KERNEL_DS were inclusive limits, and is converted to a mask by
subtracting one.
As the SDEI entry code repurposes the otherwise unnecessary
pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted
context, for now we rename that to pt_regs::sdei_ttbr1. In future we can
consider factoring that out.
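The conversion from an exclusive limit to a mask can be shown with a small sketch. The 48-bit size used here is an assumption for illustration only (the real TASK_SIZE_MAX depends on the kernel configuration), and `MOCK_TASK_SIZE_MAX`, `mock_user_mask` and `mock_in_user_range` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 48-bit user address space, chosen only for this example. */
#define MOCK_VA_BITS		48
#define MOCK_TASK_SIZE_MAX	(UINT64_C(1) << MOCK_VA_BITS)

/* Subtracting one only yields a contiguous mask for a power-of-two limit. */
static_assert((MOCK_TASK_SIZE_MAX & (MOCK_TASK_SIZE_MAX - 1)) == 0,
	      "the exclusive limit must be a power of two");

/* Exclusive limit -> inclusive mask of all valid user address bits. */
static uint64_t mock_user_mask(void)
{
	return MOCK_TASK_SIZE_MAX - 1;
}

/* An address is a user address only if it is strictly below the limit. */
static int mock_in_user_range(uint64_t addr)
{
	return addr < MOCK_TASK_SIZE_MAX;
}
```

The highest valid address equals the mask itself, while the limit is the first address that is rejected, which is the difference between the old inclusive limits and the new exclusive one.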
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-02 13:15:55 +00:00
	DEFINE(S_SDEI_TTBR1, offsetof(struct pt_regs, sdei_ttbr1));
2019-01-31 14:58:46 +00:00
	DEFINE(S_PMR_SAVE, offsetof(struct pt_regs, pmr_save));
arm64: unwind: reference pt_regs via embedded stack frame
As it turns out, the unwind code is slightly broken, and probably has
been for a while. The problem is in the dumping of the exception stack,
which is intended to dump the contents of the pt_regs struct at each
level in the call stack where an exception was taken and routed to a
routine marked as __exception (which means its stack frame is right
below the pt_regs struct on the stack).
'Right below the pt_regs struct' is ill defined, though: the unwind
code assigns 'frame pointer + 0x10' to the .sp member of the stackframe
struct at each level, and dump_backtrace() happily dereferences that as
the pt_regs pointer when encountering an __exception routine. However,
the actual size of the stack frame created by this routine (which could
be one of many __exception routines we have in the kernel) is not known,
and so frame.sp is pretty useless to figure out where struct pt_regs
really is.
So it seems the only way to ensure that we can find our struct pt_regs
when walking the stack frames is to put it at a known fixed offset of
the stack frame pointer that is passed to such __exception routines.
The simplest way to do that is to put it inside pt_regs itself, which is
the main change implemented by this patch. As a bonus, doing this allows
us to get rid of a fair amount of cruft related to walking from one stack
to the other, which is especially nice since we intend to introduce yet
another stack for overflow handling once we add support for vmapped
stacks. It also fixes an inconsistency where we only add a stack frame
pointing to ELR_EL1 if we are executing from the IRQ stack but not when
we are executing from the task stack.
To consistently identify exception regs even in the presence of exceptions
taken from entry code, we must check whether the next frame was created
by entry text, rather than whether the current frame was created by
exception text.
To avoid backtracing using PCs that fall in the idmap, or are controlled
by userspace, we must explicitly zero the FP and LR in startup paths, and
must ensure that the frame embedded in pt_regs is zeroed upon entry from
EL0. To avoid these NULL entries showing in the backtrace, unwind_frame()
is updated to avoid them.
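Embedding the frame record in the register block means the block can be found from the frame pointer by subtracting a constant. The sketch below shows that idea with a mock structure; `mock_pt_regs` and `mock_regs_from_frame` are hypothetical, and the real struct pt_regs has more fields than are shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced register block with an embedded {fp, lr} frame record. */
struct mock_pt_regs {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
	uint64_t stackframe[2];	/* frame record the frame pointer refers to */
};

static_assert(sizeof(((struct mock_pt_regs *)0)->stackframe) == 16,
	      "a frame record holds exactly two 64-bit values");

/* Walk back from the frame record to the start of the enclosing register block. */
static struct mock_pt_regs *mock_regs_from_frame(uint64_t *frame)
{
	return (struct mock_pt_regs *)((char *)frame -
				       offsetof(struct mock_pt_regs, stackframe));
}
```

The offset is a compile-time constant, so the lookup no longer depends on the size of the stack frame created by whichever handler was entered.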
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[Mark: compare current frame against .entry.text, avoid bogus PCs]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
2017-07-22 18:45:33 +01:00
	DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe));
2021-01-12 09:58:13 +08:00
	DEFINE(PT_REGS_SIZE, sizeof(struct pt_regs));
2012-03-05 11:49:26 +00:00
	BLANK();
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, at function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other registers hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temporary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 192 bytes. This could be reduced further with changes to the
unwinder.
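The stack usage figures can be checked with a mock layout. This sketch assumes one padding slot after x0 to x8 so that the size comes to the quoted 112 bytes; `mock_ftrace_regs`, its `unused` member and `mock_stack_bytes_saved` are hypothetical names, and the 336-byte and 16-byte constants are taken from the figures quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PT_REGS_BYTES	336u	/* pt_regs size quoted in the text */
#define MOCK_FRAME_RECORD_BYTES	16u	/* one {fp, lr} pair; two come to 32 bytes */

/* Hypothetical layout holding only the registers listed above. */
struct mock_ftrace_regs {
	uint64_t regs[9];	/* x0 - x8 */
	uint64_t unused;	/* assumed padding slot */
	uint64_t fp;
	uint64_t lr;
	uint64_t sp;
	uint64_t pc;
};

static_assert(sizeof(struct mock_ftrace_regs) == 112,
	      "the mock layout should match the 112 bytes quoted in the text");

/* Bytes saved per traced call: old usage minus (register block + two frame records). */
static size_t mock_stack_bytes_saved(void)
{
	size_t after = sizeof(struct mock_ftrace_regs) + 2 * MOCK_FRAME_RECORD_BYTES;

	return MOCK_PT_REGS_BYTES - after;
}
```

With these inputs the new usage is 112 + 32 = 144 bytes, so the difference from 336 works out to 192 bytes.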
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-03 17:05:20 +00:00
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
	DEFINE(FREGS_X0, offsetof(struct ftrace_regs, regs[0]));
	DEFINE(FREGS_X2, offsetof(struct ftrace_regs, regs[2]));
	DEFINE(FREGS_X4, offsetof(struct ftrace_regs, regs[4]));
	DEFINE(FREGS_X6, offsetof(struct ftrace_regs, regs[6]));
	DEFINE(FREGS_X8, offsetof(struct ftrace_regs, regs[8]));
	DEFINE(FREGS_FP, offsetof(struct ftrace_regs, fp));
	DEFINE(FREGS_LR, offsetof(struct ftrace_regs, lr));
	DEFINE(FREGS_SP, offsetof(struct ftrace_regs, sp));
	DEFINE(FREGS_PC, offsetof(struct ftrace_regs, pc));
	DEFINE(FREGS_SIZE, sizeof(struct ftrace_regs));
	BLANK();
#endif
2019-06-21 10:52:35 +01:00
#ifdef CONFIG_COMPAT
	DEFINE(COMPAT_SIGFRAME_REGS_OFFSET, offsetof(struct compat_sigframe, uc.uc_mcontext.arm_r0));
	DEFINE(COMPAT_RT_SIGFRAME_REGS_OFFSET, offsetof(struct compat_rt_sigframe, sig.uc.uc_mcontext.arm_r0));
	BLANK();
#endif
2015-10-06 18:46:24 +01:00
	DEFINE(MM_CONTEXT_ID, offsetof(struct mm_struct, context.id.counter));
2012-03-05 11:49:26 +00:00
	BLANK();
	DEFINE(VMA_VM_MM, offsetof(struct vm_area_struct, vm_mm));
	DEFINE(VMA_VM_FLAGS, offsetof(struct vm_area_struct, vm_flags));
	BLANK();
	DEFINE(VM_EXEC, VM_EXEC);
	BLANK();
	DEFINE(PAGE_SZ, PAGE_SIZE);
	BLANK();
	DEFINE(DMA_TO_DEVICE, DMA_TO_DEVICE);
	DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE);
	BLANK();
2018-03-29 15:13:23 +02:00
	DEFINE(PREEMPT_DISABLE_OFFSET, PREEMPT_DISABLE_OFFSET);
2021-03-02 10:01:12 +01:00
	DEFINE(SOFTIRQ_SHIFT, SOFTIRQ_SHIFT);
	DEFINE(IRQ_CPUSTAT_SOFTIRQ_PENDING, offsetof(irq_cpustat_t, __softirq_pending));
2018-03-29 15:13:23 +02:00
	BLANK();
2016-11-03 20:23:13 +00:00
	DEFINE(CPU_BOOT_TASK, offsetof(struct secondary_data, task));
2016-02-23 10:31:42 +00:00
	BLANK();
2021-02-08 09:57:24 +00:00
	DEFINE(FTR_OVR_VAL_OFFSET, offsetof(struct arm64_ftr_override, val));
	DEFINE(FTR_OVR_MASK_OFFSET, offsetof(struct arm64_ftr_override, mask));
	BLANK();
2020-05-05 16:45:17 +01:00
#ifdef CONFIG_KVM
2012-12-10 16:40:18 +00:00
	DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
KVM: arm64: Handle RAS SErrors from EL2 on guest exit
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.
With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SError from EL2 if they occur
during this single instruction window...
The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVM's existing
'dsb, mrs-daifclr, isb' sequence.
Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-15 19:39:05 +00:00
	DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, arch.fault.disr_el1));
KVM: arm/arm64: Context-switch ptrauth registers
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.
The pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.
When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However, the host key save is
optimized and implemented inside the ptrauth instruction/register access
trap.
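The semi-lazy scheme above can be modelled in user-space C. This is a sketch under stated assumptions: every name below (vcpu_model, cpu_model, the handlers) is hypothetical, a single key stands in for the full key set, and the real switch is done in assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct keys { uint64_t apia; };	/* one key stands in for the full set */

struct vcpu_model {
	struct keys guest_keys;
	struct keys host_keys;
	bool ptrauth_enabled;	/* false: guest ptrauth use traps */
};

/* The "hardware" key register of the modelled physical CPU. */
struct cpu_model { struct keys live; };

/* Scheduling the vcpu in: start lazy, with guest access trapping. */
void vcpu_load(struct vcpu_model *v)
{
	v->ptrauth_enabled = false;
}

/* Trap handler: the guest touched ptrauth, so save host keys and go eager. */
void handle_ptrauth_trap(struct vcpu_model *v, const struct cpu_model *c)
{
	v->host_keys = c->live;
	v->ptrauth_enabled = true;
}

/* Guest entry: install guest keys only once the feature is enabled. */
void guest_enter(const struct vcpu_model *v, struct cpu_model *c)
{
	if (v->ptrauth_enabled)
		c->live = v->guest_keys;
}

/* Guest exit: save guest keys and restore host keys, again only if enabled. */
void guest_exit(struct vcpu_model *v, struct cpu_model *c)
{
	if (v->ptrauth_enabled) {
		v->guest_keys = c->live;
		c->live = v->host_keys;
	}
}
```

Until the first trap, entry and exit leave the keys untouched, which is the saving the lazy half of the scheme is after.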
Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.
Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both types of
authentication to be present in a CPU.
This key switch is done from the guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code, as they may cause
pointer authentication key signing errors in some situations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks,
save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-23 10:12:35 +05:30
DEFINE ( VCPU_HCR_EL2 , offsetof ( struct kvm_vcpu , arch . hcr_el2 ) ) ;
2019-06-28 22:40:58 +01:00
DEFINE ( CPU_USER_PT_REGS , offsetof ( struct kvm_cpu_context , regs ) ) ;
2021-06-21 12:17:13 +01:00
DEFINE ( CPU_RGSR_EL1 , offsetof ( struct kvm_cpu_context , sys_regs [ RGSR_EL1 ] ) ) ;
DEFINE ( CPU_GCR_EL1 , offsetof ( struct kvm_cpu_context , sys_regs [ GCR_EL1 ] ) ) ;
2019-04-23 10:12:35 +05:30
DEFINE ( CPU_APIAKEYLO_EL1 , offsetof ( struct kvm_cpu_context , sys_regs [ APIAKEYLO_EL1 ] ) ) ;
DEFINE ( CPU_APIBKEYLO_EL1 , offsetof ( struct kvm_cpu_context , sys_regs [ APIBKEYLO_EL1 ] ) ) ;
DEFINE ( CPU_APDAKEYLO_EL1 , offsetof ( struct kvm_cpu_context , sys_regs [ APDAKEYLO_EL1 ] ) ) ;
DEFINE ( CPU_APDBKEYLO_EL1 , offsetof ( struct kvm_cpu_context , sys_regs [ APDBKEYLO_EL1 ] ) ) ;
DEFINE ( CPU_APGAKEYLO_EL1 , offsetof ( struct kvm_cpu_context , sys_regs [ APGAKEYLO_EL1 ] ) ) ;
2017-10-08 17:01:56 +02:00
DEFINE ( HOST_CONTEXT_VCPU , offsetof ( struct kvm_cpu_context , __hyp_running_vcpu ) ) ;
2019-04-09 20:22:11 +01:00
DEFINE ( HOST_DATA_CONTEXT , offsetof ( struct kvm_host_data , host_ctxt ) ) ;
2020-12-02 18:41:07 +00:00
DEFINE ( NVHE_INIT_MAIR_EL2 , offsetof ( struct kvm_nvhe_init_params , mair_el2 ) ) ;
DEFINE ( NVHE_INIT_TCR_EL2 , offsetof ( struct kvm_nvhe_init_params , tcr_el2 ) ) ;
2020-12-02 18:41:06 +00:00
DEFINE ( NVHE_INIT_TPIDR_EL2 , offsetof ( struct kvm_nvhe_init_params , tpidr_el2 ) ) ;
DEFINE ( NVHE_INIT_STACK_HYP_VA , offsetof ( struct kvm_nvhe_init_params , stack_hyp_va ) ) ;
DEFINE ( NVHE_INIT_PGD_PA , offsetof ( struct kvm_nvhe_init_params , pgd_pa ) ) ;
2021-03-19 10:01:29 +00:00
DEFINE ( NVHE_INIT_HCR_EL2 , offsetof ( struct kvm_nvhe_init_params , hcr_el2 ) ) ;
DEFINE ( NVHE_INIT_VTTBR , offsetof ( struct kvm_nvhe_init_params , vttbr ) ) ;
DEFINE ( NVHE_INIT_VTCR , offsetof ( struct kvm_nvhe_init_params , vtcr ) ) ;
arm64: kernel: cpu_{suspend/resume} implementation
Kernel subsystems like CPU idle and suspend to RAM require a generic
mechanism to suspend a processor, save its context and put it into
a quiescent state. The cpu_{suspend}/{resume} implementation provides
such a framework through a kernel interface that can save/restore
registers, flush the context to DRAM and suspend/resume to/from
low-power states where processor context may be lost.
The CPU suspend implementation relies on the suspend protocol registered
in CPU operations to carry out a suspend request after context is
saved and flushed to DRAM. The cpu_suspend interface:
int cpu_suspend(unsigned long arg);
passes an opaque parameter that is handed over to the suspend CPU
operations back-end so that it can take action according to the
semantics attached to it. The arg parameter allows suspend to RAM and CPU
idle drivers to communicate to suspend protocol back-ends; it requires
standardization so that the interface can be reused seamlessly across
systems, paving the way for generic drivers.
Context memory is allocated on the stack, whose address is stashed in a
per-cpu variable to keep track of it and passed to core functions that
save/restore the registers required by the architecture.
Even though, upon successful execution, the cpu_suspend function shuts
down the suspending processor, the warm boot resume mechanism, based
on the cpu_resume function, makes the resume path operate as a
cpu_suspend function return, so that cpu_suspend can be treated as a C
function by the caller, which simplifies coding the PM drivers that rely
on the cpu_suspend API.
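As a loose user-space analogy for "the resume path operates as a cpu_suspend function return", the sketch below uses setjmp()/longjmp(): the saved jmp_buf stands in for the saved CPU context and longjmp() stands in for the warm-boot path. All model_* names are hypothetical, and nothing here reflects how the arm64 code saves registers.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf saved_ctx;	/* stands in for the saved CPU context */
static int resume_count;	/* how often the resume path has run */

/* Hypothetical warm-boot path: restore the context saved at suspend. */
static void model_cpu_resume(void)
{
	resume_count++;
	longjmp(saved_ctx, 1);
}

/* Hypothetical back-end: "powers down", then the CPU re-enters at resume. */
static void model_power_down(unsigned long arg)
{
	(void)arg;	/* opaque to this model; a real back-end interprets it */
	model_cpu_resume();
}

/* To the caller this looks like an ordinary C function that returns. */
int model_cpu_suspend(unsigned long arg)
{
	if (setjmp(saved_ctx) == 0) {
		model_power_down(arg);	/* does not return on success */
		return -1;		/* only reached if power-down failed */
	}
	assert(resume_count > 0);	/* only the resume path gets here */
	return 0;
}
```

The caller never sees the power-down: control comes back through the saved context, so a PM driver can treat the call as a normal function call with a return value.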
Upon context save, the minimal amount of memory is flushed to DRAM so
that it can be retrieved when the MMU is off and caches are not searched.
The suspend CPU operation, depending on the required operations (e.g. CPU vs
Cluster shutdown), is in charge of flushing the cache hierarchy either
implicitly (by calling firmware implementations like PSCI) or explicitly
by executing the required cache maintenance functions.
Debug exceptions are disabled during cpu_{suspend}/{resume} operations
so that debug registers can be saved and restored properly, preventing
preemption from debug agents enabled in the kernel.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-07-22 12:22:13 +01:00
# endif
2015-01-26 18:33:44 +00:00
# ifdef CONFIG_CPU_PM
2013-07-22 12:22:13 +01:00
DEFINE ( CPU_CTX_SP , offsetof ( struct cpu_suspend_ctx , sp ) ) ;
DEFINE ( MPIDR_HASH_MASK , offsetof ( struct mpidr_hash , mask ) ) ;
DEFINE ( MPIDR_HASH_SHIFTS , offsetof ( struct mpidr_hash , shift_aff ) ) ;
2016-04-27 17:47:06 +01:00
DEFINE ( SLEEP_STACK_DATA_SYSTEM_REGS , offsetof ( struct sleep_stack_data , system_regs ) ) ;
DEFINE ( SLEEP_STACK_DATA_CALLEE_REGS , offsetof ( struct sleep_stack_data , callee_saved_regs ) ) ;
2012-12-10 16:40:18 +00:00
# endif
2017-02-01 11:28:27 -06:00
DEFINE ( ARM_SMCCC_RES_X0_OFFS , offsetof ( struct arm_smccc_res , a0 ) ) ;
DEFINE ( ARM_SMCCC_RES_X2_OFFS , offsetof ( struct arm_smccc_res , a2 ) ) ;
DEFINE ( ARM_SMCCC_QUIRK_ID_OFFS , offsetof ( struct arm_smccc_quirk , id ) ) ;
DEFINE ( ARM_SMCCC_QUIRK_STATE_OFFS , offsetof ( struct arm_smccc_quirk , state ) ) ;
2021-05-18 17:36:18 +01:00
DEFINE ( ARM_SMCCC_1_2_REGS_X0_OFFS , offsetof ( struct arm_smccc_1_2_regs , a0 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X2_OFFS , offsetof ( struct arm_smccc_1_2_regs , a2 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X4_OFFS , offsetof ( struct arm_smccc_1_2_regs , a4 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X6_OFFS , offsetof ( struct arm_smccc_1_2_regs , a6 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X8_OFFS , offsetof ( struct arm_smccc_1_2_regs , a8 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X10_OFFS , offsetof ( struct arm_smccc_1_2_regs , a10 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X12_OFFS , offsetof ( struct arm_smccc_1_2_regs , a12 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X14_OFFS , offsetof ( struct arm_smccc_1_2_regs , a14 ) ) ;
DEFINE ( ARM_SMCCC_1_2_REGS_X16_OFFS , offsetof ( struct arm_smccc_1_2_regs , a16 ) ) ;
2016-04-27 17:47:12 +01:00
BLANK ( ) ;
DEFINE ( HIBERN_PBE_ORIG , offsetof ( struct pbe , orig_address ) ) ;
DEFINE ( HIBERN_PBE_ADDR , offsetof ( struct pbe , address ) ) ;
DEFINE ( HIBERN_PBE_NEXT , offsetof ( struct pbe , next ) ) ;
2016-09-09 14:07:16 +01:00
DEFINE ( ARM64_FTR_SYSVAL , offsetof ( struct arm64_ftr_reg , sys_val ) ) ;
2017-11-14 14:14:17 +00:00
BLANK ( ) ;
# ifdef CONFIG_UNMAP_KERNEL_AT_EL0
DEFINE ( TRAMP_VALIAS , TRAMP_VALIAS ) ;
2018-01-08 15:38:12 +00:00
# endif
# ifdef CONFIG_ARM_SDE_INTERFACE
DEFINE ( SDEI_EVENT_INTREGS , offsetof ( struct sdei_registered_event , interrupted_regs ) ) ;
DEFINE ( SDEI_EVENT_PRIORITY , offsetof ( struct sdei_registered_event , priority ) ) ;
2020-03-13 14:34:51 +05:30
# endif
# ifdef CONFIG_ARM64_PTR_AUTH
DEFINE ( PTRAUTH_USER_KEY_APIA , offsetof ( struct ptrauth_keys_user , apia ) ) ;
2021-06-13 11:26:32 +02:00
# ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
2020-03-13 14:34:56 +05:30
DEFINE ( PTRAUTH_KERNEL_KEY_APIA , offsetof ( struct ptrauth_keys_kernel , apia ) ) ;
2021-06-13 11:26:32 +02:00
# endif
2020-03-13 14:34:51 +05:30
BLANK ( ) ;
2021-09-30 14:31:05 +00:00
# endif
# ifdef CONFIG_KEXEC_CORE
DEFINE ( KIMAGE_ARCH_DTB_MEM , offsetof ( struct kimage , arch . dtb_mem ) ) ;
2021-09-30 14:31:06 +00:00
DEFINE ( KIMAGE_ARCH_EL2_VECTORS , offsetof ( struct kimage , arch . el2_vectors ) ) ;
2021-09-30 14:31:09 +00:00
DEFINE ( KIMAGE_ARCH_ZERO_PAGE , offsetof ( struct kimage , arch . zero_page ) ) ;
2021-09-30 14:31:10 +00:00
DEFINE ( KIMAGE_ARCH_PHYS_OFFSET , offsetof ( struct kimage , arch . phys_offset ) ) ;
2021-09-30 14:31:09 +00:00
DEFINE ( KIMAGE_ARCH_TTBR1 , offsetof ( struct kimage , arch . ttbr1 ) ) ;
2021-09-30 14:31:05 +00:00
DEFINE ( KIMAGE_HEAD , offsetof ( struct kimage , head ) ) ;
DEFINE ( KIMAGE_START , offsetof ( struct kimage , start ) ) ;
BLANK ( ) ;
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The main idea is to place a pointer to the ftrace_ops as a literal at a
fixed offset from the function entry point, which can be recovered by
the common ftrace trampoline. Using a 64-bit literal avoids branch range
limitations, and permits the ops to be swapped atomically without
special considerations that apply to code-patching. In future this will
also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS
without branch range limitations by using additional fields in struct
ftrace_ops.
As noted in the core patch adding support for
DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking
ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or
part of a module), without going via ftrace_ops_list_func.
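The idea of recovering the ops pointer from a literal at a fixed offset before the function entry point can be modelled in portable C. This is an illustrative sketch only: the structures and names are hypothetical, a struct field plays the role of the pre-function literal, and the real trampoline is written in assembly.

```c
#include <assert.h>
#include <stddef.h>

struct ops_model {
	void (*func)(struct ops_model *self, int *hits);
};

/* A callsite: the ops literal sits immediately before the entry point. */
struct callsite_model {
	struct ops_model *ops;		/* pointer-sized literal */
	unsigned char entry[4];		/* stands in for the first instruction */
};

static_assert(offsetof(struct callsite_model, entry) == sizeof(struct ops_model *),
	      "literal must directly precede the entry point");

/* Common trampoline: recover the ops from the entry address alone. */
void common_trampoline(const unsigned char *entry, int *hits)
{
	const struct callsite_model *site = (const struct callsite_model *)
		(entry - offsetof(struct callsite_model, entry));

	site->ops->func(site->ops, hits);
}

/* Two hypothetical tracers that differ only in how much they count. */
void count_one(struct ops_model *self, int *hits)
{
	(void)self;
	*hits += 1;
}

void count_ten(struct ops_model *self, int *hits)
{
	(void)self;
	*hits += 10;
}
```

Retargeting a callsite is a single pointer store to the literal, which is why no code patching is needed to swap tracers in this model.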
Currently, this approach is not compatible with CLANG_CFI, as the
presence/absence of pre-function NOPs changes the offset of the
pre-function type hash, and there's no existing mechanism to ensure a
consistent offset for instrumented and uninstrumented functions. When
CLANG_CFI is enabled, the existing scheme with a global ops->func
pointer is used, and there should be no functional change. I am
currently working with others to allow the two to work together in
future (though this will likely require updated compiler support).
I've benchmarked this with the ftrace_ops sample module [1], which is
not currently upstream, but available at:
https://lore.kernel.org/lkml/20230103124912.2948963-1-mark.rutland@arm.com
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109
Using that module I measured the total time taken for 100,000 calls to a
trivial instrumented function, with a number of tracers enabled with
relevant filters (which would apply to the instrumented function) and a
number of tracers enabled with irrelevant filters (which would not apply
to the instrumented function). I tested on an M1 MacBook Pro, running
under an HVF-accelerated QEMU VM (i.e. on real hardware).
Before this patch:
Number of tracers     || Total time  | Per-call average time (ns)
Relevant | Irrelevant || (ns)        | Total        | Overhead
=========+============++=============+==============+============
       0 |          0 ||      94,583 |         0.95 |          -
       0 |          1 ||      93,709 |         0.94 |          -
       0 |          2 ||      93,666 |         0.94 |          -
       0 |         10 ||      93,709 |         0.94 |          -
       0 |        100 ||      93,792 |         0.94 |          -
---------+------------++-------------+--------------+------------
       1 |          1 ||   6,467,833 |        64.68 |      63.73
       1 |          2 ||   7,509,708 |        75.10 |      74.15
       1 |         10 ||  23,786,792 |       237.87 |     236.92
       1 |        100 || 106,432,500 |     1,064.33 |    1063.38
---------+------------++-------------+--------------+------------
       1 |          0 ||   1,431,875 |        14.32 |      13.37
       2 |          0 ||   6,456,334 |        64.56 |      63.62
      10 |          0 ||  22,717,000 |       227.17 |     226.22
     100 |          0 || 103,293,667 |      1032.94 |    1031.99
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
After this patch:
Number of tracers     || Total time  | Per-call average time (ns)
Relevant | Irrelevant || (ns)        | Total        | Overhead
=========+============++=============+==============+============
       0 |          0 ||      94,541 |         0.95 |          -
       0 |          1 ||      93,666 |         0.94 |          -
       0 |          2 ||      93,709 |         0.94 |          -
       0 |         10 ||      93,667 |         0.94 |          -
       0 |        100 ||      93,792 |         0.94 |          -
---------+------------++-------------+--------------+------------
       1 |          1 ||     281,000 |         2.81 |       1.86
       1 |          2 ||     281,042 |         2.81 |       1.87
       1 |         10 ||     280,958 |         2.81 |       1.86
       1 |        100 ||     281,250 |         2.81 |       1.87
---------+------------++-------------+--------------+------------
       1 |          0 ||     280,959 |         2.81 |       1.86
       2 |          0 ||   6,502,708 |        65.03 |      64.08
      10 |          0 ||  18,681,209 |       186.81 |     185.87
     100 |          0 || 103,550,458 |     1,035.50 |    1034.56
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/7 of the
overhead prior to this series (from 13.37ns to 1.86ns). This is
largely due to permitting calls to dynamically-allocated ftrace_ops
without going through ftrace_ops_list_func.
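The per-call figures quoted above follow from the totals by simple arithmetic, sketched here in C. The function names are made up for this example; the inputs are the totals from the tables for 100,000 calls, and the overhead is taken relative to the baseline row with 0 relevant and 0 irrelevant tracers.

```c
#include <assert.h>

#define CALLS 100000.0	/* calls per measurement, as stated in the text */

/* Average time per call in ns, given the total for CALLS calls. */
double per_call_ns(double total_ns)
{
	assert(total_ns >= 0.0);
	return total_ns / CALLS;
}

/* Per-call overhead relative to the baseline run of the same table. */
double overhead_ns(double total_ns, double baseline_total_ns)
{
	return per_call_ns(total_ns) - per_call_ns(baseline_total_ns);
}

/* How many times smaller the new overhead is than the old one. */
double overhead_ratio(double before_ns, double after_ns)
{
	return before_ns / after_ns;
}
```

With the single-relevant-tracer rows, overhead_ns(1431875, 94583) gives about 13.37 and overhead_ns(280959, 94541) gives about 1.86, a ratio of roughly 7.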
I've run the ftrace selftests from v6.2-rc3, which report:
| # of passed: 110
| # of failed: 0
| # of unresolved: 3
| # of untested: 0
| # of unsupported: 0
| # of xfailed: 1
| # of undefined(test bug): 0
... where the unresolved entries were the tests for DIRECT functions
(which are not supported), and the checkbashisms selftest (which is
irrelevant here):
| [8] Test ftrace direct functions against tracers [UNRESOLVED]
| [9] Test ftrace direct functions against kprobes [UNRESOLVED]
| [62] Meta-selftest: Checkbashisms [UNRESOLVED]
... with all other tests passing (or failing as expected).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:46:03 +00:00
# endif
# ifdef CONFIG_FUNCTION_TRACER
DEFINE ( FTRACE_OPS_FUNC , offsetof ( struct ftrace_ops , func ) ) ;
2017-11-14 14:14:17 +00:00
# endif
2012-03-05 11:49:26 +00:00
return 0 ;
}
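For readers unfamiliar with the DEFINE(sym, offsetof(...)) pattern used throughout this file, here is a hedged user-space sketch of the idea: structure offsets computed by the C compiler are turned into "#define" lines that assembly code can include. The kernel build does this by post-processing the compiler's assembly output; the FORMAT_DEFINE macro, struct demo_ctx and emit_demo_offsets() below are hypothetical stand-ins that simply format text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structure standing in for a kernel type. */
struct demo_ctx {
	unsigned long sp;
	unsigned long regs[4];
};

/* User-space stand-in for DEFINE(): format "#define SYM value" as text. */
#define FORMAT_DEFINE(buf, len, sym, val) \
	snprintf((buf), (len), "#define %s %zu\n", #sym, (size_t)(val))

/* Write one #define line per offset into buf; returns 0 on success. */
int emit_demo_offsets(char *buf, size_t len)
{
	size_t used;
	int n;

	n = FORMAT_DEFINE(buf, len, DEMO_CTX_SP, offsetof(struct demo_ctx, sp));
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output did not fit */

	used = strlen(buf);
	n = FORMAT_DEFINE(buf + used, len - used, DEMO_CTX_REGS,
			  offsetof(struct demo_ctx, regs));
	if (n < 0 || (size_t)n >= len - used)
		return -1;	/* output did not fit */

	return 0;
}
```

Deriving the constants from offsetof() keeps assembly and C in step: if a field moves, the generated values change with it and no hand-maintained number goes stale.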