IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Fix the frame pointer handling in the function graph tracer entry and
exit code so we can enable HAVE_FUNCTION_GRAPH_FP_TEST. Instead of using
FP directly (which will have different values between the entry and exit
pieces of the function graph tracer), use the value of SP at entry and
exit, as we can derive the former value from the frame pointer.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Avoid explicit literal loads and instead, use accessor macros that
generate the optimal sequence depending on the architecture revision
being targeted.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tweak the ftrace return paths to avoid redundant loads of SP, as well as
unnecessary clobbering of IP.
This also fixes the inconsistency of using MOV to perform a function
return, which is sub-optimal on recent micro-architectures but more
importantly, does not perform an interworking return, unlike compiler
generated function returns in Thumb2 builds.
Let's fix this by popping PC from the stack like most ordinary code
does.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Kernel images that are large in comparison to the range of a direct
branch may fail to work as expected with ftrace, as patching a direct
branch to one of the core ftrace routines may not be possible from the
.init.text section, if it is emitted too far away from the normal .text
section.
This is more likely to affect Thumb2 builds, given that its range is
only -/+ 16 MiB (as opposed to ARM which has -/+ 32 MiB), but may occur
in either ISA.
To work around this, add a couple of trampolines to .init.text and
swap these in when the ftrace patching code is operating on callers in
.init.text.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The compiler emitted hook used for ftrace consists of a PUSH {LR} to
preserve the link register, followed by a branch-and-link (BL) to
__gnu_mount_nc. Dynamic ftrace patches away the latter to turn the
combined sequence into a NOP, using a POP {LR} instruction.
This is not necessary, since the link register does not get clobbered in
this case, and simply adding #4 to the stack pointer is sufficient, and
avoids a memory access that may take a few cycles to resolve depending
on the micro-architecture.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Using ADR to take the address of 'ftrace_stub' via a local label
produces an address that has the Thumb bit cleared, which means the
subsequent comparison is guaranteed to fail. Instead, use the badr
macro, which forces the Thumb bit to be set.
Fixes: a3ba87a61499 ("ARM: 6316/1: ftrace: add Thumb-2 support")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
This tag covers the changes between the version of vmap'ed + IRQ stacks
support pulled into rmk/devel-stable [0] (which was dropped from v5.17
due to issues discovered too late in the cycle), and my v5 proposed for
the v5.18 cycle [1].
[0] git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git arm-irq-and-vmap-stacks-for-rmk
[1] https://lore.kernel.org/linux-arm-kernel/20220124174744.1054712-1-ardb@kernel.org/
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmH3+1oACgkQw08iOZLZ
jyRdyAv/TiYdEkpteCUz1MucDFEZsRz1FXYTUwFG5pxSIONUDdDm0KvjYoY80n7X
wUMZyfAwjdHpQtP0iu4RwAmi7d373KtWTqFzwAoBG9RFTSy/4j4B3ZzsPkoCn9uN
ANXpyJE2lqvN3d25WKnRq6+WGSxdvhYqBQARe1oznirgN4ilKtmBkKCL3W+gsO7l
N6q5DLsqSI80kAIorFUr0sF8b1JEK/APOokaAICLyP6fkjp3hu+jUvJENCsJk27V
rVHhFmKdtpwl02hs+I13I5nrAXwYN6COSBa9y0xuPRgBk2sgnpFKSMKAvYafwHhg
AYwUuez/Tk6AHHowu+/ggoap2At04l4rdwzV0BIE/+9vdT3C+4M5tikHglQnRjtR
PRyErdCPPEW6gz+fYdYoaCYXVfRGCQeCyInVQIl6U9HAqcVLPHNZecGz0rYBTQA2
GiUfi0YA3SASMIggP4mug4M5fwbgUbh/i3OgMYGcnCg+5phmR7Z+niJVN9j0uPf2
XMsCsTi/
=utwu
-----END PGP SIGNATURE-----
Merge tag 'arm-vmap-stacks-v6' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable
ARM: support for IRQ and vmap'ed stacks [v6]
This tag covers the changes between the version of vmap'ed + IRQ stacks
support pulled into rmk/devel-stable [0] (which was dropped from v5.17
due to issues discovered too late in the cycle), and my v5 proposed for
the v5.18 cycle [1].
[0] git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git arm-irq-and-vmap-stacks-for-rmk
[1] https://lore.kernel.org/linux-arm-kernel/20220124174744.1054712-1-ardb@kernel.org/
The get_current() and __my_cpu_offset() accessors evaluate to only a
single instruction emitted inline, but due to the size of the asm string
that is created for SMP+v6 configurations, the compiler assumes
otherwise, and may emit the functions out of line instead.
So use __always_inline to avoid this.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Only SMP systems use the secondary startup path by definition, so there
is no need for SMP conditionals there.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Rework the vmalloc_seq handling so it can be used safely under SMP, as
we started using it to ensure that vmap'ed stacks are guaranteed to be
mapped by the active mm before switching to a task, and here we need to
ensure that changes to the page tables are visible to other CPUs when
they observe a change in the sequence count.
Since LPAE needs none of this, fold a check against it into the
vmalloc_seq counter check after breaking it out into a separate static
inline helper.
Given that vmap'ed stacks are now also supported on !SMP configurations,
let's drop the WARN() that could potentially now fire spuriously.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Avoid using R9 in the IRQ handler code, as the entry code uses it for
tsk, and expects it to remain untouched between the IRQ entry and exit
code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Use the SMP_ON_UP patching framework to elide HWCAP_TLS tests from the
context switch and return to userspace code paths, as SMP systems are
guaranteed to have this h/w capability.
At the same time, omit the update of __entry_task if the system is
detected to be UP at runtime, as in that case, the value is never used.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Nathan reports the group relocations go out of range in pathological
cases such as allyesconfig kernels, which have little chance of actually
booting but are still used in validation.
So add a Kconfig symbol for this feature, and make it depend on
!COMPILE_TEST.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When onlining a CPU, switch to swapper_pg_dir as soon as possible so
that it is guaranteed that the vmap'ed stack is mapped before it is
used.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Nathan reports that the new get_current() and per-CPU offset accessors
may cause problems at build time due to the use of a literal to hold the
address of the respective variables. This is due to the fact that LLD
before v14 does not support the PC-relative group relocations that are
normally used for this, and the fallback relies on literals but does not
emit the literal pools explictly using the .ltorg directive.
./arch/arm/include/asm/current.h:53:6: error: out of range pc-relative fixup value
asm(LOAD_SYM_ARMV6(%0, __current) : "=r"(cur));
^
./arch/arm/include/asm/insn.h:25:2: note: expanded from macro 'LOAD_SYM_ARMV6'
" ldr " #reg ", =" #sym " nt"
^
<inline asm>:1:3: note: instantiated into assembly here
ldr r0, =__current
^
Since emitting a literal pool in this particular case is not possible,
let's avoid the LOAD_SYM_ARMV6() entirely, and use the ordinary C
assigment instead.
As it turns out, there are other such cases, and here, using .ltorg to
emit the literal pool within range of the LDR instruction would be
possible due to the presence of an unconditional branch right after it.
Unfortunately, putting .ltorg directives in subsections appears to
confuse the Clang inline assembler, resulting in similar errors even
though the .ltorg is most definitely within range.
So let's fix this by emitting the literal explicitly, and not rely on
the assembler to figure this out. This means we have move the fallback
out of the LOAD_SYM_ARMV6() macro and into the callers.
Link: https://github.com/ClangBuiltLinux/linux/issues/1551
Fixes: 9c46929e7989 ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
There are several reports about the new vmap'ed stacks code breaking
suspend/resume on Exynos, Renesas and Tegra SMP platforms. While this is
under investigation, let's disable the vmap'ed stacks feature for the
time being for SMP configurations that have suspend/resume enabled.
[0] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-8-ardb@kernel.org/
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
This PR covers all the work related to implementing IRQ stacks and
vmap'ed stacks for all 32-bit ARM systems that are currently supported
by the Linux kernel, including RiscPC and Footbridge. It has been
submitted for review in three different waves:
- IRQ stacks support for v7 SMP systems [0],
- vmap'ed stacks support for v7 SMP systems[1],
- extending support for both IRQ stacks and vmap'ed stacks for all
remaining configurations, including v6/v7 SMP multiplatform kernels
and uniprocessor configurations including v7-M [2]
[0] https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/
[1] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-1-ardb@kernel.org/
[2] https://lore.kernel.org/linux-arm-kernel/20211206164659.1495084-1-ardb@kernel.org/
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmGuRqgACgkQw08iOZLZ
jyRV4QwAjGbIH4mS+S0OjBFap4CITRU3hS6WqHi4OKGmGS9PpnOTh3krU0pggU68
SsMQvc+ZxFl+bdvON1MQYVdLGxnXXIwoFXqHpBZYbXb1qOJ9dawRZS/KMs/STwbX
TxSVOX3hgWB+NB+QDZ5PwO27/R0IXVSDaanxlfa/ARef8VBSrmuu+I1QwXstOv+l
ViKwcjiY8v0ry8hK3DmSnc6GKaPQDiFgyUlKdG5Mk6sWhffuG9NjX8C66R6Bl2z+
96nspqU3V33l5LVNKLtUHnHv9bFhfPrNQ6F9UTSAKkHtewAOB1SocWQfuGevXgwu
1ondWnhfFUdOr7N4Zf+KT4NkkuvMTKQ4kA3QxG/8Y8jHxmbldfapo6dXVV/iPWSy
xnz08VbzsWolahZSs822i2NNNepdWRF9P8TwOfQVMynyH089R1HD7EItdN1XyawC
2s+D6CAMsHSGWT9doYTOmAemMaF32ysENn2adx7Iwg8VldzOpXGEHviX9gEiShp8
uDheZn0o
=AwPb
-----END PGP SIGNATURE-----
Merge tag 'arm-irq-and-vmap-stacks-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable
ARM: support for IRQ and vmap'ed stacks
This PR covers all the work related to implementing IRQ stacks and
vmap'ed stacks for all 32-bit ARM systems that are currently supported
by the Linux kernel, including RiscPC and Footbridge. It has been
submitted for review in three different waves:
- IRQ stacks support for v7 SMP systems [0],
- vmap'ed stacks support for v7 SMP systems[1],
- extending support for both IRQ stacks and vmap'ed stacks for all
remaining configurations, including v6/v7 SMP multiplatform kernels
and uniprocessor configurations including v7-M [2]
[0] https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/
[1] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-1-ardb@kernel.org/
[2] https://lore.kernel.org/linux-arm-kernel/20211206164659.1495084-1-ardb@kernel.org/
Enable support for IRQ stacks on !MMU, and add the code to the IRQ entry
path to switch to the IRQ stack if not running from it already.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
On UP systems, only a single task can be 'current' at the same time,
which means we can use a global variable to track it. This means we can
also enable THREAD_INFO_IN_TASK for those systems, as in that case,
thread_info is accessed via current rather than the other way around,
removing the need to store thread_info at the base of the task stack.
This, in turn, permits us to enable IRQ stacks and vmap'ed stacks on UP
systems as well.
To partially mitigate the performance overhead of this arrangement, use
a ADD/ADD/LDR sequence with the appropriate PC-relative group
relocations to load the value of current when needed. This means that
accessing current will still only require a single load as before,
avoiding the need for a literal to carry the address of the global
variable in each function. However, accessing thread_info will now
require this load as well.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Defer TPIDURO updates for user space until exit also for CPU_V6+SMP
configurations so that we can decide at runtime whether to use it to
carry the current pointer, provided that we are running on a CPU that
actually implements this register. This is needed for
THREAD_INFO_IN_TASK support for UP systems, which requires that all SMP
capable systems use the TPIDRURO based access to 'current' as the only
remaining alternative will be a global variable which only works on UP.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Enable the use of the TLS register to hold the 'current' pointer also on
non-SMP configurations that target v6k or later CPUs. This will permit
the use of THREAD_INFO_IN_TASK as well as IRQ stacks and vmap'ed stacks
for such configurations.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Permit the use of the TPIDRPRW system register for carrying the per-CPU
offset in generic SMP configurations that also target non-SMP capable
ARMv6 cores. This uses the SMP_ON_UP code patching framework to turn all
TPIDRPRW accesses into reads/writes of entry #0 in the __per_cpu_offset
array.
While at it, switch over some existing direct TPIDRPRW accesses in asm
code to invocations of a new helper that is patched in the same way when
necessary.
Note that CPU_V6+SMP without SMP_ON_UP results in a kernel that does not
boot on v6 CPUs without SMP extensions, so add this dependency to
Kconfig as well.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
We will be adding variable loads to various hot paths, so it makes sense
to add a helper macro that can load variables from asm code without the
use of literal pool entries. On v7 or later, we can simply use MOVW/MOVT
pairs, but on earlier cores, this requires a bit of hackery to emit a
instruction sequence that implements this using a sequence of ADD/LDR
instructions.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Add support for the R_ARM_ALU_PC_Gn_NC and R_ARM_LDR_PC_G2 group
relocations [0] so we can use them in modules. These will be used to
load the current task pointer from a global variable without having to
rely on a literal pool entry to carry the address of this variable,
which may have a significant negative impact on cache utilization for
variables that are used often and in many different places, as each
occurrence will result in a literal pool entry and therefore a line in
the D-cache.
[0] 'ELF for the ARM architecture'
https://github.com/ARM-software/abi-aa/releases
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Tweak the UP stack protector handling code so that the thread info
pointer is preserved in R7 until set_current is called. This is needed
for a subsequent patch that implements THREAD_INFO_IN_TASK and
set_current for UP as well.
This also means we will prefer the per-task protector on UP systems that
implement the thread ID registers, so tweak the preprocessor
conditionals to reflect this.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Rather then restructuring the ARMv7M entrly logic per TODO, just move
NVIC to GENERIC_IRQ_MULTI_HANDLER.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The last user of arch_irq_handler_default is gone now, so the
entry-macro-multi.S file and all references to mach/entry-macro.S can
be removed, as well as the asm_do_IRQ() entrypoint into the interrupt
handling routines implemented in C.
Note: The ARMv7-M entry still uses its own top-level IRQ entry, calling
nvic_handle_irq() from assembly. This could be changed to go through
generic_handle_arch_irq() as well, but it's unclear to me if there are
any benefits.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: keep irq_handler macro as it carries all the IRQ stack handling]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
iop32x uses the entry-macro.S file for both the IRQ entry and for
hooking into the arch_ret_to_user code path. This is done because the
cp6 registers have to be enabled before accessing any of the interrupt
controller registers but have to be disabled when running in user space.
There is also a lazy-enable logic in cp6.c, but during a hardirq, we
know it has to be enabled.
Both the cp6-enable code and the code to read the IRQ status can be
lifted into the normal generic_handle_arch_irq() path, but the
cp6-disable code has to remain in the user return code. As nothing
other than iop32x uses this hook, just open-code it there with an
ifdef for the platform that can eventually be removed when iop32x
has reached the end of its life.
The cp6-enable path in the IRQ entry has an extra cp_wait barrier that
the trap version does not have, but it is harmless to do it in both
cases to simplify the logic here at the cost of a few extra cycles
for the trap.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
iop32x is one of the last platforms to use IRQ 0, and this has apparently
stopped working in a 2014 cleanup without anyone noticing. This interrupt
is used for the DMA engine, so most likely this has not actually worked
in the past 7 years, but it's also not essential for using this board.
I'm splitting out this change from my GENERIC_IRQ_MULTI_HANDLER
conversion so it can be backported if anyone cares.
Fixes: a71b092a9c68 ("ARM: Convert handle_IRQ to use __handle_domain_irq")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: take +1 offset into account in mask/unmask and init as well]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Footbridge still uses the classic IRQ entry path in assembler,
but this is easily converted into an equivalent C version.
In this case, the correlation between IRQ numbers and bits in
the status register is non-obvious, and the priorities are
handled by manually checking each bit in a static order,
re-reading the status register after each handled event.
I moved the code into the new file and edited the syntax without
changing this sequence to keep the behavior as close as possible
to what it traditionally did.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
This is one of the last platforms using the old entry path.
While this code path is spread over a few files, it is fairly
straightforward to convert it into an equivalent C version,
leaving the existing algorithm and all the priority handling
the same.
Unlike most irqchip drivers, this means reading the status
register(s) in a loop and always handling the highest-priority
irq first.
The IOMD_IRQREQC and IOMD_IRQREQD registers are not actaully
used here, but I left the code in place for the time being,
to keep the conversion as direct as possible. It could be
removed in a cleanup on top.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: drop obsolete IOMD_IRQREQC/IOMD_IRQREQD handling]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
IOMD_IRQREQC nor IOMD_IRQREQD are ever defined, so any conditionally
compiled code that depends on them is dead code, and can be removed.
Suggested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Wire up the generic support for managing task stack allocations via vmalloc,
and implement the entry code that detects whether we faulted because of a
stack overrun (or future stack overrun caused by pushing the pt_regs array)
While this adds a fair amount of tricky entry asm code, it should be
noted that it only adds a TST + branch to the svc_entry path. The code
implementing the non-trivial handling of the overflow stack is emitted
out-of-line into the .text section.
Since on ARM, we rely on do_translation_fault() to keep PMD level page
table entries that cover the vmalloc region up to date, we need to
ensure that we don't hit such a stale PMD entry when accessing the
stack. So we do a dummy read from the new stack while still running from
the old one on the context switch path, and bump the vmalloc_seq counter
when PMD level entries in the vmalloc range are modified, so that the MM
switch fetches the latest version of the entries.
Note that we need to increase the per-mode stack by 1 word, to gain some
space to stash a GPR until we know it is safe to touch the stack.
However, due to the cacheline alignment of the struct, this does not
actually increase the memory footprint of the struct stack array at all.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The original Thumb-2 enablement patches updated the stack realignment
code in svc_entry to work around the lack of a STMIB instruction in
Thumb-2, by subtracting 4 from the frame size, inverting the sense of
the misaligment check, and changing to a STMIA instruction and a final
stack push of a 4 byte quantity that results in the stack becoming
aligned at the end of the sequence. It also pushes and pops R0 to the
stack in order to have a temp register that Thumb-2 allows in general
purpose ALU instructions, as TST using SP is not permitted.
Both are a bit problematic for vmap'ed stacks, as using the stack is
only permitted after we decide that we did not overflow the stack, or
have already switched to the overflow stack.
As for the alignment check: the current approach creates a corner case
where, if the initial SUB of SP ends up right at the start of the stack,
we will end up subtracting another 8 bytes and overflowing it. This
means we would need to add the overflow check *after* the SUB that
deliberately misaligns the stack. However, this would require us to keep
local state (i.e., whether we performed the subtract or not) across the
overflow check, but without any GPRs or stack available.
So let's switch to an approach where we don't use the stack, and where
the alignment check of the stack pointer occurs in the usual way, as
this is guaranteed not to result in overflow. This means we will be able
to do the overflow check first.
While at it, switch to R1 so the mode stack pointer in R0 remains
accessible.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The load-multiple instruction that essentially performs the switch_to
operation in ARM mode, by loading all callee save registers as well the
stack pointer and the program counter, is split into 3 separate loads
for Thumb-2, with the IP register used as a temporary to capture the
value of R4 before it gets overwritten.
We can clean this up a bit, by sticking with a single LDMIA instruction,
but one that pops SP and PC into IP and LR, respectively, and by using
ordinary move register and branch instructions to get those values into
SP and PC. This also allows us to move the set_current call closer to
the assignment of SP, reducing the window where those are mutually out
of sync. This is especially relevant for CONFIG_VMAP_STACK, which is
being introduced in a subsequent patch, where we need to issue a load
that might fault from the new stack while running from the old one, to
ensure that stale PMD entries in the VMALLOC space are synced up.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
When unwinding the stack from a stack overflow, we are likely to start
from a stack push instruction, given that this is the most common way to
grow the stack for compiler emitted code. This push instruction rarely
appears anywhere else than at offset 0x0 of the function, and if it
doesn't, the compiler tends to split up the unwind annotations, given
that the stack frame layout is apparently not the same throughout the
function.
This means that, in the general case, if the frame's PC points at the
first instruction covered by a certain unwind entry, there is no way the
stack frame that the unwind entry describes could have been created yet,
and so we are still on the stack frame of the caller in that case. So
treat this as a special case, and return with the new PC taken from the
frame's LR, without applying the unwind transformations to the virtual
register set.
This permits us to unwind the call stack on stack overflow when the
overflow was caused by a stack push on function entry.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The memset implementation carves up the code in different sections, each
covered with their own unwind info. In this case, it is done in a way
similar to how the compiler might do it, to disambiguate between parts
where the return address is in LR and the SP is unmodified, and parts
where a stack frame is live, and the unwinder needs to know the size of
the stack frame and the location of the return address within it.
Only the placement of the unwind directives is slightly odd: the stack
pushes are placed in the wrong sections, which may confuse the unwinder
when attempting to unwind with PC pointing at the stack push in
question.
So let's fix this up, by reordering the directives and instructions as
appropriate.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The memmove routine is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction accurate, and does not allow
tracking the SP value as it changes.
Commit 207a6cb06990c ("ARM: 8224/1: Add unwinding support for memmove
function") addressed this by carving up the function in different chunks
as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding with the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.
Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.
Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the function, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The memcpy template is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction accurate, and does not allow
tracking the SP value as it changes.
Commit 279f487e0b471 ("ARM: 8225/1: Add unwinding support for memory
copy functions") addressed this by carving up the function in different
chunks as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding with the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.
Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.
Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the template, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Now that we have enabled IRQ stacks, any softIRQs that are handled over
the back of a hard IRQ will run from the IRQ stack as well. However, any
synchronous softirq processing that happens when re-enabling softIRQs
from task context will still execute on that task's stack.
Since any call to local_bh_enable() at any level in the task's call
stack may trigger a softIRQ processing run, which could potentially
cause a task stack overflow if the combined stack footprints exceed the
stack's size, let's run these synchronous invocations of do_softirq() on
the IRQ stack as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Restructure the code and add the unwind annotations so that both the
frame pointer unwinder as well as the EHABI unwind info based unwinder
will be able to follow the call stack through call_with_stack().
Since GCC and Clang use different formats for the stack frame, two
methods are implemented: a GCC version that pushes fp, sp, lr and pc for
compatibility with the frame pointer unwinder, and a second version that
works with Clang, as well as with the EHABI unwinder both in ARM and
Thumb2 modes.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Now that we no longer rely on the stack pointer to access the current
task struct or thread info, we can implement support for IRQ stacks
cleanly as well.
Define a per-CPU IRQ stack and switch to this stack when taking an IRQ,
provided that we were not already using that stack in the interrupted
context. This is never the case for IRQs taken from user space, but ones
taken while running in the kernel could fire while one taken from user
space has not completed yet.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The Clang backtrace code dereferences the link register value pulled
from the stack to decide whether the caller was a branch-and-link
instruction, in order to subsequently decode the offset to find the
start of the calling function. Unlike other loads in this routine, this
one is not protected by a fixup, and may therefore cause a crash if the
address in question is bogus.
So let's fix this, by treating the fault as a failure to decode the 'bl'
instruction. To avoid a label renum, reuse a fixup label that guards an
instruction that cannot fault to begin with.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The existing code that dumps the contents of the pt_regs structure
passed to __entry routines does so while unwinding the callee frame, and
dereferences the stack pointer as a struct pt_regs*. This will no longer
work when we enable support for IRQ or overflow stacks, because the
struct pt_regs may live on the task stack, while we are executing from
another stack.
The unwinder has access to this information, but only while unwinding
the calling frame. So let's combine the exception stack dumping code
with the handling of the calling frame as well. By printing it before
dumping the caller/callee addresses, the output order is preserved.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
The unwind info based stack unwinder will make its own call to
dump_mem() to dump the exception stack, so give it external linkage.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Implement support in the unwinder for dealing with multiple stacks.
This will be needed once we add support for IRQ stacks, or for the
overflow stack used by the vmap'ed stacks code.
This involves tracking the unwind opcodes that either update the virtual
stack pointer from another virtual register, or perform an explicit
subtract on the virtual stack pointer, and updating the low and high
bounds that we use to sanitize the stack pointer accordingly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Add a bl_r macro that abstract the difference between the ways indirect
calls are performed on older and newer ARM architecture revisions.
The main difference is to prefer blx instructions over explicit LR
assignments when possible, as these tend to confuse the prediction logic
in out-of-order cores when speculating across a function return.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
This code appears to be no longer used so let's get rid of it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Currently, we implement the per-task stack protector for ARM using a GCC
plugin, due to lack of native compiler support. However, work is
underway to get this implemented in the compiler, which means we will be
able to deprecate the GCC plugin at some point.
In the meantime, we will need to support both, where the native compiler
implementation is obviously preferred. So let's wire this up in Kconfig
and the Makefile.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M