linux

Author	SHA1	Message	Date
Ard Biesheuvel	953f534a7e	ARM: ftrace: enable HAVE_FUNCTION_GRAPH_FP_TEST Fix the frame pointer handling in the function graph tracer entry and exit code so we can enable HAVE_FUNCTION_GRAPH_FP_TEST. Instead of using FP directly (which will have different values between the entry and exit pieces of the function graph tracer), use the value of SP at entry and exit, as we can derive the former value from the frame pointer. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-02-09 09:12:33 +01:00
Ard Biesheuvel	65aa7e342a	ARM: ftrace: avoid unnecessary literal loads Avoid explicit literal loads and instead, use accessor macros that generate the optimal sequence depending on the architecture revision being targeted. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-02-09 09:12:33 +01:00
Ard Biesheuvel	d119678708	ARM: ftrace: avoid redundant loads or clobbering IP Tweak the ftrace return paths to avoid redundant loads of SP, as well as unnecessary clobbering of IP. This also fixes the inconsistency of using MOV to perform a function return, which is sub-optimal on recent micro-architectures but more importantly, does not perform an interworking return, unlike compiler generated function returns in Thumb2 builds. Let's fix this by popping PC from the stack like most ordinary code does. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-02-09 09:12:32 +01:00
Ard Biesheuvel	dc438db582	ARM: ftrace: use trampolines to keep .init.text in branching range Kernel images that are large in comparison to the range of a direct branch may fail to work as expected with ftrace, as patching a direct branch to one of the core ftrace routines may not be possible from the .init.text section, if it is emitted too far away from the normal .text section. This is more likely to affect Thumb2 builds, given that its range is only -/+ 16 MiB (as opposed to ARM which has -/+ 32 MiB), but may occur in either ISA. To work around this, add a couple of trampolines to .init.text and swap these in when the ftrace patching code is operating on callers in .init.text. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-02-09 09:12:32 +01:00
Ard Biesheuvel	ad1c2f39fd	ARM: ftrace: use ADD not POP to counter PUSH at entry The compiler emitted hook used for ftrace consists of a PUSH {LR} to preserve the link register, followed by a branch-and-link (BL) to __gnu_mcount_nc. Dynamic ftrace patches away the latter to turn the combined sequence into a NOP, using a POP {LR} instruction. This is not necessary, since the link register does not get clobbered in this case, and simply adding #4 to the stack pointer is sufficient, and avoids a memory access that may take a few cycles to resolve depending on the micro-architecture. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-02-09 09:12:32 +01:00
Ard Biesheuvel	dd88b03ff0	ARM: ftrace: ensure that ADR takes the Thumb bit into account Using ADR to take the address of 'ftrace_stub' via a local label produces an address that has the Thumb bit cleared, which means the subsequent comparison is guaranteed to fail. Instead, use the badr macro, which forces the Thumb bit to be set. Fixes: a3ba87a61499 ("ARM: 6316/1: ftrace: add Thumb-2 support") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>	2022-02-09 09:12:32 +01:00
Russell King (Oracle)	2fa3948244	Merge tag 'arm-vmap-stacks-v6' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable ARM: support for IRQ and vmap'ed stacks [v6] This tag covers the changes between the version of vmap'ed + IRQ stacks support pulled into rmk/devel-stable [0] (which was dropped from v5.17 due to issues discovered too late in the cycle), and my v5 proposed for the v5.18 cycle [1]. [0] git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git arm-irq-and-vmap-stacks-for-rmk [1] https://lore.kernel.org/linux-arm-kernel/20220124174744.1054712-1-ardb@kernel.org/	2022-01-31 15:26:45 +00:00
Ard Biesheuvel	4d5a643e73	ARM: make get_current() and __my_cpu_offset() __always_inline The get_current() and __my_cpu_offset() accessors evaluate to only a single instruction emitted inline, but due to the size of the asm string that is created for SMP+v6 configurations, the compiler assumes otherwise, and may emit the functions out of line instead. So use __always_inline to avoid this. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>	2022-01-31 16:06:35 +01:00
Ard Biesheuvel	57a420435e	ARM: drop pointless SMP check on secondary startup path Only SMP systems use the secondary startup path by definition, so there is no need for SMP conditionals there. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-01-25 09:53:52 +01:00
Ard Biesheuvel	a14a96d756	ARM: iop: make iop_handle_irq() static The build bots complain about iop_handle_irq() not being declared so let's make it static instead. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-01-25 09:53:52 +01:00
Ard Biesheuvel	d31e23aff0	ARM: mm: make vmalloc_seq handling SMP safe Rework the vmalloc_seq handling so it can be used safely under SMP, as we started using it to ensure that vmap'ed stacks are guaranteed to be mapped by the active mm before switching to a task, and here we need to ensure that changes to the page tables are visible to other CPUs when they observe a change in the sequence count. Since LPAE needs none of this, fold a check against it into the vmalloc_seq counter check after breaking it out into a separate static inline helper. Given that vmap'ed stacks are now also supported on !SMP configurations, let's drop the WARN() that could potentially now fire spuriously. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-01-25 09:53:52 +01:00
Ard Biesheuvel	aa0a20f521	ARM: entry: avoid clobbering R9 in IRQ handler Avoid using R9 in the IRQ handler code, as the entry code uses it for tsk, and expects it to remain untouched between the IRQ entry and exit code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-01-25 09:53:52 +01:00
Ard Biesheuvel	75fa4adc4f	ARM: smp: elide HWCAP_TLS checks or __entry_task updates on SMP+v6 Use the SMP_ON_UP patching framework to elide HWCAP_TLS tests from the context switch and return to userspace code paths, as SMP systems are guaranteed to have this h/w capability. At the same time, omit the update of __entry_task if the system is detected to be UP at runtime, as in that case, the value is never used. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-01-25 09:53:52 +01:00
Ard Biesheuvel	d6905849f8	ARM: assembler: define a Kconfig symbol for group relocation support Nathan reports the group relocations go out of range in pathological cases such as allyesconfig kernels, which have little chance of actually booting but are still used in validation. So add a Kconfig symbol for this feature, and make it depend on !COMPILE_TEST. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-01-24 21:02:34 +01:00
Ard Biesheuvel	8b806b82bc	ARM: mm: switch to swapper_pg_dir early for vmap'ed stack When onlining a CPU, switch to swapper_pg_dir as soon as possible so that it is guaranteed that the vmap'ed stack is mapped before it is used. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-01-24 20:37:55 +01:00
Ard Biesheuvel	5fe41793bc	ARM: 9176/1: avoid literal references in inline assembly Nathan reports that the new get_current() and per-CPU offset accessors may cause problems at build time due to the use of a literal to hold the address of the respective variables. This is due to the fact that LLD before v14 does not support the PC-relative group relocations that are normally used for this, and the fallback relies on literals but does not emit the literal pools explicitly using the .ltorg directive. ./arch/arm/include/asm/current.h:53:6: error: out of range pc-relative fixup value asm(LOAD_SYM_ARMV6(%0, __current) : "=r"(cur)); ^ ./arch/arm/include/asm/insn.h:25:2: note: expanded from macro 'LOAD_SYM_ARMV6' " ldr " #reg ", =" #sym "\n\t" ^ <inline asm>:1:3: note: instantiated into assembly here ldr r0, =__current ^ Since emitting a literal pool in this particular case is not possible, let's avoid the LOAD_SYM_ARMV6() entirely, and use the ordinary C assignment instead. As it turns out, there are other such cases, and here, using .ltorg to emit the literal pool within range of the LDR instruction would be possible due to the presence of an unconditional branch right after it. Unfortunately, putting .ltorg directives in subsections appears to confuse the Clang inline assembler, resulting in similar errors even though the .ltorg is most definitely within range. So let's fix this by emitting the literal explicitly, and not rely on the assembler to figure this out. This means we have to move the fallback out of the LOAD_SYM_ARMV6() macro and into the callers. Link: https://github.com/ClangBuiltLinux/linux/issues/1551 Fixes: 9c46929e7989 ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	2022-01-06 12:58:58 +00:00
Ard Biesheuvel	23d9a9280e	ARM: 9177/1: disable vmap'ed stacks on suspend-capable SMP configs There are several reports about the new vmap'ed stacks code breaking suspend/resume on Exynos, Renesas and Tegra SMP platforms. While this is under investigation, let's disable the vmap'ed stacks feature for the time being for SMP configurations that have suspend/resume enabled. [0] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-8-ardb@kernel.org/ Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	2022-01-05 14:51:13 +00:00
Russell King (Oracle)	9cf72c358a	Merge tag 'arm-irq-and-vmap-stacks-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable ARM: support for IRQ and vmap'ed stacks This PR covers all the work related to implementing IRQ stacks and vmap'ed stacks for all 32-bit ARM systems that are currently supported by the Linux kernel, including RiscPC and Footbridge. It has been submitted for review in three different waves: - IRQ stacks support for v7 SMP systems [0], - vmap'ed stacks support for v7 SMP systems [1], - extending support for both IRQ stacks and vmap'ed stacks for all remaining configurations, including v6/v7 SMP multiplatform kernels and uniprocessor configurations including v7-M [2] [0] https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/ [1] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-1-ardb@kernel.org/ [2] https://lore.kernel.org/linux-arm-kernel/20211206164659.1495084-1-ardb@kernel.org/	2021-12-17 11:48:13 +00:00
Ard Biesheuvel	cafc0eab16	ARM: v7m: enable support for IRQ stacks Enable support for IRQ stacks on !MMU, and add the code to the IRQ entry path to switch to the IRQ stack if not running from it already. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:17 +01:00
Ard Biesheuvel	9c46929e79	ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems On UP systems, only a single task can be 'current' at the same time, which means we can use a global variable to track it. This means we can also enable THREAD_INFO_IN_TASK for those systems, as in that case, thread_info is accessed via current rather than the other way around, removing the need to store thread_info at the base of the task stack. This, in turn, permits us to enable IRQ stacks and vmap'ed stacks on UP systems as well. To partially mitigate the performance overhead of this arrangement, use an ADD/ADD/LDR sequence with the appropriate PC-relative group relocations to load the value of current when needed. This means that accessing current will still only require a single load as before, avoiding the need for a literal to carry the address of the global variable in each function. However, accessing thread_info will now require this load as well. Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:17 +01:00
Ard Biesheuvel	c275591037	ARM: smp: defer TPIDRURO update for SMP v6 configurations too Defer TPIDRURO updates for user space until exit also for CPU_V6+SMP configurations so that we can decide at runtime whether to use it to carry the current pointer, provided that we are running on a CPU that actually implements this register. This is needed for THREAD_INFO_IN_TASK support for UP systems, which requires that all SMP capable systems use the TPIDRURO based access to 'current' as the only remaining alternative will be a global variable which only works on UP. Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:17 +01:00
Ard Biesheuvel	b87cf9118e	ARM: use TLS register for 'current' on !SMP as well Enable the use of the TLS register to hold the 'current' pointer also on non-SMP configurations that target v6k or later CPUs. This will permit the use of THREAD_INFO_IN_TASK as well as IRQ stacks and vmap'ed stacks for such configurations. Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:17 +01:00
Ard Biesheuvel	7b9896c352	ARM: percpu: add SMP_ON_UP support Permit the use of the TPIDRPRW system register for carrying the per-CPU offset in generic SMP configurations that also target non-SMP capable ARMv6 cores. This uses the SMP_ON_UP code patching framework to turn all TPIDRPRW accesses into reads/writes of entry #0 in the __per_cpu_offset array. While at it, switch over some existing direct TPIDRPRW accesses in asm code to invocations of a new helper that is patched in the same way when necessary. Note that CPU_V6+SMP without SMP_ON_UP results in a kernel that does not boot on v6 CPUs without SMP extensions, so add this dependency to Kconfig as well. Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:17 +01:00
Ard Biesheuvel	4e918ab13e	ARM: assembler: add optimized ldr/str macros to load variables from memory We will be adding variable loads to various hot paths, so it makes sense to add a helper macro that can load variables from asm code without the use of literal pool entries. On v7 or later, we can simply use MOVW/MOVT pairs, but on earlier cores, this requires a bit of hackery to emit an instruction sequence that implements this using a sequence of ADD/LDR instructions. Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:16 +01:00
Ard Biesheuvel	1fa8c4b195	ARM: module: implement support for PC-relative group relocations Add support for the R_ARM_ALU_PC_Gn_NC and R_ARM_LDR_PC_G2 group relocations [0] so we can use them in modules. These will be used to load the current task pointer from a global variable without having to rely on a literal pool entry to carry the address of this variable, which may have a significant negative impact on cache utilization for variables that are used often and in many different places, as each occurrence will result in a literal pool entry and therefore a line in the D-cache. [0] 'ELF for the ARM architecture' https://github.com/ARM-software/abi-aa/releases Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:16 +01:00
Ard Biesheuvel	831a469bc1	ARM: entry: preserve thread_info pointer in switch_to Tweak the UP stack protector handling code so that the thread info pointer is preserved in R7 until set_current is called. This is needed for a subsequent patch that implements THREAD_INFO_IN_TASK and set_current for UP as well. This also means we will prefer the per-task protector on UP systems that implement the thread ID registers, so tweak the preprocessor conditionals to reflect this. Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:16 +01:00
Vladimir Murzin	52d2408717	irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER Rather than restructuring the ARMv7M entry logic per TODO, just move NVIC to GENERIC_IRQ_MULTI_HANDLER. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:16 +01:00
Arnd Bergmann	54f481a230	ARM: remove old-style irq entry The last user of arch_irq_handler_default is gone now, so the entry-macro-multi.S file and all references to mach/entry-macro.S can be removed, as well as the asm_do_IRQ() entrypoint into the interrupt handling routines implemented in C. Note: The ARMv7-M entry still uses its own top-level IRQ entry, calling nvic_handle_irq() from assembly. This could be changed to go through generic_handle_arch_irq() as well, but it's unclear to me if there are any benefits. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ardb: keep irq_handler macro as it carries all the IRQ stack handling] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M Reviewed-by: Linus Walleij <linus.walleij@linaro.org>	2021-12-06 12:49:11 +01:00
Arnd Bergmann	6f5d248d05	ARM: iop32x: use GENERIC_IRQ_MULTI_HANDLER iop32x uses the entry-macro.S file for both the IRQ entry and for hooking into the arch_ret_to_user code path. This is done because the cp6 registers have to be enabled before accessing any of the interrupt controller registers but have to be disabled when running in user space. There is also a lazy-enable logic in cp6.c, but during a hardirq, we know it has to be enabled. Both the cp6-enable code and the code to read the IRQ status can be lifted into the normal generic_handle_arch_irq() path, but the cp6-disable code has to remain in the user return code. As nothing other than iop32x uses this hook, just open-code it there with an ifdef for the platform that can eventually be removed when iop32x has reached the end of its life. The cp6-enable path in the IRQ entry has an extra cp_wait barrier that the trap version does not have, but it is harmless to do it in both cases to simplify the logic here at the cost of a few extra cycles for the trap. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-06 12:49:04 +01:00
Arnd Bergmann	9d67412f24	ARM: iop32x: offset IRQ numbers by 1 iop32x is one of the last platforms to use IRQ 0, and this has apparently stopped working in a 2014 cleanup without anyone noticing. This interrupt is used for the DMA engine, so most likely this has not actually worked in the past 7 years, but it's also not essential for using this board. I'm splitting out this change from my GENERIC_IRQ_MULTI_HANDLER conversion so it can be backported if anyone cares. Fixes: a71b092a9c68 ("ARM: Convert handle_IRQ to use __handle_domain_irq") Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ardb: take +1 offset into account in mask/unmask and init as well] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M Reviewed-by: Linus Walleij <linus.walleij@linaro.org>	2021-12-06 12:48:52 +01:00
Arnd Bergmann	90890f17cc	ARM: footbridge: use GENERIC_IRQ_MULTI_HANDLER Footbridge still uses the classic IRQ entry path in assembler, but this is easily converted into an equivalent C version. In this case, the correlation between IRQ numbers and bits in the status register is non-obvious, and the priorities are handled by manually checking each bit in a static order, re-reading the status register after each handled event. I moved the code into the new file and edited the syntax without changing this sequence to keep the behavior as close as possible to what it traditionally did. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M Reviewed-by: Linus Walleij <linus.walleij@linaro.org>	2021-12-06 12:48:47 +01:00
Arnd Bergmann	c1fe8d054c	ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER This is one of the last platforms using the old entry path. While this code path is spread over a few files, it is fairly straightforward to convert it into an equivalent C version, leaving the existing algorithm and all the priority handling the same. Unlike most irqchip drivers, this means reading the status register(s) in a loop and always handling the highest-priority irq first. The IOMD_IRQREQC and IOMD_IRQREQD registers are not actually used here, but I left the code in place for the time being, to keep the conversion as direct as possible. It could be removed in a cleanup on top. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ardb: drop obsolete IOMD_IRQREQC/IOMD_IRQREQD handling] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 18:43:38 +01:00
Ard Biesheuvel	d60ff2e766	ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups Neither IOMD_IRQREQC nor IOMD_IRQREQD is ever defined, so any conditionally compiled code that depends on them is dead code, and can be removed. Suggested-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2021-12-03 18:42:22 +01:00
Ard Biesheuvel	a1c510d0ad	ARM: implement support for vmap'ed stacks Wire up the generic support for managing task stack allocations via vmalloc, and implement the entry code that detects whether we faulted because of a stack overrun (or future stack overrun caused by pushing the pt_regs array). While this adds a fair amount of tricky entry asm code, it should be noted that it only adds a TST + branch to the svc_entry path. The code implementing the non-trivial handling of the overflow stack is emitted out-of-line into the .text section. Since on ARM, we rely on do_translation_fault() to keep PMD level page table entries that cover the vmalloc region up to date, we need to ensure that we don't hit such a stale PMD entry when accessing the stack. So we do a dummy read from the new stack while still running from the old one on the context switch path, and bump the vmalloc_seq counter when PMD level entries in the vmalloc range are modified, so that the MM switch fetches the latest version of the entries. Note that we need to increase the per-mode stack by 1 word, to gain some space to stash a GPR until we know it is safe to touch the stack. However, due to the cacheline alignment of the struct, this does not actually increase the memory footprint of the struct stack array at all. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:33 +01:00
Ard Biesheuvel	ae5cc07da8	ARM: entry: rework stack realignment code in svc_entry The original Thumb-2 enablement patches updated the stack realignment code in svc_entry to work around the lack of a STMIB instruction in Thumb-2, by subtracting 4 from the frame size, inverting the sense of the misalignment check, and changing to a STMIA instruction and a final stack push of a 4 byte quantity that results in the stack becoming aligned at the end of the sequence. It also pushes and pops R0 to the stack in order to have a temp register that Thumb-2 allows in general purpose ALU instructions, as TST using SP is not permitted. Both are a bit problematic for vmap'ed stacks, as using the stack is only permitted after we decide that we did not overflow the stack, or have already switched to the overflow stack. As for the alignment check: the current approach creates a corner case where, if the initial SUB of SP ends up right at the start of the stack, we will end up subtracting another 8 bytes and overflowing it. This means we would need to add the overflow check after the SUB that deliberately misaligns the stack. However, this would require us to keep local state (i.e., whether we performed the subtract or not) across the overflow check, but without any GPRs or stack available. So let's switch to an approach where we don't use the stack, and where the alignment check of the stack pointer occurs in the usual way, as this is guaranteed not to result in overflow. This means we will be able to do the overflow check first. While at it, switch to R1 so the mode stack pointer in R0 remains accessible. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:33 +01:00
Ard Biesheuvel	b832faec33	ARM: switch_to: clean up Thumb2 code path The load-multiple instruction that essentially performs the switch_to operation in ARM mode, by loading all callee save registers as well as the stack pointer and the program counter, is split into 3 separate loads for Thumb-2, with the IP register used as a temporary to capture the value of R4 before it gets overwritten. We can clean this up a bit, by sticking with a single LDMIA instruction, but one that pops SP and PC into IP and LR, respectively, and by using ordinary move register and branch instructions to get those values into SP and PC. This also allows us to move the set_current call closer to the assignment of SP, reducing the window where those are mutually out of sync. This is especially relevant for CONFIG_VMAP_STACK, which is being introduced in a subsequent patch, where we need to issue a load that might fault from the new stack while running from the old one, to ensure that stale PMD entries in the VMALLOC space are synced up. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:32 +01:00
Ard Biesheuvel	532319b9c4	ARM: unwind: disregard unwind info before stack frame is set up When unwinding the stack from a stack overflow, we are likely to start from a stack push instruction, given that this is the most common way to grow the stack for compiler emitted code. This push instruction rarely appears anywhere else than at offset 0x0 of the function, and if it doesn't, the compiler tends to split up the unwind annotations, given that the stack frame layout is apparently not the same throughout the function. This means that, in the general case, if the frame's PC points at the first instruction covered by a certain unwind entry, there is no way the stack frame that the unwind entry describes could have been created yet, and so we are still on the stack frame of the caller in that case. So treat this as a special case, and return with the new PC taken from the frame's LR, without applying the unwind transformations to the virtual register set. This permits us to unwind the call stack on stack overflow when the overflow was caused by a stack push on function entry. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:32 +01:00
Ard Biesheuvel	ad3d09b547	ARM: memset: clean up unwind annotations The memset implementation carves up the code in different sections, each covered with their own unwind info. In this case, it is done in a way similar to how the compiler might do it, to disambiguate between parts where the return address is in LR and the SP is unmodified, and parts where a stack frame is live, and the unwinder needs to know the size of the stack frame and the location of the return address within it. Only the placement of the unwind directives is slightly odd: the stack pushes are placed in the wrong sections, which may confuse the unwinder when attempting to unwind with PC pointing at the stack push in question. So let's fix this up, by reordering the directives and instructions as appropriate. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:32 +01:00
Ard Biesheuvel	ccb81601ac	ARM: memmove: use frame pointer as unwind anchor The memmove routine is a bit unusual in the way it manages the stack pointer: depending on the execution path through the function, the SP assumes different values as different subsets of the register file are preserved and restored again. This is problematic when it comes to EHABI unwind info, as it is not instruction accurate, and does not allow tracking the SP value as it changes. Commit 207a6cb06990c ("ARM: 8224/1: Add unwinding support for memmove function") addressed this by carving up the function in different chunks as far as the unwinder is concerned, and keeping a set of unwind directives for each of them, each corresponding with the state of the stack pointer during execution of the chunk in question. This not only duplicates unwind info unnecessarily, but it also complicates unwinding the stack upon overflow. Instead, let's do what the compiler does when the SP is updated halfway through a function, which is to use a frame pointer and emit the appropriate unwind directives to communicate this to the unwinder. Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's avoid touching R7 in the body of the function, so that Thumb-2 can use it as the frame pointer. R11 was not modified in the first place. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:32 +01:00
Ard Biesheuvel	ba999a0402	ARM: memcpy: use frame pointer as unwind anchor The memcpy template is a bit unusual in the way it manages the stack pointer: depending on the execution path through the function, the SP assumes different values as different subsets of the register file are preserved and restored again. This is problematic when it comes to EHABI unwind info, as it is not instruction accurate, and does not allow tracking the SP value as it changes. Commit 279f487e0b471 ("ARM: 8225/1: Add unwinding support for memory copy functions") addressed this by carving up the function in different chunks as far as the unwinder is concerned, and keeping a set of unwind directives for each of them, each corresponding with the state of the stack pointer during execution of the chunk in question. This not only duplicates unwind info unnecessarily, but it also complicates unwinding the stack upon overflow. Instead, let's do what the compiler does when the SP is updated halfway through a function, which is to use a frame pointer and emit the appropriate unwind directives to communicate this to the unwinder. Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's avoid touching R7 in the body of the template, so that Thumb-2 can use it as the frame pointer. R11 was not modified in the first place. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:32 +01:00
Ard Biesheuvel	9974f85776	ARM: run softirqs on the per-CPU IRQ stack Now that we have enabled IRQ stacks, any softIRQs that are handled over the back of a hard IRQ will run from the IRQ stack as well. However, any synchronous softirq processing that happens when re-enabling softIRQs from task context will still execute on that task's stack. Since any call to local_bh_enable() at any level in the task's call stack may trigger a softIRQ processing run, which could potentially cause a task stack overflow if the combined stack footprints exceed the stack's size, let's run these synchronous invocations of do_softirq() on the IRQ stack as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:32 +01:00
Ard Biesheuvel	0b78f2e92d	ARM: call_with_stack: add unwind support Restructure the code and add the unwind annotations so that both the frame pointer unwinder as well as the EHABI unwind info based unwinder will be able to follow the call stack through call_with_stack(). Since GCC and Clang use different formats for the stack frame, two methods are implemented: a GCC version that pushes fp, sp, lr and pc for compatibility with the frame pointer unwinder, and a second version that works with Clang, as well as with the EHABI unwinder both in ARM and Thumb2 modes. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:32 +01:00
Ard Biesheuvel	d4664b6c98	ARM: implement IRQ stacks Now that we no longer rely on the stack pointer to access the current task struct or thread info, we can implement support for IRQ stacks cleanly as well. Define a per-CPU IRQ stack and switch to this stack when taking an IRQ, provided that we were not already using that stack in the interrupted context. This is never the case for IRQs taken from user space, but ones taken while running in the kernel could fire while one taken from user space has not completed yet. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:31 +01:00
Ard Biesheuvel	eae9523fdd	ARM: backtrace-clang: avoid crash on bogus frame pointer The Clang backtrace code dereferences the link register value pulled from the stack to decide whether the caller was a branch-and-link instruction, in order to subsequently decode the offset to find the start of the calling function. Unlike other loads in this routine, this one is not protected by a fixup, and may therefore cause a crash if the address in question is bogus. So let's fix this, by treating the fault as a failure to decode the 'bl' instruction. To avoid a label renum, reuse a fixup label that guards an instruction that cannot fault to begin with. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:31 +01:00
Ard Biesheuvel	4ab6827081	ARM: unwind: dump exception stack from calling frame The existing code that dumps the contents of the pt_regs structure passed to __entry routines does so while unwinding the callee frame, and dereferences the stack pointer as a struct pt_regs*. This will no longer work when we enable support for IRQ or overflow stacks, because the struct pt_regs may live on the task stack, while we are executing from another stack. The unwinder has access to this information, but only while unwinding the calling frame. So let's combine the exception stack dumping code with the handling of the calling frame as well. By printing it before dumping the caller/callee addresses, the output order is preserved. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:31 +01:00
Ard Biesheuvel	8cdfdf7fe4	ARM: export dump_mem() to other objects The unwind info based stack unwinder will make its own call to dump_mem() to dump the exception stack, so give it external linkage. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:31 +01:00
Ard Biesheuvel	b6506981f8	ARM: unwind: support unwinding across multiple stacks Implement support in the unwinder for dealing with multiple stacks. This will be needed once we add support for IRQ stacks, or for the overflow stack used by the vmap'ed stacks code. This involves tracking the unwind opcodes that either update the virtual stack pointer from another virtual register, or perform an explicit subtract on the virtual stack pointer, and updating the low and high bounds that we use to sanitize the stack pointer accordingly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:31 +01:00
Ard Biesheuvel	b3ab60b179	ARM: assembler: introduce bl_r macro Add a bl_r macro that abstracts the difference between the ways indirect calls are performed on older and newer ARM architecture revisions. The main difference is to prefer blx instructions over explicit LR assignments when possible, as these tend to confuse the prediction logic in out-of-order cores when speculating across a function return. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:31 +01:00
Ard Biesheuvel	08572cd419	ARM: remove some dead code This code appears to be no longer used so let's get rid of it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:31 +01:00
Ard Biesheuvel	f05eb1d24e	ARM: stackprotector: prefer compiler for TLS based per-task protector Currently, we implement the per-task stack protector for ARM using a GCC plugin, due to lack of native compiler support. However, work is underway to get this implemented in the compiler, which means we will be able to deprecate the GCC plugin at some point. In the meantime, we will need to support both, where the native compiler implementation is obviously preferred. So let's wire this up in Kconfig and the Makefile. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M	2021-12-03 15:11:30 +01:00


1058102 Commits