2019-06-03 08:44:50 +03:00
/* SPDX-License-Identifier: GPL-2.0-only */
2012-03-05 15:49:27 +04:00
/*
 * Low-level CPU initialisation
 * Based on arch/arm/kernel/head.S
 *
 * Copyright (C) 1994-2002 Russell King
 * Copyright (C) 2003-2012 ARM Ltd.
 * Authors:	Catalin Marinas <catalin.marinas@arm.com>
 *		Will Deacon <will.deacon@arm.com>
 */
#include <linux/linkage.h>
#include <linux/init.h>
2020-06-09 07:32:42 +03:00
#include <linux/pgtable.h>
2012-03-05 15:49:27 +04:00
arm64: simplify ptrauth initialization
Currently __cpu_setup conditionally initializes the address
authentication keys and enables them in SCTLR_EL1, doing so differently
for the primary CPU and secondary CPUs, and skipping this work for CPUs
returning from an idle state. For the latter case, cpu_do_resume
restores the keys and SCTLR_EL1 value after the MMU has been enabled.
This flow is rather difficult to follow, so instead let's move the
primary and secondary CPU initialization into their respective boot
paths. By following the example of cpu_do_resume and doing so once the
MMU is enabled, we can always initialize the keys from the values in
thread_struct, and avoid the machinery necessary to pass the keys in
secondary_data or open-coding initialization for the boot CPU.
This means we perform an additional RMW of SCTLR_EL1, but we already do
this in the cpu_do_resume path, and for other features in cpufeature.c,
so this isn't a major concern in a bringup path. Note that even while
the enable bits are clear, the key registers are accessible.
As this now renders the argument to __cpu_setup redundant, let's also
remove that entirely. Future extensions can follow a similar approach to
initialize values that differ for primary/secondary CPUs.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200423101606.37601-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-04-23 13:16:06 +03:00
#include <asm/asm_pointer_auth.h>
2012-03-05 15:49:27 +04:00
#include <asm/assembler.h>
2016-04-18 18:09:47 +03:00
#include <asm/boot.h>
arm64: Implement stack trace termination record
Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.
We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already set up upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().
Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle tasks.
Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.
External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record probably because it uses DWARF and not
frame pointers for stack traces.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-10 14:00:26 +03:00
#include <asm/bug.h>
2012-03-05 15:49:27 +04:00
#include <asm/ptrace.h>
#include <asm/asm-offsets.h>
2014-03-26 22:25:55 +04:00
#include <asm/cache.h>
2012-08-29 21:32:18 +04:00
#include <asm/cputype.h>
2020-12-02 21:41:04 +03:00
#include <asm/el2_setup.h>
2016-01-26 11:13:44 +03:00
#include <asm/elf.h>
2018-11-15 08:52:46 +03:00
#include <asm/image.h>
2015-10-19 16:19:27 +03:00
#include <asm/kernel-pgtable.h>
2014-02-19 13:33:14 +04:00
#include <asm/kvm_arm.h>
2012-03-05 15:49:27 +04:00
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/page.h>
2020-04-27 19:00:16 +03:00
#include <asm/scs.h>
2016-02-23 13:31:42 +03:00
#include <asm/smp.h>
2015-10-19 16:19:35 +03:00
#include <asm/sysreg.h>
#include <asm/thread_info.h>
2012-10-26 18:40:05 +04:00
#include <asm/virt.h>
2012-03-05 15:49:27 +04:00
2017-03-23 22:00:46 +03:00
#include "efi-header.S"
2020-08-25 16:54:40 +03:00
#if (PAGE_OFFSET & 0x1fffff) != 0
2014-06-24 19:51:37 +04:00
#error PAGE_OFFSET must be at least 2MB aligned
2012-03-05 15:49:27 +04:00
#endif
/*
 * Kernel startup entry point.
 * ---------------------------
 *
 * The requirements are:
 *   MMU = off, D-cache = off, I-cache = on or off,
 *   x0 = physical address to the FDT blob.
 *
 * Note that the callee-saved registers are used for storing variables
 * that are useful before the MMU is enabled. The allocations are described
 * in the entry routines.
 */
	__HEAD
	/*
	 * DO NOT MODIFY. Image header expected by Linux boot-loaders.
	 */
2020-11-17 15:47:29 +03:00
	efi_signature_nop			// special NOP to identify as PE/COFF executable
2020-03-26 20:14:23 +03:00
	b	primary_entry			// branch to kernel start, magic
2020-08-25 16:54:40 +03:00
	.quad	0				// Image load offset from start of RAM, little-endian
2015-12-26 15:48:02 +03:00
	le64sym	_kernel_size_le			// Effective size of kernel image, little-endian
	le64sym	_kernel_flags_le		// Informative flags, little-endian
2013-08-15 03:10:00 +04:00
	.quad	0				// reserved
	.quad	0				// reserved
	.quad	0				// reserved
2018-11-15 08:52:46 +03:00
	.ascii	ARM64_IMAGE_MAGIC		// Magic number
2020-11-17 15:47:29 +03:00
	.long	.Lpe_header_offset		// Offset to the PE header.
2014-04-16 06:47:52 +04:00
2017-03-23 22:00:46 +03:00
	__EFI_PE_HEADER
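From a boot loader's point of view, the 64 bytes emitted by the header directives above can be pictured as a C struct. This is only a sketch: the struct and field names are illustrative (they follow the naming used in the kernel's arm64 booting documentation, not anything defined in this file), and `image_header_looks_valid` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative layout of the 64-byte arm64 Image header. */
struct arm64_image_header {
	uint32_t code0;		/* efi_signature_nop */
	uint32_t code1;		/* b primary_entry */
	uint64_t text_offset;	/* image load offset, little-endian */
	uint64_t image_size;	/* effective image size, little-endian */
	uint64_t flags;		/* informative flags, little-endian */
	uint64_t res2;		/* reserved */
	uint64_t res3;		/* reserved */
	uint64_t res4;		/* reserved */
	uint32_t magic;		/* the bytes "ARM\x64" */
	uint32_t res5;		/* offset to the PE header */
};

_Static_assert(sizeof(struct arm64_image_header) == 64,
	       "the header must occupy exactly 64 bytes");

/* Hypothetical helper: compare the four magic bytes at offset 56. */
static int image_header_looks_valid(const void *image)
{
	static const unsigned char magic[4] = { 'A', 'R', 'M', 0x64 };

	return memcmp((const unsigned char *)image + 56, magic, 4) == 0;
}
```

Comparing raw bytes rather than a `uint32_t` keeps the check independent of the host's endianness.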
2012-03-05 15:49:27 +04:00
arm64: fix .idmap.text assertion for large kernels
When building a kernel with many debug options enabled (which happens in
test configurations used by myself and syzbot), the kernel can become
large enough that portions of .text can be more than 128M away from
.idmap.text (which is placed inside the .rodata section). Where idmap
code branches into .text, the linker will place veneers in the
.idmap.text section to make those branches possible.
Unfortunately, as Ard reports, GNU LD has been observed to add 4K of
padding when adding such veneers, e.g.
| .idmap.text 0xffffffc01e48e5c0 0x32c arch/arm64/mm/proc.o
| 0xffffffc01e48e5c0 idmap_cpu_replace_ttbr1
| 0xffffffc01e48e600 idmap_kpti_install_ng_mappings
| 0xffffffc01e48e800 __cpu_setup
| *fill* 0xffffffc01e48e8ec 0x4
| .idmap.text.stub
| 0xffffffc01e48e8f0 0x18 linker stubs
| 0xffffffc01e48f8f0 __idmap_text_end = .
| 0xffffffc01e48f000 . = ALIGN (0x1000)
| *fill* 0xffffffc01e48f8f0 0x710
| 0xffffffc01e490000 idmap_pg_dir = .
This makes the __idmap_text_start .. __idmap_text_end region bigger than
the 4K we require it to fit within, and triggers an assertion in arm64's
vmlinux.lds.S, which breaks the build:
| LD .tmp_vmlinux.kallsyms1
| aarch64-linux-gnu-ld: ID map text too big or misaligned
| make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
| make: *** [Makefile:1264: vmlinux] Error 2
Avoid this by using an `ADRP+ADD+BLR` sequence for branches out of
.idmap.text, which avoids the need for veneers. These branches are only
executed once per boot, and only when the MMU is on, so there should be
no noticeable performance penalty in replacing `BL` with `ADRP+ADD+BLR`.
At the same time, remove the "x" and "w" attributes when placing code in
.idmap.text, as these are not necessary, and this will prevent the
linker from assuming that it is safe to place PLTs into .idmap.text,
causing it to warn if and when there are out-of-range branches within
.idmap.text, e.g.
| LD .tmp_vmlinux.kallsyms1
| arch/arm64/kernel/head.o: in function `primary_entry':
| (.idmap.text+0x1c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| arch/arm64/kernel/head.o: in function `init_el2':
| (.idmap.text+0x88): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
| make: *** [Makefile:1252: vmlinux] Error 2
Thus, if future changes add out-of-range branches in .idmap.text, it
should be easy enough to identify those from the resulting linker
errors.
Reported-by: syzbot+f8ac312e31226e23302b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-arm-kernel/00000000000028ea4105f4e2ef54@google.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230220162317.1581208-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-20 19:23:17 +03:00
	.section ".idmap.text","a"
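The 128M limit mentioned in the commit message above follows from the `BL` encoding: a signed 26-bit immediate counted in 4-byte instructions. The C sketch below models the reachability check that an `R_AARCH64_CALL26` relocation has to pass; `bl_can_reach` is a hypothetical helper for illustration, not a kernel or linker API.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2^26 instruction words * 4 bytes = 128 MiB in either direction. */
#define BL_RANGE_BYTES ((int64_t)1 << 27)

/* Hypothetical helper: can a BL at branch_addr reach target_addr directly? */
static bool bl_can_reach(uint64_t branch_addr, uint64_t target_addr)
{
	int64_t delta = (int64_t)(target_addr - branch_addr);

	return (delta % 4) == 0 &&		/* targets are word aligned */
	       delta >= -BL_RANGE_BYTES &&	/* lowest: -128 MiB */
	       delta < BL_RANGE_BYTES;		/* highest: +128 MiB - 4 */
}
```

When this check fails the linker would normally need a veneer. The `ADRP+ADD+BLR` sequence sidesteps that because `ADRP` reaches about 4 GiB in either direction and `BLR` branches to whatever address the register holds.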
2016-03-30 18:43:07 +03:00
2016-08-31 14:05:17 +03:00
	/*
	 * The following callee saved general purpose registers are used on the
	 * primary lowlevel boot path:
	 *
	 *  Register   Scope                                    Purpose
2023-01-11 13:22:33 +03:00
	 *  x19        primary_entry() .. start_kernel()        whether we entered with the MMU on
2022-06-24 18:06:48 +03:00
	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
2020-03-26 20:14:23 +03:00
	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
2022-06-24 18:06:44 +03:00
	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
2020-03-26 20:14:23 +03:00
	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
arm64: head: avoid relocating the kernel twice for KASLR
Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, and the kernel virtual mapping is torn down and recreated again,
after which the relocations are applied a second time. The latter step
means that statically initialized global pointer variables will be reset
to their initial values, and to ensure that BSS variables are not set to
values based on the initial translation, they are cleared again as well.
All of this is needed because we need the command line (taken from the
DT) to tell us whether or not to randomize the virtual address space
before entering the kernel proper. However, this code has expanded
little by little and now creates global state unrelated to the virtual
randomization of the kernel before the mapping is torn down and set up
again, and the BSS cleared for a second time. This has created some
issues in the past, and it would be better to avoid this little dance if
possible.
So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then, create the virtual kernel
mapping, clear BSS, etc and proceed as normal. This avoids the issues
around inconsistent global state due to BSS being cleared twice, and is
generally more maintainable, as it permits us to defer all the remaining
DT parsing and KASLR initialization to a later time.
This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.
Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-21-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 18:06:50 +03:00
	 *  x24        __primary_switch()                       linear map KASLR seed
2022-07-01 14:10:45 +03:00
	 *  x25        primary_entry() .. start_kernel()        supported VA size
2022-06-24 18:06:42 +03:00
	 *  x28        create_idmap()                           callee preserved temp register
2016-08-31 14:05:17 +03:00
	 */
2020-03-26 20:14:23 +03:00
SYM_CODE_START(primary_entry)
2023-01-11 13:22:33 +03:00
	bl	record_mmu_state
2015-03-17 12:55:12 +03:00
	bl	preserve_boot_args
2023-01-11 13:22:34 +03:00
	bl	create_idmap
2023-01-11 13:22:35 +03:00
	/*
	 * If we entered with the MMU and caches on, clean the ID mapped part
	 * of the primary boot code to the PoC so we can safely execute it with
	 * the MMU off.
	 */
	cbz	x19, 0f
	adrp	x0, __idmap_text_start
	adr_l	x1, __idmap_text_end
2023-02-20 19:23:17 +03:00
	adr_l	x2, dcache_clean_poc
	blr	x2
2023-01-11 13:22:35 +03:00
0:	mov	x0, x19
2020-11-13 15:49:23 +03:00
	bl	init_kernel_el			// w0=cpu_boot_mode
2022-06-24 18:06:48 +03:00
	mov	x20, x0
2022-06-24 18:06:37 +03:00
2012-03-05 15:49:27 +04:00
	/*
2015-03-18 17:55:20 +03:00
	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
	 * details.
2012-03-05 15:49:27 +04:00
	 * On return, the CPU will be ready for the MMU to be turned on and
	 * the TCR will have been set.
	 */
2022-07-01 14:10:45 +03:00
#if VA_BITS > 48
	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
2022-09-06 01:54:08 +03:00
	tst	x0, #0xf << ID_AA64MMFR2_EL1_VARange_SHIFT
2022-07-01 14:10:45 +03:00
	mov	x0, #VA_BITS
	mov	x25, #VA_BITS_MIN
	csel	x25, x25, x0, eq
	mov	x0, x25
#endif
2016-04-18 18:09:43 +03:00
	bl	__cpu_setup			// initialise processor
2016-08-31 14:05:13 +03:00
	b	__primary_switch
2020-03-26 20:14:23 +03:00
SYM_CODE_END(primary_entry)
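The `tst`/`csel` sequence inside the `VA_BITS > 48` block of primary_entry picks the VA size that ends up in x25. A minimal C model follows, assuming the VARange field sits at bits [19:16] of ID_AA64MMFR2_EL1; `supported_va_bits` and `VARANGE_SHIFT` are names made up for this sketch.

```c
#include <stdint.h>

#define VARANGE_SHIFT 16	/* assumed: ID_AA64MMFR2_EL1.VARange is bits [19:16] */

/*
 * Hypothetical model of the tst/csel sequence: a zero VARange field means
 * the CPU does not implement the larger VA range, so va_bits_min is used.
 */
static unsigned int supported_va_bits(uint64_t mmfr2, unsigned int va_bits,
				      unsigned int va_bits_min)
{
	return ((mmfr2 >> VARANGE_SHIFT) & 0xf) ? va_bits : va_bits_min;
}
```

The assembly then copies the result into x0 before calling __cpu_setup; the model returns it instead.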
2012-03-05 15:49:27 +04:00
2023-01-11 13:22:35 +03:00
	__INIT
2023-01-11 13:22:33 +03:00
SYM_CODE_START_LOCAL(record_mmu_state)
	mrs	x19, CurrentEL
	cmp	x19, #CurrentEL_EL2
	mrs	x19, sctlr_el1
	b.ne	0f
	mrs	x19, sctlr_el2
2023-01-25 21:59:10 +03:00
0:
CPU_LE( tbnz	x19, #SCTLR_ELx_EE_SHIFT, 1f	)
CPU_BE( tbz	x19, #SCTLR_ELx_EE_SHIFT, 1f	)
	tst	x19, #SCTLR_ELx_C		// Z := (C == 0)
2023-01-11 13:22:33 +03:00
	and	x19, x19, #SCTLR_ELx_M		// isolate M bit
	csel	x19, xzr, x19, eq		// clear x19 if Z
	ret
2023-01-25 21:59:10 +03:00
	/*
	 * Set the correct endianness early so all memory accesses issued
	 * before init_kernel_el() occur in the correct byte order. Note that
	 * this means the MMU must be disabled, or the active ID map will end
	 * up getting interpreted with the wrong byte order.
	 */
1:	eor	x19, x19, #SCTLR_ELx_EE
	bic	x19, x19, #SCTLR_ELx_M
	b.ne	2f
	pre_disable_mmu_workaround
	msr	sctlr_el2, x19
	b	3f
	pre_disable_mmu_workaround
2:	msr	sctlr_el1, x19
3:	isb
	mov	x19, xzr
	ret
2023-01-11 13:22:33 +03:00
SYM_CODE_END(record_mmu_state)
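The decision logic of record_mmu_state can be modelled in C as below. This is a simplified sketch: the bit positions (M at bit 0, C at bit 2, EE at bit 25) are the architectural SCTLR_ELx positions, while `record_mmu_state_model` and its parameters are hypothetical. It leaves out the EL1/EL2 register selection, `pre_disable_mmu_workaround` and the `isb`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M		(UINT64_C(1) << 0)	/* MMU enable */
#define SCTLR_C		(UINT64_C(1) << 2)	/* data cache enable */
#define SCTLR_EE	(UINT64_C(1) << 25)	/* endianness at this exception level */

/*
 * Hypothetical model: returns the value left in x19. If the EE bit does not
 * match the kernel's endianness, *sctlr is rewritten with EE flipped and the
 * MMU disabled, and the MMU is reported as off.
 */
static uint64_t record_mmu_state_model(uint64_t *sctlr, bool kernel_big_endian)
{
	bool ee = (*sctlr & SCTLR_EE) != 0;

	if (ee != kernel_big_endian) {
		*sctlr = (*sctlr ^ SCTLR_EE) & ~SCTLR_M;
		return 0;
	}

	/* The MMU only counts as "on" when the data cache is enabled too. */
	return (*sctlr & SCTLR_C) ? (*sctlr & SCTLR_M) : 0;
}
```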
2015-03-17 12:55:12 +03:00
/*
 * Preserve the arguments passed by the bootloader in x0 .. x3
 */
2020-02-18 22:58:34 +03:00
SYM_CODE_START_LOCAL(preserve_boot_args)
2015-03-17 12:55:12 +03:00
	mov	x21, x0				// x21=FDT
	adr_l	x0, boot_args			// record the contents of
	stp	x21, x1, [x0]			// x0 .. x3 at kernel entry
	stp	x2, x3, [x0, #16]
2023-01-11 13:22:33 +03:00
	cbnz	x19, 0f				// skip cache invalidation if MMU is on
2015-03-17 12:55:12 +03:00
	dmb	sy				// needed before dc ivac with
						// MMU off
2021-05-24 11:29:53 +03:00
	add	x1, x0, #0x20			// 4 x 8 bytes
arm64: Rename arm64-internal cache maintenance functions
Although naming across the codebase isn't that consistent, it
tends to follow certain patterns. Moreover, the term "flush"
isn't defined in the Arm Architecture reference manual, and might
be interpreted to mean clean, invalidate, or both for a cache.
Rename arm64-internal functions to make the naming internally
consistent, as well as making it consistent with the Arm ARM, by
specifying whether it applies to the instruction, data, or both
caches, whether the operation is a clean, invalidate, or both.
Also specify which point the operation applies to, i.e., to the
point of unification (PoU), coherency (PoC), or persistence
(PoP).
This commit applies the following sed transformation to all files
under arch/arm64:
"s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\
"s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\
"s/\binvalidate_icache_range\b/icache_inval_pou/g;"\
"s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\
"s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\
"s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\
"s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\
"s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\
"s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\
"s/\b__flush_icache_all\b/icache_inval_all_pou/g;"
Note that __clean_dcache_area_poc is deliberately missing a word
boundary check at the beginning in order to match the efistub
symbols in image-vars.h.
Also note that, despite its name, __flush_icache_range operates
on both instruction and data caches. The name change here
reflects that.
No functional change intended.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-24 11:30:01 +03:00
	b	dcache_inval_poc		// tail call
2023-01-11 13:22:33 +03:00
0:	str_l	x19, mmu_enabled_at_boot, x0
	ret
2020-02-18 22:58:34 +03:00
SYM_CODE_END(preserve_boot_args)
2015-03-17 12:55:12 +03:00
2022-06-24 18:06:37 +03:00
SYM_FUNC_START_LOCAL(clear_page_tables)
	/*
	 * Clear the init page tables.
	 */
	adrp	x0, init_pg_dir
	adrp	x1, init_pg_end
2022-06-24 18:06:47 +03:00
	sub	x2, x1, x0
	mov	x1, xzr
	b	__pi_memset			// tail call
2022-06-24 18:06:37 +03:00
SYM_FUNC_END(clear_page_tables)
2014-11-22 00:50:41 +03:00
/*
2018-01-11 13:11:59 +03:00
 * Macro to populate page table entries, these entries can be pointers to the next level
 * or last level entries pointing to physical memory.
2014-11-22 00:50:41 +03:00
 *
2018-01-11 13:11:59 +03:00
 *	tbl:	page table address
 *	rtbl:	pointer to page table or physical memory
 *	index:	start index to write
 *	eindex:	end index to write - [index, eindex] written to
 *	flags:	flags for pagetable entry to or in
 *	inc:	increment to rtbl between each entry
 *	tmp1:	temporary variable
 *
 * Preserves:	tbl, eindex, flags, inc
 * Corrupts:	index, tmp1
 * Returns:	rtbl
2014-11-22 00:50:41 +03:00
 */
2018-01-11 13:11:59 +03:00
	.macro populate_entries, tbl, rtbl, index, eindex, flags, inc, tmp1
2018-01-29 14:59:59 +03:00
.Lpe\@:	phys_to_pte \tmp1, \rtbl
2018-01-11 13:11:59 +03:00
	orr	\tmp1, \tmp1, \flags	// tmp1 = table entry
	str	\tmp1, [\tbl, \index, lsl #3]
	add	\rtbl, \rtbl, \inc	// rtbl = pa next level
	add	\index, \index, #1
	cmp	\index, \eindex
	b.ls	.Lpe\@
	.endm
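For readers less used to assembler macros, the populate_entries loop above is roughly equivalent to the C below. It is a sketch under assumptions: `populate_entries_model` is a hypothetical name, and `phys_to_pte` is treated as the identity, which ignores the address re-encoding needed for 52-bit physical addresses.

```c
#include <stdint.h>

/*
 * Hypothetical model of populate_entries. Writes entries [index, eindex]
 * (inclusive) into tbl and returns the updated rtbl, as the macro does.
 */
static uint64_t populate_entries_model(uint64_t *tbl, uint64_t rtbl,
				       unsigned int index, unsigned int eindex,
				       uint64_t flags, uint64_t inc)
{
	do {
		tbl[index] = rtbl | flags;	/* table entry */
		rtbl += inc;			/* address of the next table or block */
		index++;
	} while (index <= eindex);		/* inclusive bound, like b.ls */

	return rtbl;
}
```

As in the macro, the body runs at least once, so a caller has to pass `index <= eindex`.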
/*
 * Compute indices of table entries from virtual address range. If multiple entries
 * were needed in the previous page table level then the next page table level is assumed
 * to be composed of multiple pages. (This effectively scales the end index).
 *
 *	vstart:	virtual address of start of range
arm64: head: avoid over-mapping in map_memory
The `compute_indices` and `populate_entries` macros operate on inclusive
bounds, and thus the `map_memory` macro which uses them also operates
on inclusive bounds.
We pass `_end` and `_idmap_text_end` to `map_memory`, but these are
exclusive bounds, and if one of these is sufficiently aligned (as a
result of kernel configuration, physical placement, and KASLR), then:
* In `compute_indices`, the computed `iend` will be in the page/block *after*
the final byte of the intended mapping.
* In `populate_entries`, an unnecessary entry will be created at the end
of each level of table. At the leaf level, this entry will map up to
SWAPPER_BLOCK_SIZE bytes of physical addresses that we did not intend
to map.
As we may map up to SWAPPER_BLOCK_SIZE bytes more than intended, we may
violate the boot protocol and map physical addresses past the 2MiB-aligned
end address we are permitted to map. As we map these with Normal memory
attributes, this may result in further problems depending on what these
physical addresses correspond to.
The final entry at each level may require an additional table at that
level. As EARLY_ENTRIES() calculates an inclusive bound, we allocate
enough memory for this.
Avoid the extraneous mapping by having map_memory convert the exclusive
end address to an inclusive end address by subtracting one, and do
likewise in EARLY_ENTRIES() when calculating the number of required
tables. For clarity, comments are updated to more clearly document which
boundaries the macros operate on. For consistency with the other
macros, the comments in map_memory are also updated to describe `vstart`
and `vend` as virtual addresses.
Fixes: 0370b31e4845 ("arm64: Extend early page table code to allow for larger kernels")
Cc: <stable@vger.kernel.org> # 4.16.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210823101253.55567-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-23 13:12:53 +03:00
 *	vend:	virtual address of end of range - we map [vstart, vend]
2018-01-11 13:11:59 +03:00
 *	shift:	shift used to transform virtual address into index
2022-06-24 18:06:35 +03:00
 *	order:	#imm 2log(number of entries in page table)
2018-01-11 13:11:59 +03:00
 *	istart:	index in table corresponding to vstart
 *	iend:	index in table corresponding to vend
 *	count:	On entry: how many extra entries were required in previous level, scales
 *			  our end index.
 *		On exit: returns how many extra entries required for next page table level
 *
2022-06-24 18:06:35 +03:00
 * Preserves:	vstart, vend
2018-01-11 13:11:59 +03:00
 * Returns:	istart, iend, count
 */
2022-06-24 18:06:35 +03:00
	.macro compute_indices, vstart, vend, shift, order, istart, iend, count
	ubfx	\istart, \vstart, \shift, \order
	ubfx	\iend, \vend, \shift, \order
	add	\iend, \iend, \count, lsl \order
2018-01-11 13:11:59 +03:00
	sub	\count, \iend, \istart
	.endm
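The index arithmetic in compute_indices, and the reason the end address must be inclusive, can be checked with a small C model. The struct and function names are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

struct indices {
	uint64_t istart;
	uint64_t iend;
	uint64_t count;
};

/* Hypothetical model of compute_indices; ubfx extracts `order` bits at `shift`. */
static struct indices compute_indices_model(uint64_t vstart, uint64_t vend,
					     unsigned int shift, unsigned int order,
					     uint64_t count)
{
	uint64_t mask = (UINT64_C(1) << order) - 1;
	struct indices r;

	r.istart = (vstart >> shift) & mask;
	r.iend = ((vend >> shift) & mask) + (count << order);
	r.count = r.iend - r.istart;	/* extra entries needed at the next level */
	return r;
}
```

With 2 MiB blocks (`shift = 21`, `order = 9`) and a range starting at 0x40000000, passing the exclusive end 0x40200000 yields `iend = 1`, one entry more than intended, whereas the inclusive end 0x401fffff yields `iend = 0`. That is the over-mapping the commit message describes.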
/*
2018-01-11 13:11:59 +03:00
 * Map memory for specified virtual address range. Each level of page table needed supports
 * multiple entries. If a level requires n entries the next page table level is assumed to be
 * formed from n pages.
 *
 *	tbl:	location of page table
 *	rtbl:	address to be used for first level page table entry (typically tbl + PAGE_SIZE)
2021-08-23 13:12:53 +03:00
 *	vstart:	virtual address of start of range
 *	vend:	virtual address of end of range - we map [vstart, vend - 1]
2018-01-11 13:11:59 +03:00
 *	flags:	flags to use to map last level entries
 *	phys:	physical address corresponding to vstart - physical memory is contiguous
2022-06-24 18:06:35 +03:00
 *	order:	#imm 2log(number of entries in PGD table)
2014-11-22 00:50:41 +03:00
 *
2022-06-24 18:06:36 +03:00
 * If extra_shift is set, an extra level will be populated if the end address does
 * not fit in 'extra_shift' bits. This assumes vend is in the TTBR0 range.
 *
2018-01-11 13:11:59 +03:00
 * Temporaries:	istart, iend, tmp, count, sv - these need to be different registers
2021-08-23 13:12:53 +03:00
 * Preserves:	vstart, flags
 * Corrupts:	tbl, rtbl, vend, istart, iend, tmp, count, sv
2014-11-22 00:50:41 +03:00
 */
2022-06-24 18:06:36 +03:00
	.macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv, extra_shift
2021-08-23 13:12:53 +03:00
	sub \vend, \vend, #1
2018-01-11 13:11:59 +03:00
	add \rtbl, \tbl, #PAGE_SIZE
	mov \count, #0
2022-06-24 18:06:35 +03:00
2022-06-24 18:06:36 +03:00
	.ifnb	\extra_shift
	tst	\vend, #~((1 << (\extra_shift)) - 1)
	b.eq	.L_\@
	compute_indices \vstart, \vend, #\extra_shift, #(PAGE_SHIFT - 3), \istart, \iend, \count
	mov \sv, \rtbl
	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
	mov \tbl, \sv
	.endif
.L_\@:
2022-06-24 18:06:35 +03:00
	compute_indices \vstart, \vend, #PGDIR_SHIFT, #\order, \istart, \iend, \count
	mov \sv, \rtbl
2018-01-11 13:11:59 +03:00
	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
	mov \tbl, \sv
#if SWAPPER_PGTABLE_LEVELS > 3
2022-06-24 18:06:35 +03:00
	compute_indices \vstart, \vend, #PUD_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
	mov \sv, \rtbl
2018-01-11 13:11:59 +03:00
	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
	mov \tbl, \sv
#endif
#if SWAPPER_PGTABLE_LEVELS > 2
2022-06-24 18:06:35 +03:00
	compute_indices \vstart, \vend, #SWAPPER_TABLE_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
	mov \sv, \rtbl
2018-01-11 13:11:59 +03:00
	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
	mov \tbl, \sv
#endif
2022-06-24 18:06:35 +03:00
	compute_indices \vstart, \vend, #SWAPPER_BLOCK_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
	bic \rtbl, \phys, #SWAPPER_BLOCK_SIZE - 1
	populate_entries \tbl, \rtbl, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
2014-11-22 00:50:41 +03:00
	.endm
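The over-mapping that the `sub \vend, \vend, #1` at the top of map_memory avoids can be reproduced with a small model of the leaf-level index arithmetic. This is a simplified sketch, not the kernel's compute_indices: the constants assume a 4 KiB granule with 2 MiB blocks, and `block_index` and `entries_populated` are hypothetical helper names introduced only for illustration.

```python
# Simplified model of the leaf-level index computation (assumed constants).
BLOCK_SHIFT = 21   # assumption: SWAPPER_BLOCK_SHIFT for 4 KiB pages (2 MiB blocks)
PTRS_ORDER = 9     # assumption: PAGE_SHIFT - 3, i.e. 512 entries per table


def block_index(addr, shift=BLOCK_SHIFT, order=PTRS_ORDER):
    """Index of the table entry that covers addr at this level."""
    return (addr >> shift) & ((1 << order) - 1)


def entries_populated(vstart, vend_exclusive, convert_to_inclusive):
    """Number of leaf entries written when the index macros see an inclusive end."""
    vend = vend_exclusive - 1 if convert_to_inclusive else vend_exclusive
    return block_index(vend) - block_index(vstart) + 1


vstart = 0x4000_0000
vend = vstart + 2 * (1 << BLOCK_SHIFT)   # exclusive end, block aligned

with_fix = entries_populated(vstart, vend, convert_to_inclusive=True)
without_fix = entries_populated(vstart, vend, convert_to_inclusive=False)
print(with_fix, without_fix)   # 2 3
```

With a block-aligned exclusive end, treating it as inclusive populates one extra entry, which corresponds to up to SWAPPER_BLOCK_SIZE bytes of unintended mapping at the leaf level.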
2022-06-24 18:06:41 +03:00
/*
 * Remap a subregion created with the map_memory macro with modified attributes
 * or output address. The entire remapped region must have been covered in the
 * invocation of map_memory.
 *
 * x0: last level table address (returned in first argument to map_memory)
 * x1: start VA of the existing mapping
 * x2: start VA of the region to update
 * x3: end VA of the region to update (exclusive)
 * x4: start PA associated with the region to update
 * x5: attributes to set on the updated region
 * x6: order of the last level mappings
 */
SYM_FUNC_START_LOCAL(remap_region)
	sub	x3, x3, #1		// make end inclusive
	// Get the index offset for the start of the last level table
	lsr	x1, x1, x6
	bfi	x1, xzr, #0, #PAGE_SHIFT - 3
	// Derive the start and end indexes into the last level table
	// associated with the provided region
	lsr	x2, x2, x6
	lsr	x3, x3, x6
	sub	x2, x2, x1
	sub	x3, x3, x1
	mov	x1, #1
	lsl	x6, x1, x6		// block size at this level
	populate_entries x0, x4, x2, x3, x5, x6, x7
	ret
SYM_FUNC_END(remap_region)
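The register arithmetic in remap_region can be followed more easily in a higher-level form. The sketch below mirrors the instruction sequence step by step under the assumption of 4 KiB pages (`PAGE_SHIFT = 12`); the function name `remap_indices` and its parameter names are hypothetical and only stand in for x1, x2, x3 and x6.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB pages


def remap_indices(map_start_va, start_va, end_va_exclusive, order):
    """Return (first_index, last_index, block_size) for the last level table."""
    end_inclusive = end_va_exclusive - 1                 # sub x3, x3, #1
    # lsr x1, x1, x6 ; bfi x1, xzr, #0, #PAGE_SHIFT - 3
    # Clearing the low PAGE_SHIFT - 3 bits gives the block number that
    # corresponds to entry 0 of the last level table.
    base = (map_start_va >> order) & ~((1 << (PAGE_SHIFT - 3)) - 1)
    first = (start_va >> order) - base                   # lsr x2 ; sub x2
    last = (end_inclusive >> order) - base               # lsr x3 ; sub x3
    block_size = 1 << order                              # mov x1, #1 ; lsl x6
    return first, last, block_size


# A mapping that starts at 0x4020_0000, with 2 MiB blocks (order 21),
# where the region [0x4060_0000, 0x40A0_0000) is to be updated.
print(remap_indices(0x4020_0000, 0x4060_0000, 0x40A0_0000, 21))
# (3, 4, 2097152)
```

The returned indexes are relative to the start of the table, not to the start of the existing mapping, which is why the low bits of the shifted start address are cleared rather than subtracted whole.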
2014-11-22 00:50:41 +03:00
2022-06-24 18:06:37 +03:00
SYM_FUNC_START_LOCAL(create_idmap)
2022-06-24 18:06:42 +03:00
	mov	x28, lr
arm64: mm: increase VA range of identity map
The page size and the number of translation levels, and hence the supported
virtual address range, are build-time configurables on arm64 whose optimal
values are use case dependent. However, in the current implementation, if
the system's RAM is located at a very high offset, the virtual address range
needs to reflect that merely because the identity mapping, which is only used
to enable or disable the MMU, requires the extended virtual range to map the
physical memory at an equal virtual offset.
This patch relaxes that requirement, by increasing the number of translation
levels for the identity mapping only, and only when actually needed, i.e.,
when system RAM's offset is found to be out of reach at runtime.
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-03-19 19:42:27 +03:00
	/*
2022-06-24 18:06:36 +03:00
	 * The ID map carries a 1:1 mapping of the physical address range
	 * covered by the loaded image, which could be anywhere in DRAM. This
	 * means that the required size of the VA (== PA) space is decided at
	 * boot time, and could be more than the configured size of the VA
	 * space for ordinary kernel and user space mappings.
	 *
	 * There are three cases to consider here:
	 * - 39 <= VA_BITS < 48, and the ID map needs up to 48 VA bits to cover
	 *   the placement of the image. In this case, we configure one extra
	 *   level of translation on the fly for the ID map only. (This case
	 *   also covers 42-bit VA/52-bit PA on 64k pages).
	 *
	 * - VA_BITS == 48, and the ID map needs more than 48 VA bits. This can
	 *   only happen when using 64k pages, in which case we need to extend
	 *   the root level table rather than add a level. Note that we can
	 *   treat this case as 'always extended' as long as we take care not
	 *   to program an unsupported T0SZ value into the TCR register.
	 *
	 * - Combinations that would require two additional levels of
	 *   translation are not supported, e.g., VA_BITS==36 on 16k pages, or
	 *   VA_BITS==39/4k pages with 5-level paging, where the input address
	 *   requires more than 47 or 48 bits, respectively.
2015-03-19 19:42:27 +03:00
	 */
arm64: allow ID map to be extended to 52 bits
Currently, when using VA_BITS < 48, if the ID map text happens to be
placed in physical memory above VA_BITS, we increase the VA size (up to
48) and create a new table level, in order to map in the ID map text.
This is okay because the system always supports 48 bits of VA.
This patch extends the code such that if the system supports 52 bits of
VA, and the ID map text is placed that high up, then we increase the VA
size accordingly, up to 52.
One difference from the current implementation is that so far the
condition of VA_BITS < 48 has meant that the top level table is always
"full", with the maximum number of entries, and an extra table level is
always needed. Now, when VA_BITS = 48 (and using 64k pages), the top
level table is not full, and we simply need to increase the number of
entries in it, instead of creating a new table level.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: reduce arguments to __create_hyp_mappings()]
[catalin.marinas@arm.com: reworked/renamed __cpu_uses_extended_idmap_level()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-12-13 20:07:24 +03:00
#if (VA_BITS < 48)
2022-06-24 18:06:35 +03:00
#define IDMAP_PGD_ORDER	(VA_BITS - PGDIR_SHIFT)
2017-12-13 20:07:24 +03:00
#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
	/*
	 * If VA_BITS < 48, we have to configure an additional table level.
	 * First, we have to verify our assumption that the current value of
	 * VA_BITS was chosen such that all translation levels are fully
	 * utilised, and that lowering T0SZ will always result in an additional
	 * translation level to be configured.
	 */
#if VA_BITS != EXTRA_SHIFT
#error "Mismatch between VA_BITS and page size/number of translation levels"
2015-03-19 19:42:27 +03:00
#endif
2017-12-13 20:07:24 +03:00
#else
2022-06-24 18:06:35 +03:00
#define IDMAP_PGD_ORDER	(PHYS_MASK_SHIFT - PGDIR_SHIFT)
2022-06-24 18:06:36 +03:00
#define EXTRA_SHIFT
2017-12-13 20:07:24 +03:00
	/*
	 * If VA_BITS == 48, we don't have to configure an additional
	 * translation level, but the top-level table has more entries.
	 */
#endif
2022-06-24 18:06:42 +03:00
	adrp	x0, init_idmap_pg_dir
	adrp	x3, _text
2022-06-24 18:06:44 +03:00
	adrp	x6, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
2022-06-24 18:06:42 +03:00
	mov	x7, SWAPPER_RX_MMUFLAGS
2018-01-11 13:11:59 +03:00
2022-06-24 18:06:36 +03:00
	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
2014-11-22 00:50:41 +03:00
2022-06-24 18:06:42 +03:00
/* Remap the kernel page tables r/w in the ID map */
	adrp	x1, _text
	adrp	x2, init_pg_dir
	adrp	x3, init_pg_end
	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
	mov	x5, SWAPPER_RW_MMUFLAGS
	mov	x6, #SWAPPER_BLOCK_SHIFT
	bl	remap_region
2022-06-24 18:06:44 +03:00
/* Remap the FDT after the kernel image */
	adrp	x1, _text
	adrp	x22, _end + SWAPPER_BLOCK_SIZE
	bic	x2, x22, #SWAPPER_BLOCK_SIZE - 1
	bfi	x22, x21, #0, #SWAPPER_BLOCK_SHIFT	// remapped FDT address
	add	x3, x2, #MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
	bic	x4, x21, #SWAPPER_BLOCK_SIZE - 1
	mov	x5, SWAPPER_RW_MMUFLAGS
	mov	x6, #SWAPPER_BLOCK_SHIFT
	bl	remap_region
2014-11-22 00:50:41 +03:00
	/*
2022-06-24 18:06:37 +03:00
	 * Since the page tables have been populated with non-cacheable
	 * accesses (MMU disabled), invalidate those tables again to
	 * remove any speculatively loaded cache lines.
2014-11-22 00:50:41 +03:00
	 */
2023-01-11 13:22:34 +03:00
	cbnz	x19, 0f			// skip cache invalidation if MMU is on
2022-06-24 18:06:37 +03:00
	dmb	sy
2022-06-24 18:06:42 +03:00
	adrp	x0, init_idmap_pg_dir
	adrp	x1, init_idmap_pg_end
	bl	dcache_inval_poc
2023-01-11 13:22:34 +03:00
0:	ret	x28
2022-06-24 18:06:37 +03:00
SYM_FUNC_END(create_idmap)
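The decision to populate an extra translation level for the ID map comes down to the `tst`/`b.eq` pair at the top of map_memory: the extra level is skipped when the inclusive end address fits in `extra_shift` bits. A minimal sketch of that test follows, assuming 4 KiB pages with VA_BITS == 39 so that EXTRA_SHIFT evaluates to 39; `needs_extra_level` is a hypothetical name, not a kernel symbol.

```python
PAGE_SHIFT = 12                              # assumption: 4 KiB pages
PGDIR_SHIFT = 30                             # assumption: 3 levels, VA_BITS == 39
EXTRA_SHIFT = PGDIR_SHIFT + PAGE_SHIFT - 3   # 39 under these assumptions


def needs_extra_level(vend_exclusive, extra_shift=EXTRA_SHIFT):
    """True when the inclusive end address does not fit in extra_shift bits."""
    vend = vend_exclusive - 1                            # sub \vend, \vend, #1
    # tst \vend, #~((1 << extra_shift) - 1) ; b.eq skips the extra level
    return (vend & ~((1 << extra_shift) - 1)) != 0


print(needs_extra_level(0x8000_0000))        # False: image ends at 2 GiB
print(needs_extra_level(1 << 39))            # False: last byte is still below 2^39
print(needs_extra_level((1 << 39) + 1))      # True: last byte needs bit 39
```

The second call shows why the test is applied to the inclusive end: an image ending exactly at the 2^39 boundary does not need the extra level.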
SYM_FUNC_START_LOCAL(create_kernel_mapping)
arm64/mm: Separate boot-time page tables from swapper_pg_dir
Since the address of swapper_pg_dir is fixed for a given kernel image,
it is an attractive target for manipulation via an arbitrary write. To
mitigate this we'd like to make it read-only by moving it into the
rodata section.
We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir
and reserved_ttbr0, so these will also need to move into rodata.
However, swapper_pg_dir is allocated along with some transient page
tables used for boot which we do not want to move into rodata.
As a step towards this, this patch separates the boot-time page tables
into a new init_pg_dir, and reduces swapper_pg_dir to the single page it
needs to be. This allows us to retain the relationship between
swapper_pg_dir, tramp_pg_dir, and reserved_ttbr0, while cleanly
separating these from the boot-time page tables.
The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during
boot, and all of these levels will be freed when we switch to the
swapper_pg_dir, which is initialized by the existing code in
paging_init(). Since we start off on the init_pg_dir, we no longer need
to allocate a transient page table in paging_init() in order to ensure
that swapper_pg_dir isn't live while we initialize it.
There should be no functional change as a result of this patch.
Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
[Mark: place init_pg_dir after BSS, fold mm changes, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-24 17:47:49 +03:00
	adrp	x0, init_pg_dir
2020-08-25 16:54:40 +03:00
	mov_q	x5, KIMAGE_VADDR		// compile time __va(_text)
2022-08-27 10:09:04 +03:00
#ifdef CONFIG_RELOCATABLE
arm64: add support for kernel ASLR
This adds support for KASLR, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 16:12:01 +03:00
	add	x5, x5, x23			// add KASLR displacement
2022-08-27 10:09:04 +03:00
#endif
arm64: don't map TEXT_OFFSET bytes below the kernel if we can avoid it
For historical reasons, the kernel Image must be loaded into physical
memory at a 512 KB offset above a 2 MB aligned base address. The region
between the base address and the start of the kernel Image has no
significance to the kernel itself, but it is currently mapped explicitly
into the early kernel VMA range for all translation granules.
In some cases (i.e., 4 KB granule), this is unavoidable, due to the 2 MB
granularity of the early kernel mappings. However, in other cases, e.g.,
when running with larger page sizes, or in the future, with more granular
KASLR, there is no reason to map it explicitly like we do currently.
So update the logic so that the region is mapped only if that happens as
a side effect of rounding the start address of the kernel to swapper block
size, and leave it unmapped otherwise.
Since the symbol kernel_img_size now simply resolves to the memory
footprint of the kernel Image, we can drop its definition from image.h
and opencode its calculation.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-18 18:09:46 +03:00
	adrp	x6, _end			// runtime __pa(_end)
	adrp	x3, _text			// runtime __pa(_text)
	sub	x6, x6, x3			// _end - _text
	add	x6, x6, x5			// runtime __va(_end)
2022-06-24 18:06:42 +03:00
	mov	x7, SWAPPER_RW_MMUFLAGS
2018-01-11 13:11:59 +03:00
2022-06-24 18:06:35 +03:00
	map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
2014-11-22 00:50:41 +03:00
2022-06-24 18:06:47 +03:00
	dsb	ishst				// sync with page table walker
	ret
2022-06-24 18:06:37 +03:00
SYM_FUNC_END(create_kernel_mapping)
2014-11-22 00:50:41 +03:00
arm64: Implement stack trace termination record
Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.
We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already setup upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().
Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle tasks.
Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.
External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record probably because it uses DWARF and not
frame pointers for stack traces.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-10 14:00:26 +03:00
	/*
2021-05-20 14:50:30 +03:00
	 * Initialize CPU registers with task-specific and cpu-specific context.
	 *
2021-05-10 14:00:26 +03:00
 * Create a final frame record at task_pt_regs(current)->stackframe, so
 * that the unwinder can identify the final frame record of any task by
 * its location in the task stack. We reserve the entire pt_regs space
 * for consistency with user tasks and kthreads.
 */
2021-05-20 14:50:31 +03:00
	.macro	init_cpu_task tsk, tmp1, tmp2
2021-05-20 14:50:30 +03:00
	msr	sp_el0, \tsk
2021-05-20 14:50:31 +03:00
	ldr	\tmp1, [\tsk, #TSK_STACK]
	add	sp, \tmp1, #THREAD_SIZE
2021-05-10 14:00:26 +03:00
	sub	sp, sp, #PT_REGS_SIZE
2021-05-20 14:50:30 +03:00
2021-05-10 14:00:26 +03:00
	stp	xzr, xzr, [sp, #S_STACKFRAME]
	add	x29, sp, #S_STACKFRAME
2021-05-20 14:50:30 +03:00
2023-01-09 20:47:59 +03:00
	scs_load_current
2021-05-20 14:50:31 +03:00
	adr_l	\tmp1, __per_cpu_offset
2021-09-14 15:10:33 +03:00
	ldr	w\tmp2, [\tsk, #TSK_TI_CPU]
2021-05-20 14:50:31 +03:00
	ldr	\tmp1, [\tmp1, \tmp2, lsl #3]
	set_this_cpu_offset \tmp1
2021-05-10 14:00:26 +03:00
	.endm
2014-11-22 00:50:41 +03:00
/*
2015-03-04 13:51:48 +03:00
 * The following fragment of code is executed with the MMU enabled.
2016-08-31 14:05:15 +03:00
 *
2022-06-29 07:12:07 +03:00
 *   x0 = __pa(KERNEL_START)
2014-11-22 00:50:41 +03:00
 */
2020-02-18 22:58:33 +03:00
SYM_FUNC_START_LOCAL(__primary_switched)
2021-05-20 14:50:30 +03:00
	adr_l	x4, init_task
2021-05-20 14:50:31 +03:00
	init_cpu_task x4, x5, x6
2016-08-31 14:05:16 +03:00
2015-12-26 14:46:40 +03:00
	adr_l	x8, vectors			// load VBAR_EL1 with virtual
	msr	vbar_el1, x8			// vector table address
	isb
2021-05-20 14:50:30 +03:00
	stp	x29, x30, [sp, #-16]!
2016-08-31 14:05:16 +03:00
	mov	x29, sp
2016-08-31 14:05:15 +03:00
	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
	ldr_l	x4, kimage_vaddr		// Save the offset between
	sub	x4, x4, x0			// the kernel virtual and
	str_l	x4, kimage_voffset, x5		// physical mappings
2022-06-24 18:06:48 +03:00
	mov	x0, x20
	bl	set_cpu_boot_mode_flag
2016-01-06 14:05:27 +03:00
	// Clear BSS
	adr_l	x0, __bss_start
	mov	x1, xzr
	adr_l	x2, __bss_stop
	sub	x2, x2, x0
	bl	__pi_memset
arm64: mm: place empty_zero_page in bss
Currently the zero page is set up in paging_init, and thus we cannot use
the zero page earlier. We use the zero page as a reserved TTBR value
from which no TLB entries may be allocated (e.g. when uninstalling the
idmap). To enable such usage earlier (as may be required for invasive
changes to the kernel page tables), and to minimise the time that the
idmap is active, we need to be able to use the zero page before
paging_init.
This patch follows the example set by x86, by allocating the zero page
at compile time, in .bss. This means that the zero page itself is
available immediately upon entry to start_kernel (as we zero .bss before
this), and also means that the zero page takes up no space in the raw
Image binary. The associated struct page is allocated in bootmem_init,
and remains unavailable until this time.
Outside of arch code, the only users of empty_zero_page assume that the
empty_zero_page symbol refers to the zeroed memory itself, and that
ZERO_PAGE(x) must be used to acquire the associated struct page,
following the example of x86. This patch also brings arm64 in line with
these assumptions.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-25 14:44:57 +03:00
	dsb	ishst				// Make zero page visible to PTW
2016-01-06 14:05:27 +03:00
2022-07-01 14:10:45 +03:00
#if VA_BITS > 48
	adr_l	x8, vabits_actual		// Set this early so KASAN early init
	str	x25, [x8]			// ... observes the correct value
	dc	civac, x8			// Make visible to booting secondaries
#endif
arm64: head: avoid relocating the kernel twice for KASLR
Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, and the kernel virtual mapping is torn down and recreated again,
after which the relocations are applied a second time. The latter step
means that statically initialized global pointer variables will be reset
to their initial values, and to ensure that BSS variables are not set to
values based on the initial translation, they are cleared again as well.
All of this is needed because we need the command line (taken from the
DT) to tell us whether or not to randomize the virtual address space
before entering the kernel proper. However, this code has expanded
little by little and now creates global state unrelated to the virtual
randomization of the kernel before the mapping is torn down and set up
again, and the BSS cleared for a second time. This has created some
issues in the past, and it would be better to avoid this little dance if
possible.
So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then, create the virtual kernel
mapping, clear BSS, etc and proceed as normal. This avoids the issues
around inconsistent global state due to BSS being cleared twice, and is
generally more maintainable, as it permits us to defer all the remaining
DT parsing and KASLR initialization to a later time.
This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.
Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-21-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 18:06:50 +03:00
#ifdef CONFIG_RANDOMIZE_BASE
	adrp	x5, memstart_offset_seed	// Save KASLR linear map seed
	strh	w24, [x5, :lo12:memstart_offset_seed]
#endif
2020-12-22 23:02:06 +03:00
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
2015-10-12 18:52:58 +03:00
	bl	kasan_early_init
arm64: add support for kernel ASLR
This adds support for KASLR, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 16:12:01 +03:00
#endif
2022-07-13 17:09:49 +03:00
	mov	x0, x21				// pass FDT address in x0
	bl	early_fdt_map			// Try mapping the FDT early
	mov	x0, x20				// pass the full boot status
2021-02-08 12:57:22 +03:00
	bl	init_feature_override		// Parse cpu feature overrides
2022-10-27 18:59:08 +03:00
#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
	bl	scs_patch_vmlinux
#endif
2022-06-24 18:06:48 +03:00
	mov	x0, x20
2022-06-30 19:04:52 +03:00
	bl	finalise_el2			// Prefer VHE if possible
2021-05-20 14:50:30 +03:00
	ldp	x29, x30, [sp], #16
2021-05-10 14:00:26 +03:00
	bl	start_kernel
	ASM_BUG()
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__primary_switched)
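The value stored in kimage_voffset by __primary_switched is the difference between the kernel image's virtual base (kimage_vaddr) and its physical load address (x0 on entry). A minimal C sketch of that arithmetic follows. The function names are hypothetical, and the conversion helper only mirrors the idea behind the kernel's `__kimg_to_phys()`, which subtracts the offset from a kernel-image virtual address.

```c
#include <stdint.h>

/* Offset between the kernel's virtual and physical mappings. */
static uint64_t sketch_kimage_voffset(uint64_t kimage_vaddr,
				      uint64_t kernel_start_pa)
{
	return kimage_vaddr - kernel_start_pa;
}

/* Translate a kernel-image virtual address back to a physical address. */
static uint64_t sketch_kimg_to_phys(uint64_t vaddr, uint64_t voffset)
{
	return vaddr - voffset;
}
```

The example addresses a caller might plug in (for instance a virtual base of 0xffff800008000000 and a load address of 0x40200000) are illustrative only; unsigned wrap-around makes the subtraction well defined for any pair.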
2014-11-22 00:50:41 +03:00
/*
 * end early head section, begin head code that is also used for
 * hotplug and needs to have the same protections as the text region
 */
arm64: fix .idmap.text assertion for large kernels
When building a kernel with many debug options enabled (which happens in
test configurations used by myself and syzbot), the kernel can become
large enough that portions of .text can be more than 128M away from
.idmap.text (which is placed inside the .rodata section). Where idmap
code branches into .text, the linker will place veneers in the
.idmap.text section to make those branches possible.
Unfortunately, as Ard reports, GNU LD has been observed to add 4K of
padding when adding such veneers, e.g.
| .idmap.text 0xffffffc01e48e5c0 0x32c arch/arm64/mm/proc.o
| 0xffffffc01e48e5c0 idmap_cpu_replace_ttbr1
| 0xffffffc01e48e600 idmap_kpti_install_ng_mappings
| 0xffffffc01e48e800 __cpu_setup
| *fill* 0xffffffc01e48e8ec 0x4
| .idmap.text.stub
| 0xffffffc01e48e8f0 0x18 linker stubs
| 0xffffffc01e48f8f0 __idmap_text_end = .
| 0xffffffc01e48f000 . = ALIGN (0x1000)
| *fill* 0xffffffc01e48f8f0 0x710
| 0xffffffc01e490000 idmap_pg_dir = .
This makes the __idmap_text_start .. __idmap_text_end region bigger than
the 4K we require it to fit within, and triggers an assertion in arm64's
vmlinux.lds.S, which breaks the build:
| LD .tmp_vmlinux.kallsyms1
| aarch64-linux-gnu-ld: ID map text too big or misaligned
| make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
| make: *** [Makefile:1264: vmlinux] Error 2
Avoid this by using an `ADRP+ADD+BLR` sequence for branches out of
.idmap.text, which avoids the need for veneers. These branches are only
executed once per boot, and only when the MMU is on, so there should be
no noticeable performance penalty in replacing `BL` with `ADRP+ADD+BLR`.
At the same time, remove the "x" and "w" attributes when placing code in
.idmap.text, as these are not necessary, and this will prevent the
linker from assuming that it is safe to place PLTs into .idmap.text,
causing it to warn if and when there are out-of-range branches within
.idmap.text, e.g.
| LD .tmp_vmlinux.kallsyms1
| arch/arm64/kernel/head.o: in function `primary_entry':
| (.idmap.text+0x1c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| arch/arm64/kernel/head.o: in function `init_el2':
| (.idmap.text+0x88): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
| make: *** [Makefile:1252: vmlinux] Error 2
Thus, if future changes add out-of-range branches in .idmap.text, it
should be easy enough to identify those from the resulting linker
errors.
Reported-by: syzbot+f8ac312e31226e23302b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-arm-kernel/00000000000028ea4105f4e2ef54@google.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230220162317.1581208-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-20 19:23:17 +03:00
	.section ".idmap.text","a"
2016-01-26 16:12:01 +03:00
2012-03-05 15:49:27 +04:00
/*
2020-11-13 15:49:23 +03:00
 * Starting from EL2 or EL1, configure the CPU to execute at the highest
 * reachable EL supported by the kernel in a chosen default state. If dropping
 * from EL2 to EL1, configure EL2 before configuring EL1.
2013-10-11 17:52:16 +04:00
 *
2020-11-13 15:49:25 +03:00
 * Since we cannot always rely on ERET synchronizing writes to sysregs (e.g. if
 * SCTLR_ELx.EOS is clear), we place an ISB prior to ERET.
2013-10-11 17:52:16 +04:00
 *
2022-06-30 19:04:53 +03:00
 * Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in x0 if
 * booted in EL1 or EL2 respectively, with the top 32 bits containing
 * potential context flags. These flags are *not* stored in __boot_cpu_mode.
2023-01-11 13:22:35 +03:00
 *
 * x0: whether we are being called from the primary boot path with the MMU on
2012-03-05 15:49:27 +04:00
 */
2020-11-13 15:49:23 +03:00
SYM_FUNC_START(init_kernel_el)
2023-01-11 13:22:35 +03:00
	mrs	x1, CurrentEL
	cmp	x1, #CurrentEL_EL2
2020-11-13 15:49:25 +03:00
	b.eq	init_el2
SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
2021-04-08 16:10:09 +03:00
	mov_q	x0, INIT_SCTLR_EL1_MMU_OFF
2023-01-11 13:22:33 +03:00
	pre_disable_mmu_workaround
2021-04-08 16:10:09 +03:00
	msr	sctlr_el1, x0
2013-10-11 17:52:17 +04:00
	isb
2020-11-13 15:49:25 +03:00
	mov_q	x0, INIT_PSTATE_EL1
	msr	spsr_el1, x0
	msr	elr_el1, lr
	mov	w0, #BOOT_CPU_MODE_EL1
	eret
2012-03-05 15:49:27 +04:00
2020-11-13 15:49:25 +03:00
SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
2023-01-11 13:22:35 +03:00
	msr	elr_el2, lr
	// clean all HYP code to the PoC if we booted at EL2 with the MMU on
	cbz	x0, 0f
	adrp	x0, __hyp_idmap_text_start
	adr_l	x1, __hyp_text_end
2023-02-20 19:23:17 +03:00
	adr_l	x2, dcache_clean_poc
	blr	x2
2023-01-11 13:22:35 +03:00
0:
2020-12-02 21:41:04 +03:00
	mov_q	x0, HCR_HOST_NVHE_FLAGS
	msr	hcr_el2, x0
2017-10-31 18:51:04 +03:00
	isb
2020-12-02 21:41:04 +03:00
2021-02-08 12:57:17 +03:00
	init_el2_state
2017-10-31 18:51:04 +03:00
2012-10-19 20:46:27 +04:00
	/* Hypervisor stub */
2020-12-02 21:41:04 +03:00
	adr_l	x0, __hyp_stub_vectors
2012-10-19 20:46:27 +04:00
	msr	vbar_el2, x0
2020-11-13 15:49:25 +03:00
	isb
2020-12-02 21:41:04 +03:00
2022-06-30 19:04:54 +03:00
	mov_q	x1, INIT_SCTLR_EL1_MMU_OFF
2021-04-08 16:10:09 +03:00
	/*
	 * Fruity CPUs seem to have HCR_EL2.E2H set to RES1,
	 * making it impossible to start in nVHE mode. Is that
	 * compliant with the architecture? Absolutely not!
	 */
	mrs	x0, hcr_el2
	and	x0, x0, #HCR_E2H
	cbz	x0, 1f
2022-06-30 19:04:54 +03:00
	/* Set a sane SCTLR_EL1, the VHE way */
2023-01-11 13:22:33 +03:00
	pre_disable_mmu_workaround
2022-06-30 19:04:54 +03:00
	msr_s	SYS_SCTLR_EL12, x1
	mov	x2, #BOOT_CPU_FLAG_E2H
	b	2f
2021-04-08 16:10:09 +03:00
1:
2023-01-11 13:22:33 +03:00
	pre_disable_mmu_workaround
2022-06-30 19:04:54 +03:00
	msr	sctlr_el1, x1
	mov	x2, xzr
2:
2020-11-13 15:49:25 +03:00
	mov	w0, #BOOT_CPU_MODE_EL2
2022-06-30 19:04:54 +03:00
	orr	x0, x0, x2
2012-03-05 15:49:27 +04:00
	eret
2020-11-13 15:49:23 +03:00
SYM_FUNC_END(init_kernel_el)
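The return convention of init_kernel_el (boot mode in the low 32 bits of x0, context flags in the top 32 bits) can be illustrated with a small C sketch. The helper names are hypothetical, and the numeric constants are assumptions chosen for illustration; the authoritative definitions of BOOT_CPU_MODE_EL1, BOOT_CPU_MODE_EL2 and BOOT_CPU_FLAG_E2H live in the kernel headers.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_BOOT_CPU_MODE_EL1 0xe11u
#define SKETCH_BOOT_CPU_MODE_EL2 0xe12u
#define SKETCH_BOOT_CPU_FLAG_E2H (UINT64_C(1) << 32)

/* Combine mode and flags, as "orr x0, x0, x2" does on the EL2 path. */
static uint64_t sketch_boot_status(uint32_t mode, uint64_t flags)
{
	return (uint64_t)mode | flags;
}

/* The boot mode is the low 32 bits (what a w0 read would observe). */
static uint32_t sketch_boot_mode(uint64_t status)
{
	return (uint32_t)status;
}

/* The context flags are whatever is set in the top 32 bits. */
static uint64_t sketch_boot_flags(uint64_t status)
{
	return status & ~UINT64_C(0xffffffff);
}
```

Keeping the mode in the low word means callers that only care about EL1 versus EL2 can keep using the 32-bit view, while the flags travel alongside without being stored in __boot_cpu_mode.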
2012-03-05 15:49:27 +04:00
/*
 * This provides a "holding pen" for platforms to hold all secondary
 * cores until we're ready for them to initialise.
 */
2020-02-18 22:58:33 +03:00
SYM_FUNC_START(secondary_holding_pen)
2023-01-11 13:22:35 +03:00
	mov	x0, xzr
2020-11-13 15:49:23 +03:00
	bl	init_kernel_el			// w0=cpu_boot_mode
2022-06-24 18:06:48 +03:00
	mrs	x2, mpidr_el1
2016-04-18 18:09:45 +03:00
	mov_q	x1, MPIDR_HWID_BITMASK
2022-06-24 18:06:48 +03:00
	and	x2, x2, x1
2015-03-10 17:00:03 +03:00
	adr_l	x3, secondary_holding_pen_release
2012-03-05 15:49:27 +04:00
pen:	ldr	x4, [x3]
2022-06-24 18:06:48 +03:00
	cmp	x4, x2
2012-03-05 15:49:27 +04:00
	b.eq	secondary_startup
	wfe
	b	pen
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(secondary_holding_pen)
arm64: factor out spin-table boot method
The arm64 kernel has an internal holding pen, which is necessary for
some systems where we can't bring CPUs online individually and must hold
multiple CPUs in a safe area until the kernel is able to handle them.
The current SMP infrastructure for arm64 is closely coupled to this
holding pen, and alternative boot methods must launch CPUs into the pen,
where they sit before they are launched into the kernel proper.
With PSCI (and possibly other future boot methods), we can bring CPUs
online individually, and need not perform the secondary_holding_pen
dance. Instead, this patch factors the holding pen management code out
to the spin-table boot method code, as it is the only boot method
requiring the pen.
A new entry point for secondaries, secondary_entry is added for other
boot methods to use, which bypasses the holding pen and its associated
overhead when bringing CPUs online. The smp.pen.text section is also
removed, as the pen can live in head.text without problem.
The cpu_operations structure is extended with two new functions,
cpu_boot and cpu_postboot, for bringing a cpu into the kernel and
performing any post-boot cleanup required by a bootmethod (e.g.
resetting the secondary_holding_pen_release to INVALID_HWID).
Documentation is added for cpu_operations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
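The pen protocol described above can be modelled in C as a rough sketch. The variable pen_release stands in for secondary_holding_pen_release, and the helper names release_cpu, pen_poll_once and reset_pen are hypothetical, chosen only for this illustration. The value used for INVALID_HWID is likewise an assumption. The real pen spins with wfe between polls; the sketch exposes a single polling step so that it can be exercised without threads.

```c
#include <stdatomic.h>
#include <stdint.h>

#define INVALID_HWID UINT64_MAX	/* assumption: marker for "no CPU released" */

/* Hypothetical stand-in for secondary_holding_pen_release. */
static _Atomic uint64_t pen_release = INVALID_HWID;

/* Boot CPU side: publish the hardware ID of the CPU allowed to leave the pen. */
static void release_cpu(uint64_t hwid)
{
	atomic_store(&pen_release, hwid);
}

/* Secondary side: one polling step; returns 1 if this CPU may proceed. */
static int pen_poll_once(uint64_t my_hwid)
{
	return atomic_load(&pen_release) == my_hwid;
}

/* Post-boot cleanup: close the pen again. */
static void reset_pen(void)
{
	atomic_store(&pen_release, INVALID_HWID);
}
```

Only the CPU whose masked MPIDR value matches the published ID leaves the pen; every other CPU keeps polling until its own ID is written.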
2013-10-24 23:30:16 +04:00
	/*
	 * Secondary entry point that jumps straight into the kernel. Only to
	 * be used where CPUs are brought online dynamically by the kernel.
	 */
2020-02-18 22:58:33 +03:00
SYM_FUNC_START(secondary_entry)
2023-01-11 13:22:35 +03:00
	mov	x0, xzr
2020-11-13 15:49:23 +03:00
	bl	init_kernel_el		// w0=cpu_boot_mode
2013-10-24 23:30:16 +04:00
	b	secondary_startup
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(secondary_entry)
2012-03-05 15:49:27 +04:00
2020-02-18 22:58:33 +03:00
SYM_FUNC_START_LOCAL(secondary_startup)
2012-03-05 15:49:27 +04:00
	/*
	 * Common entry point for secondary CPUs.
	 */
2022-06-24 18:06:48 +03:00
	mov	x20, x0			// preserve boot mode
2018-12-07 01:50:40 +03:00
	bl	__cpu_secondary_check52bitva
2022-07-01 14:10:45 +03:00
#if VA_BITS > 48
	ldr_l	x0, vabits_actual
#endif
2015-03-18 17:55:20 +03:00
	bl	__cpu_setup		// initialise processor
2018-09-24 16:51:13 +03:00
	adrp	x1, swapper_pg_dir
2022-06-24 18:06:39 +03:00
	adrp	x2, idmap_pg_dir
2016-08-31 14:05:14 +03:00
	bl	__enable_mmu
	ldr	x8, =__secondary_switched
	br	x8
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(secondary_startup)
2012-03-05 15:49:27 +04:00
2023-01-11 13:22:32 +03:00
	.text
2020-02-18 22:58:33 +03:00
SYM_FUNC_START_LOCAL(__secondary_switched)
2022-06-24 18:06:48 +03:00
	mov	x0, x20
	bl	set_cpu_boot_mode_flag
2023-01-11 13:22:31 +03:00
	mov	x0, x20
	bl	finalise_el2
2022-06-24 18:06:48 +03:00
	str_l	xzr, __early_cpu_boot_status, x3
2015-12-26 14:46:40 +03:00
	adr_l	x5, vectors
	msr	vbar_el1, x5
	isb
2016-02-23 13:31:42 +03:00
	adr_l	x0, secondary_data
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-03 23:23:13 +03:00
	ldr	x2, [x0, #CPU_BOOT_TASK]
2019-08-27 16:36:38 +03:00
	cbz	x2, __secondary_too_slow
2021-05-20 14:50:29 +03:00
2021-05-20 14:50:31 +03:00
	init_cpu_task x2, x1, x3
arm64: simplify ptrauth initialization
Currently __cpu_setup conditionally initializes the address
authentication keys and enables them in SCTLR_EL1, doing so differently
for the primary CPU and secondary CPUs, and skipping this work for CPUs
returning from an idle state. For the latter case, cpu_do_resume
restores the keys and SCTLR_EL1 value after the MMU has been enabled.
This flow is rather difficult to follow, so instead let's move the
primary and secondary CPU initialization into their respective boot
paths. By following the example of cpu_do_resume and doing so once the
MMU is enabled, we can always initialize the keys from the values in
thread_struct, and avoid the machinery necessary to pass the keys in
secondary_data or open-coding initialization for the boot CPU.
This means we perform an additional RMW of SCTLR_EL1, but we already do
this in the cpu_do_resume path, and for other features in cpufeature.c,
so this isn't a major concern in a bringup path. Note that even while
the enable bits are clear, the key registers are accessible.
As this now renders the argument to __cpu_setup redundant, let's also
remove that entirely. Future extensions can follow a similar approach to
initialize values that differ for primary/secondary CPUs.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200423101606.37601-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-04-23 13:16:06 +03:00
#ifdef CONFIG_ARM64_PTR_AUTH
	ptrauth_keys_init_cpu x2, x3, x4, x5
#endif
arm64: Implement stack trace termination record
Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.
We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already set up upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().
Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle tasks.
Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.
External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record probably because it uses DWARF and not
frame pointers for stack traces.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
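As a rough illustration of termination by location, the following C sketch walks a chain of frame records and reports success only if the walk ends at a record whose address is known in advance. The struct layout, the function name unwind and the max_frames bound are assumptions made for this example; this is not the kernel's unwinder.

```c
#include <stddef.h>
#include <stdint.h>

struct frame_record {
	const struct frame_record *fp;	/* caller's frame record */
	uintptr_t pc;			/* return address */
};

/*
 * Walk the chain starting at fp. Returns the number of records visited if
 * the walk reaches `final` (the record at the known location), or -1 if the
 * chain ends anywhere else, in which case the trace cannot be trusted.
 */
static int unwind(const struct frame_record *fp,
		  const struct frame_record *final, int max_frames)
{
	for (int n = 1; n <= max_frames; n++) {
		if (fp == final)
			return n;	/* identified by address, before reading it */
		if (fp == NULL)
			return -1;
		fp = fp->fp;
	}
	return -1;
}
```

Because the final record is recognised by its address, its contents (FP and PC both zero in pt_regs->stackframe) never need to be interpreted.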
2021-05-10 14:00:26 +03:00
	bl	secondary_start_kernel
	ASM_BUG()
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__secondary_switched)
2012-03-05 15:49:27 +04:00
2020-02-18 22:58:33 +03:00
SYM_FUNC_START_LOCAL(__secondary_too_slow)
2019-08-27 16:36:38 +03:00
	wfe
	wfi
	b	__secondary_too_slow
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__secondary_too_slow)
2019-08-27 16:36:38 +03:00
2023-01-11 13:22:32 +03:00
/*
 * Sets the __boot_cpu_mode flag depending on the CPU boot mode passed
 * in w0. See arch/arm64/include/asm/virt.h for more info.
 */
SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)
	adr_l	x1, __boot_cpu_mode
	cmp	w0, #BOOT_CPU_MODE_EL2
	b.ne	1f
	add	x1, x1, #4
1:	str	w0, [x1]		// Save CPU boot mode
	ret
SYM_FUNC_END(set_cpu_boot_mode_flag)
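For readers less used to the assembly, set_cpu_boot_mode_flag can be read as the C sketch below. The array boot_cpu_mode stands in for __boot_cpu_mode, and the numeric values of the two mode constants are quoted from memory of asm/virt.h, so treat them as assumptions. The sketch starts from a zeroed array, which need not match the kernel's initial values.

```c
#include <stdint.h>

/* Assumed values; see arch/arm64/include/asm/virt.h for the real ones. */
#define BOOT_CPU_MODE_EL1 0xe11u
#define BOOT_CPU_MODE_EL2 0xe12u

/* Stand-in for __boot_cpu_mode: slot 0 for EL1 boots, slot 1 for EL2 boots. */
static uint32_t boot_cpu_mode[2];

static void set_cpu_boot_mode_flag(uint64_t x0)
{
	uint32_t w0 = (uint32_t)x0;	/* the asm compares and stores w0 only */
	uint32_t *slot = &boot_cpu_mode[0];

	if (w0 == BOOT_CPU_MODE_EL2)
		slot += 1;		/* add x1, x1, #4 */
	*slot = w0;			/* str w0, [x1] */
}
```

Because only the low 32 bits are compared and stored, flag bits that the caller keeps above bit 31 of x0 do not affect which slot is written.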
2016-02-23 13:31:42 +03:00
/*
 * The booting CPU updates the failed status @__early_cpu_boot_status,
 * with MMU turned off.
 *
 * update_early_cpu_boot_status status, tmp1, tmp2
 *  - Corrupts tmp1, tmp2
 *  - Writes 'status' to __early_cpu_boot_status and makes sure
 *    it is committed to memory.
 */
	.macro	update_early_cpu_boot_status status, tmp1, tmp2
	mov	\tmp2, #\status
arm64: fix invalidation of wrong __early_cpu_boot_status cacheline
In head.S, the str_l macro, which takes a source register, a symbol name
and a temp register, is used to store a status value to the variable
__early_cpu_boot_status. Subsequently, the value of the temp register is
reused to invalidate any cachelines covering this variable.
However, since str_l resolves to
adrp \tmp, \sym
str \src, [\tmp, :lo12:\sym]
the temp register never actually holds the address of the variable but
only of the 4 KB window that covers it, and reusing it leads to the
wrong cacheline being invalidated. So instead, take the address
explicitly before doing the store, and reuse that value to perform
the cache invalidation.
Fixes: bb9052744f4b ("arm64: Handle early CPU boot failures")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
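The address arithmetic behind this fix can be sketched in C. The helper names adrp_base, lo12 and cacheline_of are hypothetical, and the cache line size is a parameter because it depends on the CPU. The sketch only shows why the register left behind by str_l is the wrong operand for a cache maintenance instruction.

```c
#include <stdint.h>

/* What ADRP leaves in the register: the 4 KB-aligned base of the symbol. */
static uint64_t adrp_base(uint64_t sym)
{
	return sym & ~UINT64_C(0xfff);
}

/* The :lo12: part, supplied separately as the immediate of the STR. */
static uint64_t lo12(uint64_t sym)
{
	return sym & UINT64_C(0xfff);
}

/* Base of the cache line containing addr; line must be a power of two. */
static uint64_t cacheline_of(uint64_t addr, uint64_t line)
{
	return addr & ~(line - 1);
}
```

Unless the variable happens to sit in the first cache line of its 4 KB window, cacheline_of(adrp_base(sym), line) and cacheline_of(sym, line) differ, which is the bug described above.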
2016-04-15 13:11:21 +03:00
	adr_l	\tmp1, __early_cpu_boot_status
	str	\tmp2, [\tmp1]
2016-02-23 13:31:42 +03:00
	dmb	sy
	dc	ivac, \tmp1		// Invalidate potentially stale cache line
	.endm
2012-03-05 15:49:27 +04:00
/*
2015-03-17 10:59:53 +03:00
 * Enable the MMU.
2012-03-05 15:49:27 +04:00
 *
2015-03-17 10:59:53 +03:00
 *  x0  = SCTLR_EL1 value for turning on the MMU.
2018-09-24 16:51:13 +03:00
 *  x1  = TTBR1_EL1 value
2022-06-24 18:06:39 +03:00
 *  x2  = ID map root table address
2015-03-17 10:59:53 +03:00
 *
2016-08-31 14:05:14 +03:00
 * Returns to the caller via x30/lr. This requires the caller to be covered
 * by the .idmap.text section.
2015-10-19 16:19:35 +03:00
 *
 * Checks if the selected granule size is supported by the CPU.
 * If it isn't, park the CPU
2012-03-05 15:49:27 +04:00
 */
arm64: fix .idmap.text assertion for large kernels
When building a kernel with many debug options enabled (which happens in
test configurations used by myself and syzbot), the kernel can become
large enough that portions of .text can be more than 128M away from
.idmap.text (which is placed inside the .rodata section). Where idmap
code branches into .text, the linker will place veneers in the
.idmap.text section to make those branches possible.
Unfortunately, as Ard reports, GNU LD has been observed to add 4K of
padding when adding such veneers, e.g.
| .idmap.text 0xffffffc01e48e5c0 0x32c arch/arm64/mm/proc.o
| 0xffffffc01e48e5c0 idmap_cpu_replace_ttbr1
| 0xffffffc01e48e600 idmap_kpti_install_ng_mappings
| 0xffffffc01e48e800 __cpu_setup
| *fill* 0xffffffc01e48e8ec 0x4
| .idmap.text.stub
| 0xffffffc01e48e8f0 0x18 linker stubs
| 0xffffffc01e48f8f0 __idmap_text_end = .
| 0xffffffc01e48f000 . = ALIGN (0x1000)
| *fill* 0xffffffc01e48f8f0 0x710
| 0xffffffc01e490000 idmap_pg_dir = .
This makes the __idmap_text_start .. __idmap_text_end region bigger than
the 4K we require it to fit within, and triggers an assertion in arm64's
vmlinux.lds.S, which breaks the build:
| LD .tmp_vmlinux.kallsyms1
| aarch64-linux-gnu-ld: ID map text too big or misaligned
| make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
| make: *** [Makefile:1264: vmlinux] Error 2
Avoid this by using an `ADRP+ADD+BLR` sequence for branches out of
.idmap.text, which avoids the need for veneers. These branches are only
executed once per boot, and only when the MMU is on, so there should be
no noticeable performance penalty in replacing `BL` with `ADRP+ADD+BLR`.
At the same time, remove the "x" and "w" attributes when placing code in
.idmap.text, as these are not necessary, and this will prevent the
linker from assuming that it is safe to place PLTs into .idmap.text,
causing it to warn if and when there are out-of-range branches within
.idmap.text, e.g.
| LD .tmp_vmlinux.kallsyms1
| arch/arm64/kernel/head.o: in function `primary_entry':
| (.idmap.text+0x1c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| arch/arm64/kernel/head.o: in function `init_el2':
| (.idmap.text+0x88): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
| make: *** [Makefile:1252: vmlinux] Error 2
Thus, if future changes add out-of-range branches in .idmap.text, it
should be easy enough to identify those from the resulting linker
errors.
Reported-by: syzbot+f8ac312e31226e23302b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-arm-kernel/00000000000028ea4105f4e2ef54@google.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230220162317.1581208-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-20 19:23:17 +03:00
	.section ".idmap.text","a"
2020-02-18 22:58:33 +03:00
SYM_FUNC_START(__enable_mmu)
2022-06-24 18:06:39 +03:00
	mrs	x3, ID_AA64MMFR0_EL1
2022-09-06 01:54:01 +03:00
	ubfx	x3, x3, #ID_AA64MMFR0_EL1_TGRAN_SHIFT, 4
	cmp	x3, #ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN
2021-03-10 08:53:10 +03:00
	b.lt	__no_granule_support
2022-09-06 01:54:01 +03:00
	cmp	x3, #ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX
2021-03-10 08:53:10 +03:00
	b.gt	__no_granule_support
2018-09-24 16:51:13 +03:00
	phys_to_ttbr x2, x2
	msr	ttbr0_el1, x2		// load TTBR0
2022-06-24 18:06:46 +03:00
	load_ttbr1 x1, x1, x3
2021-02-08 12:57:12 +03:00
	set_sctlr_el1	x0
2016-08-31 14:05:14 +03:00
	ret
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__enable_mmu)
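The granule check at the top of __enable_mmu amounts to a bitfield extract followed by a range test. A minimal C sketch follows, with the shift and the supported range passed in as parameters, since the real ID_AA64MMFR0_EL1_TGRAN_* values depend on the configured page size. The function names are hypothetical.

```c
#include <stdint.h>

/* Equivalent of UBFX: extract `width` bits (width < 64) starting at bit `lsb`. */
static uint64_t ubfx(uint64_t reg, unsigned lsb, unsigned width)
{
	return (reg >> lsb) & ((UINT64_C(1) << width) - 1);
}

/* Returns 1 if the 4-bit TGRAN field lies within [min, max], else 0. */
static int granule_supported(uint64_t mmfr0, unsigned shift,
			     uint64_t min, uint64_t max)
{
	uint64_t tgran = ubfx(mmfr0, shift, 4);

	return tgran >= min && tgran <= max;
}
```

For example, with an assumed shift of 28 and an assumed supported range of 0 to 7, a field value of 0xf is rejected and the CPU would be parked in __no_granule_support.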
2015-10-19 16:19:35 +03:00
2020-02-18 22:58:33 +03:00
SYM_FUNC_START(__cpu_secondary_check52bitva)
2022-06-24 18:06:32 +03:00
#if VA_BITS > 48
2019-08-07 18:55:23 +03:00
	ldr_l	x0, vabits_actual
2018-12-07 01:50:40 +03:00
	cmp	x0, #52
	b.ne	2f
	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
2022-09-06 01:54:08 +03:00
	and	x0, x0, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
2018-12-07 01:50:40 +03:00
	cbnz	x0, 2f
2018-12-10 17:21:13 +03:00
	update_early_cpu_boot_status \
		CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_52_BIT_VA, x0, x1
2018-12-07 01:50:40 +03:00
1:	wfe
	wfi
	b	1b
#endif
2:	ret
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__cpu_secondary_check52bitva)
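The decision made by __cpu_secondary_check52bitva can be summarised in a few lines of C. The bit position used for the VARange field is an assumption for this sketch, and va_bits_ok is a hypothetical name.

```c
#include <stdint.h>

#define VARANGE_SHIFT 16	/* assumption: position of ID_AA64MMFR2_EL1.VARange */

/* Returns 1 if the CPU may continue booting, 0 if it has to be parked. */
static int va_bits_ok(unsigned vabits_actual, uint64_t mmfr2)
{
	if (vabits_actual != 52)
		return 1;	/* b.ne 2f: nothing to check */
	/* cbnz x0, 2f: a non-zero field means large VAs are implemented */
	return (mmfr2 & (UINT64_C(0xf) << VARANGE_SHIFT)) != 0;
}
```

A secondary CPU fails the check only when the boot CPU has already committed to 52-bit virtual addresses and this CPU reports a VARange field of zero.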
2018-12-07 01:50:40 +03:00
2020-02-18 22:58:33 +03:00
SYM_FUNC_START_LOCAL(__no_granule_support)
2016-02-23 13:31:42 +03:00
	/* Indicate that this CPU can't boot and is stuck in the kernel */
2018-12-10 17:21:13 +03:00
	update_early_cpu_boot_status \
		CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_NO_GRAN, x1, x2
2016-02-23 13:31:42 +03:00
1:
2015-10-19 16:19:35 +03:00
	wfe
2016-02-23 13:31:42 +03:00
	wfi
2016-08-31 14:05:13 +03:00
	b	1b
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__no_granule_support)
2016-04-18 18:09:42 +03:00
2016-04-18 18:09:43 +03:00
#ifdef CONFIG_RELOCATABLE
2020-02-18 22:58:33 +03:00
SYM_FUNC_START_LOCAL(__relocate_kernel)
2016-04-18 18:09:43 +03:00
	/*
	 * Iterate over each entry in the relocation table, and apply the
	 * relocations in place.
	 */
2022-06-24 18:06:43 +03:00
	adr_l	x9, __rela_start
	adr_l	x10, __rela_end
2016-04-18 18:09:45 +03:00
	mov_q	x11, KIMAGE_VADDR	// default virtual offset
2016-04-18 18:09:43 +03:00
	add	x11, x11, x23		// actual virtual offset
0:	cmp	x9, x10
arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinux
The linker routines that we rely on to produce a relocatable PIE binary
treat it as a shared ELF object in some ways, i.e., it emits symbol based
R_AARCH64_ABS64 relocations into the final binary since doing so would be
appropriate when linking a shared library that is subject to symbol
preemption. (This means that an executable can override certain symbols
that are exported by a shared library it is linked with, and that the
shared library *must* update all its internal references as well, and point
them to the version provided by the executable.)
Symbol preemption does not occur for OS hosted PIE executables, let alone
for vmlinux, and so we would prefer to get rid of these symbol based
relocations. This would allow us to simplify the relocation routines, and
to strip the .dynsym, .dynstr and .hash sections from the binary. (Note
that these are tiny, and are placed in the .init segment, but they clutter
up the vmlinux binary.)
Note that these R_AARCH64_ABS64 relocations are only emitted for absolute
references to symbols defined in the linker script, all other relocatable
quantities are covered by anonymous R_AARCH64_RELATIVE relocations that
simply list the offsets to all 64-bit values in the binary that need to be
fixed up based on the offset between the link time and run time addresses.
Fortunately, GNU ld has a -Bsymbolic option, which is intended for shared
libraries to allow them to ignore symbol preemption, and unconditionally
bind all internal symbol references to its own definitions. So set it for
our PIE binary as well, and get rid of the associated sections and the
relocation code that processes them.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: fixed conflict with __dynsym_offset linker script entry]
Signed-off-by: Will Deacon <will.deacon@arm.com>
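With only anonymous R_AARCH64_RELATIVE entries left, the fixup loop reduces to "add the displacement to the addend and store the result at the target". The C sketch below models this over a plain byte buffer. The name apply_rela and the convention that r_offset indexes directly into the buffer are simplifications for illustration; in the kernel the offsets are link-time addresses and the store goes to offset plus displacement.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_AARCH64_RELATIVE 1027	/* ELF relocation type number */

struct rela {			/* same layout as Elf64_Rela: 24 bytes */
	uint64_t r_offset;
	uint64_t r_info;
	int64_t  r_addend;
};

/*
 * image: bytes of the loaded image, indexed here by r_offset.
 * delta: run-time address minus link-time address.
 */
static void apply_rela(uint8_t *image, const struct rela *tab, size_t n,
		       uint64_t delta)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t val;

		/* Only the low 32 bits of r_info hold the type. */
		if ((uint32_t)tab[i].r_info != R_AARCH64_RELATIVE)
			continue;
		val = (uint64_t)tab[i].r_addend + delta;
		memcpy(image + tab[i].r_offset, &val, sizeof(val));
	}
}
```

Entries of any other type are skipped, which mirrors the branch back to the top of the loop in the assembly.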
2016-07-24 15:00:13 +03:00
	b.hs	1f
2019-08-01 04:18:42 +03:00
	ldp	x12, x13, [x9], #24
	ldr	x14, [x9, #-8]
	cmp	w13, #R_AARCH64_RELATIVE
2016-07-24 15:00:13 +03:00
	b.ne	0b
2019-08-01 04:18:42 +03:00
	add	x14, x14, x23		// relocate
	str	x14, [x12, x23]
2016-04-18 18:09:43 +03:00
	b	0b
2019-08-01 04:18:42 +03:00
1:
#ifdef CONFIG_RELR
	/*
	 * Apply RELR relocations.
	 *
	 * RELR is a compressed format for storing relative relocations. The
	 * encoded sequence of entries looks like:
	 * [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
	 *
	 * i.e. start with an address, followed by any number of bitmaps. The
	 * address entry encodes 1 relocation. The subsequent bitmap entries
	 * encode up to 63 relocations each, at subsequent offsets following
	 * the last address entry.
	 *
	 * The bitmap entries must have 1 in the least significant bit. The
	 * assumption here is that an address cannot have 1 in lsb. Odd
	 * addresses are not supported. Any odd addresses are stored in the RELA
	 * section, which is handled above.
	 *
	 * Excluding the least significant bit in the bitmap, each non-zero
	 * bit in the bitmap represents a relocation to be applied to
	 * a corresponding machine word that follows the base address
	 * word. The second least significant bit represents the machine
	 * word immediately following the initial address, and each bit
	 * that follows represents the next word, in linear order. As such,
	 * a single bitmap can encode up to 63 relocations in a 64-bit object.
	 *
	 * In this implementation we store the address of the next RELR table
	 * entry in x9, the address being relocated by the current address or
	 * bitmap entry in x13 and the address being relocated by the current
	 * bit in x14.
	 */
2022-06-24 18:06:43 +03:00
	adr_l	x9, __relr_start
	adr_l	x10, __relr_end
2019-08-01 04:18:42 +03:00
2:	cmp	x9, x10
	b.hs	7f
	ldr	x11, [x9], #8
	tbnz	x11, #0, 3f		// branch to handle bitmaps
	add	x13, x11, x23
	ldr	x12, [x13]		// relocate address entry
arm64: head: avoid relocating the kernel twice for KASLR
Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, and the kernel virtual mapping is torn down and recreated again,
after which the relocations are applied a second time. The latter step
means that statically initialized global pointer variables will be reset
to their initial values, and to ensure that BSS variables are not set to
values based on the initial translation, they are cleared again as well.
All of this is needed because we need the command line (taken from the
DT) to tell us whether or not to randomize the virtual address space
before entering the kernel proper. However, this code has expanded
little by little and now creates global state unrelated to the virtual
randomization of the kernel before the mapping is torn down and set up
again, and the BSS cleared for a second time. This has created some
issues in the past, and it would be better to avoid this little dance if
possible.
So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then, create the virtual kernel
mapping, clear BSS, etc and proceed as normal. This avoids the issues
around inconsistent global state due to BSS being cleared twice, and is
generally more maintainable, as it permits us to defer all the remaining
DT parsing and KASLR initialization to a later time.
This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.
Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-21-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 18:06:50 +03:00
	add	x12, x12, x23
2019-08-01 04:18:42 +03:00
	str	x12, [x13], #8		// adjust to start of bitmap
	b	2b
3:	mov	x14, x13
4:	lsr	x11, x11, #1
	cbz	x11, 6f
	tbz	x11, #0, 5f		// skip bit if not set
	ldr	x12, [x14]		// relocate bit
2022-06-24 18:06:50 +03:00
	add	x12, x12, x23
2019-08-01 04:18:42 +03:00
	str	x12, [x14]
5:	add	x14, x14, #8			// move to next bit's address
	b	4b
6:	/*
	 * Move to the next bitmap's address. 8 is the word size, and 63 is the
	 * number of significant bits in a bitmap entry.
	 */
	add	x13, x13, #(8 * 63)
	b	2b
7:
#endif
	ret
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__relocate_kernel)
2016-08-31 14:05:13 +03:00
#endif
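The RELR loop above can be easier to follow in C. The following is a minimal sketch, not the kernel's implementation: `apply_relr` is a hypothetical helper, and it assumes a simplified model in which the image is a flat array of 64-bit words and each address entry is a byte offset into that array (the assembly instead adds the offset in `x23` to a link-time address). An entry with bit 0 clear names one word to relocate; an entry with bit 0 set is a bitmap whose bits 1..63 cover the next 63 words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply RELR-style relocations to a flat image.
 * Assumption: address entries are byte offsets into `image`, not
 * link-time virtual addresses as in the assembly above.
 */
static void apply_relr(uint64_t *image, const uint64_t *relr,
		       size_t count, uint64_t offset)
{
	uint64_t *where = image;	/* first word covered by the next bitmap */

	for (size_t i = 0; i < count; i++) {
		uint64_t entry = relr[i];

		if ((entry & 1) == 0) {
			/* Address entry: relocate one word, then step past it. */
			where = image + entry / 8;
			*where++ += offset;
			continue;
		}

		/* Bitmap entry: bit N (1..63) selects word where[N - 1]. */
		uint64_t *p = where;

		for (entry >>= 1; entry != 0; entry >>= 1, p++) {
			if (entry & 1)
				*p += offset;
		}

		/* 63 significant bits per bitmap entry, one word each. */
		where += 63;
	}
}
```

Each set bit adds the offset to a word in place, so running the pass twice would apply the offset twice. This is why the loop is not idempotent, as the commit message notes.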
2016-04-18 18:09:43 +03:00
2020-02-18 22:58:33 +03:00
SYM_FUNC_START_LOCAL(__primary_switch)
2022-06-24 18:06:47 +03:00
	adrp	x1, reserved_pg_dir
2022-06-24 18:06:42 +03:00
	adrp	x2, init_idmap_pg_dir
2016-08-31 14:05:14 +03:00
	bl	__enable_mmu
2022-06-24 18:06:50 +03:00
#ifdef CONFIG_RELOCATABLE
2022-06-29 07:12:07 +03:00
	adrp	x23, KERNEL_START
2022-06-24 18:06:50 +03:00
	and	x23, x23, MIN_KIMG_ALIGN - 1
#ifdef CONFIG_RANDOMIZE_BASE
	mov	x0, x22
	adrp	x1, init_pg_end
	mov	sp, x1
	mov	x29, xzr
	bl	__pi_kaslr_early_init
	and	x24, x0, #SZ_2M - 1		// capture memstart offset seed
	bic	x0, x0, #SZ_2M - 1
	orr	x23, x23, x0			// record kernel offset
#endif
#endif
2022-06-24 18:06:47 +03:00
	bl	clear_page_tables
	bl	create_kernel_mapping
	adrp	x1, init_pg_dir
	load_ttbr1 x1, x1, x2
2016-08-31 14:05:13 +03:00
#ifdef CONFIG_RELOCATABLE
	bl	__relocate_kernel
2016-04-18 18:09:43 +03:00
#endif
	ldr	x8, =__primary_switched
2022-06-29 07:12:07 +03:00
	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
2016-04-18 18:09:43 +03:00
	br	x8
2020-02-18 22:58:33 +03:00
SYM_FUNC_END(__primary_switch)
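The way `__primary_switch` derives `x23` and `x24` from the value returned by `__pi_kaslr_early_init` can be restated in C. This is an illustrative sketch only: `struct kaslr_layout` and `split_seed` are hypothetical names, and the value of `MIN_KIMG_ALIGN` is assumed to equal `SZ_2M` here purely for the example.

```c
#include <stdint.h>

#define SZ_2M		0x200000ULL
#define MIN_KIMG_ALIGN	SZ_2M	/* assumption for this sketch */

/* Hypothetical container for the two values kept in x23 and x24. */
struct kaslr_layout {
	uint64_t kernel_offset;	/* x23: displacement applied to the kernel */
	uint64_t memstart_seed;	/* x24: low seed bits, kept for later use */
};

/*
 * Hypothetical helper mirroring the and/bic/orr sequence above.
 * kernel_start_pa: physical address of KERNEL_START (from adrp).
 * seed: value returned in x0 by the early KASLR init call.
 */
static struct kaslr_layout split_seed(uint64_t kernel_start_pa, uint64_t seed)
{
	struct kaslr_layout l;

	/* and x23, x23, MIN_KIMG_ALIGN - 1: physical misalignment */
	l.kernel_offset = kernel_start_pa & (MIN_KIMG_ALIGN - 1);

	/* and x24, x0, #SZ_2M - 1: capture memstart offset seed */
	l.memstart_seed = seed & (SZ_2M - 1);

	/* bic x0, x0, #SZ_2M - 1 ; orr x23, x23, x0: record kernel offset */
	l.kernel_offset |= seed & ~(SZ_2M - 1);

	return l;
}
```

For example, `split_seed(0x40210000, 0x12345678)` yields a kernel offset of `0x12210000` and a memstart seed of `0x145678`: the seed bits at 2 MiB granularity and above displace the kernel, while the bits below that are set aside.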