License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
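The first heuristic in the list above (no license found by either scanner) can be sketched in C. This is only an illustration: the helper name `default_spdx_id` and the plain substring test for `/uapi/` are assumptions made for the example, not the logic of the scanners or of the tooling that was actually used.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: pick the identifier for a file in which neither
 * scanner found any license text. Paths containing "/uapi/" get the
 * syscall note; everything else falls back to the COPYING default.
 */
static const char *default_spdx_id(const char *path)
{
	assert(path != NULL);

	if (strstr(path, "/uapi/"))
		return "GPL-2.0 WITH Linux-syscall-note";
	return "GPL-2.0";
}
```

Files that already carry license text are not covered by this sketch; those went through the scanner comparison and manual review described in the remaining list items.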
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
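As a rough illustration of the "different comment types" point, the sketch below formats the SPDX line for a header versus a .c source file. It is not the script Thomas and Greg used; the names `ends_with` and `spdx_line` are hypothetical, and only the two file types named in the text are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `path` ends with `suffix`, else 0. */
static int ends_with(const char *path, const char *suffix)
{
	size_t plen = strlen(path), slen = strlen(suffix);

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/*
 * Hypothetical helper: write the SPDX line for `path` into `buf`.
 * Headers get a block comment, .c files a line comment.
 * Returns 0 on success, -1 for a file type this sketch does not handle.
 */
static int spdx_line(const char *path, const char *license,
		     char *buf, size_t len)
{
	assert(path && license && buf && len > 0);

	if (ends_with(path, ".h"))
		snprintf(buf, len, "/* SPDX-License-Identifier: %s */", license);
	else if (ends_with(path, ".c"))
		snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
	else
		return -1;
	return 0;
}
```

The linker script annotated below uses the block-comment form on its first line.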
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
2012-04-20 14:45:54 +01:00
/*
 * ld script to make ARM Linux kernel
 * taken from the i386 version by Russell King
 * Written by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
 */
2020-09-22 21:49:02 +01:00
#include <asm/hyp_image.h>
2020-09-22 21:49:03 +01:00
#ifdef CONFIG_KVM
2020-08-21 15:07:05 +01:00
#define HYPERVISOR_EXTABLE \
	. = ALIGN(SZ_8); \
	__start___kvm_ex_table = .; \
	*(__kvm_ex_table) \
	__stop___kvm_ex_table = .;
2020-09-22 21:49:09 +01:00
2020-12-02 18:41:08 +00:00
#define HYPERVISOR_DATA_SECTIONS \
2021-01-05 18:05:35 +00:00
	HYP_SECTION_NAME(.rodata) : { \
2021-03-19 10:01:44 +00:00
		. = ALIGN(PAGE_SIZE); \
2021-01-05 18:05:35 +00:00
		__hyp_rodata_start = .; \
2020-12-02 18:41:08 +00:00
		*(HYP_SECTION_NAME(.data..ro_after_init)) \
2021-01-05 18:05:35 +00:00
		*(HYP_SECTION_NAME(.rodata)) \
2021-03-19 10:01:44 +00:00
		. = ALIGN(PAGE_SIZE); \
2021-01-05 18:05:35 +00:00
		__hyp_rodata_end = .; \
2020-12-02 18:41:08 +00:00
	}
2020-09-22 21:49:09 +01:00
#define HYPERVISOR_PERCPU_SECTION \
	. = ALIGN(PAGE_SIZE); \
	HYP_SECTION_NAME(.data..percpu) : { \
		*(HYP_SECTION_NAME(.data..percpu)) \
	}
2021-01-05 18:05:37 +00:00
#define HYPERVISOR_RELOC_SECTION \
	.hyp.reloc : ALIGN(4) { \
		__hyp_reloc_begin = .; \
		*(.hyp.reloc) \
		__hyp_reloc_end = .; \
	}
2021-03-19 10:01:15 +00:00
#define BSS_FIRST_SECTIONS \
	__hyp_bss_start = .; \
	*(HYP_SECTION_NAME(.bss)) \
	. = ALIGN(PAGE_SIZE); \
	__hyp_bss_end = .;
/*
 * We require that __hyp_bss_start and __bss_start are aligned, and enforce it
 * with an assertion. But the BSS_SECTION macro places an empty .sbss section
 * between them, which can in some cases cause the linker to misalign them. To
 * work around the issue, force a page alignment for __bss_start.
 */
#define SBSS_ALIGN PAGE_SIZE
2020-09-22 21:49:03 +01:00
#else /* CONFIG_KVM */
#define HYPERVISOR_EXTABLE
2020-12-02 18:41:08 +00:00
#define HYPERVISOR_DATA_SECTIONS
2020-09-22 21:49:09 +01:00
#define HYPERVISOR_PERCPU_SECTION
2021-01-05 18:05:37 +00:00
#define HYPERVISOR_RELOC_SECTION
2021-03-19 10:01:15 +00:00
#define SBSS_ALIGN 0
2020-09-22 21:49:03 +01:00
#endif
2020-08-21 15:07:05 +01:00
arm64: extable: add `type` and `data` fields
Subsequent patches will add specialized handlers for fixups, in addition
to the simple PC fixup and BPF handlers we have today. In preparation,
this patch adds a new `type` field to struct exception_table_entry, and
uses this to distinguish the fixup and BPF cases. A `data` field is also
added so that subsequent patches can associate data specific to each
exception site (e.g. register numbers).
Handlers are named ex_handler_*() for consistency, following the example
of x86. At the same time, get_ex_fixup() is split out into a helper so
that it can be used by other ex_handler_*() functions in subsequent
patches.
This patch will increase the size of the exception tables, which will be
remedied by subsequent patches removing redundant fixup code. There
should be no functional change as a result of this patch.
Since each entry is now 12 bytes in size, we must reduce the alignment
of each entry from `.align 3` (i.e. 8 bytes) to `.align 2` (i.e. 4
bytes), which is the natural alignment of the `insn` and `fixup` fields.
The current 8-byte alignment is a holdover from when the `insn` and
`fixup` fields were 8 bytes, and while not harmful has not been necessary
since commit:
6c94f27ac847ff8e ("arm64: switch to relative exception tables")
Similarly, RO_EXCEPTION_TABLE_ALIGN is dropped to 4 bytes.
Concurrently with this patch, x86's exception table entry format is
being updated (similarly to a 12-byte format, with 32 bits of absolute
data). Once both have been merged it should be possible to unify the
sorttable logic for the two.
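A minimal C sketch of the 12-byte layout described above, with the size and alignment claims checked at compile time. The field names come from the commit text; the 16-bit widths chosen for `type` and `data` are an assumption made so that the total comes to 12 bytes, and the comments are illustrative.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Sketch of an exception table entry: two 32-bit offsets plus type and data. */
struct exception_table_entry {
	int32_t insn;	/* relative offset to the faulting instruction */
	int32_t fixup;	/* relative offset to the fixup code */
	int16_t type;	/* selects which ex_handler_*() runs (assumed 16-bit) */
	int16_t data;	/* per-site data, e.g. register numbers (assumed 16-bit) */
};

/* 4 + 4 + 2 + 2 bytes, with no padding needed at 4-byte alignment. */
static_assert(sizeof(struct exception_table_entry) == 12,
	      "entry should be 12 bytes");
static_assert(alignof(struct exception_table_entry) == 4,
	      "entry should have 4-byte natural alignment");
```

Since 12 is not a multiple of 8, keeping 8-byte alignment would force padding between consecutive entries, which is why the alignment has to drop to 4 bytes.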
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Morse <james.morse@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211019160219.5202-11-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-19 17:02:16 +01:00
#define RO_EXCEPTION_TABLE_ALIGN 4
2021-03-19 10:01:15 +00:00
#define RUNTIME_DISCARD_EXIT
#include <asm-generic/vmlinux.lds.h>
#include <asm/cache.h>
#include <asm/kernel-pgtable.h>
2021-09-30 14:31:08 +00:00
#include <asm/kexec.h>
2021-03-19 10:01:15 +00:00
#include <asm/memory.h>
#include <asm/page.h>
#include "image.h"
OUTPUT_ARCH(aarch64)
ENTRY(_text)
jiffies = jiffies_64;
2012-12-07 18:40:43 +00:00
#define HYPERVISOR_TEXT \
2021-03-19 10:01:44 +00:00
	. = ALIGN(PAGE_SIZE); \
2018-05-09 16:46:26 +09:00
	__hyp_idmap_text_start = .; \
2012-12-07 18:40:43 +00:00
	*(.hyp.idmap.text) \
2018-05-09 16:46:26 +09:00
	__hyp_idmap_text_end = .; \
	__hyp_text_start = .; \
2012-12-07 18:40:43 +00:00
	*(.hyp.text) \
2020-08-21 15:07:05 +01:00
	HYPERVISOR_EXTABLE \
2021-03-19 10:01:44 +00:00
	. = ALIGN(PAGE_SIZE); \
2018-05-09 16:46:26 +09:00
	__hyp_text_end = .;
2012-12-07 18:40:43 +00:00
2015-06-01 13:40:33 +02:00
#define IDMAP_TEXT \
	. = ALIGN(SZ_4K); \
2018-05-09 16:46:26 +09:00
	__idmap_text_start = .; \
2015-06-01 13:40:33 +02:00
	*(.idmap.text) \
2018-05-09 16:46:26 +09:00
	__idmap_text_end = .;
2015-06-01 13:40:33 +02:00
2016-04-27 17:47:12 +01:00
#ifdef CONFIG_HIBERNATION
#define HIBERNATE_TEXT \
arm64: avoid executing padding bytes during kexec / hibernation
Currently we rely on the HIBERNATE_TEXT section starting with the entry
point to swsusp_arch_suspend_exit, and the KEXEC_TEXT section starting
with the entry point to arm64_relocate_new_kernel. In both cases we copy
the entire section into a dynamically-allocated page, and then later
branch to the start of this page.
SYM_FUNC_START() will align the function entry points to
CONFIG_FUNCTION_ALIGNMENT, and when the linker later processes the
assembled code it will place padding bytes before the function entry
point if the location counter was not already sufficiently aligned. The
linker happens to use the value zero for these padding bytes.
This padding may end up being applied whenever CONFIG_FUNCTION_ALIGNMENT
is greater than 4, which can be the case with
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y or
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y.
When such padding is applied, attempting to kexec or resume from
hibernate will result in a crash: the kernel will branch to the padding
bytes as the start of the dynamically-allocated page, and as those bytes
are zero they will decode as UDF #0, which reliably triggers an
UNDEFINED exception. For example:
| # ./kexec --reuse-cmdline -f Image
| [ 46.965800] kexec_core: Starting new kernel
| [ 47.143641] psci: CPU1 killed (polled 0 ms)
| [ 47.233653] psci: CPU2 killed (polled 0 ms)
| [ 47.323465] psci: CPU3 killed (polled 0 ms)
| [ 47.324776] Bye!
| [ 47.327072] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
| [ 47.328510] Modules linked in:
| [ 47.329086] CPU: 0 PID: 259 Comm: kexec Not tainted 6.2.0-rc5+ #3
| [ 47.330223] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| [ 47.331497] pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| [ 47.332782] pc : 0x43a95000
| [ 47.333338] lr : machine_kexec+0x190/0x1e0
| [ 47.334169] sp : ffff80000d293b70
| [ 47.334845] x29: ffff80000d293b70 x28: ffff000002cc0000 x27: 0000000000000000
| [ 47.336292] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| [ 47.337744] x23: ffff80000a837858 x22: 0000000048ec9000 x21: 0000000000000010
| [ 47.339192] x20: 00000000adc83000 x19: ffff000000827000 x18: 0000000000000006
| [ 47.340638] x17: ffff800075a61000 x16: ffff800008000000 x15: ffff80000d293658
| [ 47.342085] x14: 0000000000000000 x13: ffff80000d2937f7 x12: ffff80000a7ff6e0
| [ 47.343530] x11: 00000000ffffdfff x10: ffff80000a8ef8e0 x9 : ffff80000813ef00
| [ 47.344976] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
| [ 47.346431] x5 : 0000000000001fff x4 : 0000000000000001 x3 : ffff80000a0a3008
| [ 47.347877] x2 : ffff80000a8220f8 x1 : 0000000043a95000 x0 : ffff000000827000
| [ 47.349334] Call trace:
| [ 47.349834] 0x43a95000
| [ 47.350338] kernel_kexec+0x88/0x100
| [ 47.351070] __do_sys_reboot+0x108/0x268
| [ 47.351873] __arm64_sys_reboot+0x2c/0x40
| [ 47.352689] invoke_syscall+0x78/0x108
| [ 47.353458] el0_svc_common.constprop.0+0x4c/0x100
| [ 47.354426] do_el0_svc+0x34/0x50
| [ 47.355102] el0_svc+0x34/0x108
| [ 47.355747] el0t_64_sync_handler+0xf4/0x120
| [ 47.356617] el0t_64_sync+0x194/0x198
| [ 47.357374] Code: bad PC value
| [ 47.357999] ---[ end trace 0000000000000000 ]---
| [ 47.358937] Kernel panic - not syncing: Oops - Undefined instruction: Fatal exception
| [ 47.360515] Kernel Offset: disabled
| [ 47.361230] CPU features: 0x002000,00050108,c8004203
| [ 47.362232] Memory Limit: none
Note: Unfortunately the code dump reports "bad PC value" as it attempts
to dump some instructions prior to the UDF (i.e. before the start of the
page), and terminates early upon a fault, obscuring the problem.
This patch fixes this issue by aligning the section start markers to
CONFIG_FUNCTION_ALIGNMENT using the ALIGN_FUNCTION() helper, which
ensures that the linker never needs to place padding bytes within the
section. Assertions are added to verify each section begins with the
function we expect, making our implicit requirement explicit.
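The padding arithmetic behind this bug can be sketched in a few lines of C. The helper names `align_up` and `padding_before` are hypothetical and the addresses in the usage note are made up; this only models how many bytes end up in front of an entry point when the location counter is less aligned than the function.

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` up to the next multiple of `align` (must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model: number of padding bytes placed before a function
 * entry point that requires `func_align` when the section starts at
 * `location`.
 */
static uint64_t padding_before(uint64_t location, uint64_t func_align)
{
	return align_up(location, func_align) - location;
}
```

For instance, a section starting at 0x1004 needs no padding for 4-byte function alignment, but 0x3c bytes of padding for 64-byte alignment. Aligning the section start itself to the function alignment makes the padding zero, which is what the ALIGN_FUNCTION() change achieves.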
In future it might be nice to rework the kexec and hibernation code to
decouple the section start from the entry point, but that involves much
more significant changes that come with a higher risk of error, so I've
tried to keep this fix as simple as possible for now.
Fixes: 47a15aa54427 ("arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT")
Reported-by: CKI Project <cki-project@redhat.com>
Link: https://lore.kernel.org/linux-arm-kernel/29992.123012504212600261@us-mta-139.us.mimecast.lan/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-27 11:08:20 +00:00
	ALIGN_FUNCTION(); \
2018-05-09 16:46:26 +09:00
	__hibernate_exit_text_start = .; \
2016-04-27 17:47:12 +01:00
	*(.hibernate_exit.text) \
2018-05-09 16:46:26 +09:00
	__hibernate_exit_text_end = .;
2016-04-27 17:47:12 +01:00
#else
#define HIBERNATE_TEXT
#endif
2021-09-30 14:31:08 +00:00
#ifdef CONFIG_KEXEC_CORE
#define KEXEC_TEXT \
2023-01-27 11:08:20 +00:00
	ALIGN_FUNCTION(); \
2021-09-30 14:31:08 +00:00
	__relocate_new_kernel_start = .; \
	*(.kexec_relocate.text) \
	__relocate_new_kernel_end = .;
#else
#define KEXEC_TEXT
#endif
2017-11-14 14:07:40 +00:00
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
#define TRAMP_TEXT \
	. = ALIGN(PAGE_SIZE); \
2018-05-09 16:46:26 +09:00
	__entry_tramp_text_start = .; \
2017-11-14 14:07:40 +00:00
	*(.entry.tramp.text) \
	. = ALIGN(PAGE_SIZE); \
2022-06-22 18:10:10 +02:00
	__entry_tramp_text_end = .; \
	*(.entry.tramp.rodata)
2017-11-14 14:07:40 +00:00
#else
#define TRAMP_TEXT
#endif
2022-10-27 17:59:06 +02:00
#ifdef CONFIG_UNWIND_TABLES
#define UNWIND_DATA_SECTIONS \
	.eh_frame : { \
		__eh_frame_start = .; \
		*(.eh_frame) \
		__eh_frame_end = .; \
	}
#else
#define UNWIND_DATA_SECTIONS
#endif
2014-10-10 18:42:55 +02:00
/*
 * The size of the PE/COFF section that covers the kernel image, which
2020-03-26 18:14:23 +01:00
 * runs from _stext to _edata, must be a round multiple of the PE/COFF
 * FileAlignment, which we set to its minimum value of 0x200. '_stext'
2014-10-10 18:42:55 +02:00
 * itself is 4 KB aligned, so padding out _edata to a 0x200 aligned
 * boundary should be sufficient.
 */
PECOFF_FILE_ALIGNMENT = 0x200;
#ifdef CONFIG_EFI
#define PECOFF_EDATA_PADDING \
	.pecoff_edata_padding : { BYTE(0); . = ALIGN(PECOFF_FILE_ALIGNMENT); }
#else
#define PECOFF_EDATA_PADDING
#endif
2012-04-20 14:45:54 +01:00
SECTIONS
{
	/*
	 * XXX: The linker does not define how output sections are
	 * assigned to input sections when there are multiple statements
	 * matching the same input section name. There is no documented
	 * order of matching.
	 */
2020-08-21 12:42:52 -07:00
	DISCARDS
2012-04-20 14:45:54 +01:00
	/DISCARD/ : {
2016-01-26 09:13:44 +01:00
		*(.interp .dynamic)
arm64: relocatable: fix inconsistencies in linker script and options
readelf complains about the section layout of vmlinux when building
with CONFIG_RELOCATABLE=y (for KASLR):
readelf: Warning: [21]: Link field (0) should index a symtab section.
readelf: Warning: [21]: Info field (0) should index a relocatable section.
Also, it seems that our use of '-pie -shared' is contradictory, and
thus ambiguous. In general, the way KASLR is wired up at the moment
is highly tailored to how ld.bfd happens to implement (and conflate)
PIE executables and shared libraries, so given the current effort to
support other toolchains, let's fix some of these issues as well.
- Drop the -pie linker argument and just leave -shared. In ld.bfd,
the differences between them are unclear (except for the ELF type
of the produced image [0]) but lld chokes on seeing both at the
same time.
- Rename the .rela output section to .rela.dyn, as is customary for
shared libraries and PIE executables, so that it is not misidentified
by readelf as a static relocation section (producing the warnings
above).
- Pass the -z notext and -z norelro options to explicitly instruct the
linker to permit text relocations, and to omit the RELRO program
header (which requires a certain section layout that we don't adhere
to in the kernel). These are the defaults for current versions of
ld.bfd.
- Discard .eh_frame and .gnu.hash sections to prevent them from being
emitted between .head.text and .text, screwing up the section layout.
These changes only affect the ELF image, and produce the same binary
image.
[0] b9dce7f1ba01 ("arm64: kernel: force ET_DYN ELF type for ...")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Smith <peter.smith@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-03 20:58:05 +01:00
		*(.dynsym .dynstr .hash .gnu.hash)
2012-04-20 14:45:54 +01:00
	}
2020-08-25 15:54:40 +02:00
	. = KIMAGE_VADDR;
2012-04-20 14:45:54 +01:00
	.head.text : {
		_text = .;
		HEAD_TEXT
	}
arm64: omit [_text, _stext) from permanent kernel mapping
In a previous patch, we increased the size of the EFI PE/COFF header
to 64 KB, which resulted in the _stext symbol appearing at a fixed
offset of 64 KB into the image.
Since 64 KB is also the largest page size we support, this completely
removes the need to map the first 64 KB of the kernel image, given that
it only contains the arm64 Image header and the EFI header, neither of
which we ever access again after booting the kernel. More importantly,
we should avoid an executable mapping of non-executable and not entirely
predictable data, to deal with the unlikely event that we inadvertently
emitted something that looks like an opcode that could be used as a
gadget for speculative execution.
So let's limit the kernel mapping of .text to the [_stext, _etext)
region, which matches the view of generic code (such as kallsyms) when
it reasons about the boundaries of the kernel's .text section.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201117124729.12642-2-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-17 13:47:27 +01:00
	.text : ALIGN(SEGMENT_ALIGN) {	/* Real text segment */
2012-04-20 14:45:54 +01:00
		_stext = .;		/* Text and read-only data */
			IRQENTRY_TEXT
2016-03-25 14:22:05 -07:00
			SOFTIRQENTRY_TEXT
2016-07-08 12:35:50 -04:00
			ENTRY_TEXT
2012-04-20 14:45:54 +01:00
			TEXT_TEXT
			SCHED_TEXT
			LOCK_TEXT
arm64: Kprobes with single stepping support
Add support for basic kernel probes (kprobes) and jump probes
(jprobes) for ARM64.
Kprobes utilizes software breakpoint and single step debug
exceptions supported on ARM v8.
A software breakpoint is placed at the probe address to trap the
kernel execution into the kprobe handler.
ARM v8 supports enabling single stepping before the break exception
return (ERET), with next PC in exception return address (ELR_EL1). The
kprobe handler prepares an executable memory slot for out-of-line
execution with a copy of the original instruction being probed, and
enables single stepping. The PC is set to the out-of-line slot address
before the ERET. With this scheme, the instruction is executed with the
exact same register context except for the PC (and DAIF) registers.
Debug mask (PSTATE.D) is enabled only when single stepping a recursive
kprobe, e.g. during kprobes re-entry, so that the probed instruction can be
single stepped within the kprobe handler -exception- context.
The recursion depth of kprobe is always 2, i.e. upon probe re-entry,
any further re-entry is prevented by not calling handlers, and the case
is counted as a missed kprobe.
Single stepping from the x-o-l slot has a drawback for PC-relative accesses
like branching and symbolic literals access as the offset from the new PC
(slot address) may not be ensured to fit in the immediate value of
the opcode. Such instructions need simulation, so reject
probing them.
Instructions generating exceptions or cpu mode change are rejected
for probing.
Exclusive load/store instructions are rejected too. Additionally, the
code is checked to see if it is inside an exclusive load/store sequence
(code from Pratyush).
System instructions are mostly enabled for stepping, except MSR/MRS
accesses to "DAIF" flags in PSTATE, which are not safe for
probing.
This also changes arch/arm64/include/asm/ptrace.h to use
include/asm-generic/ptrace.h.
Thanks to Steve Capper and Pratyush Anand for several suggested
changes.
Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-08 12:35:48 -04:00
			KPROBES_TEXT
2012-12-07 18:40:43 +00:00
			HYPERVISOR_TEXT
2012-04-20 14:45:54 +01:00
			*(.gnu.warning)
	}
arm64: simplify kernel segment mapping granularity
The mapping of the kernel consists of four segments, each of which is mapped
with different permission attributes and/or lifetimes. To optimize the TLB
and translation table footprint, we define various opaque constants in the
linker script that resolve to different alignment values depending on the
page size and whether CONFIG_DEBUG_ALIGN_RODATA is set.
Considering that
- a 4 KB granule kernel benefits from a 64 KB segment alignment (due to
the fact that it allows the use of the contiguous bit),
- the minimum alignment of the .data segment is THREAD_SIZE already, not
PAGE_SIZE (i.e., we already have padding between _data and the start of
the .data payload in many cases),
- 2 MB is a suitable alignment value on all granule sizes, either for
mapping directly (level 2 on 4 KB), or via the contiguous bit (level 3 on
16 KB and 64 KB),
- anything beyond 2 MB exceeds the minimum alignment mandated by the boot
protocol, and can only be mapped efficiently if the physical alignment
happens to be the same,
we can simplify this by standardizing on 64 KB (or 2 MB) explicitly, i.e.,
regardless of granule size, all segments are aligned either to 64 KB, or to
2 MB if CONFIG_DEBUG_ALIGN_RODATA=y. This also means we can drop the Kconfig
dependency of CONFIG_DEBUG_ALIGN_RODATA on CONFIG_ARM64_4K_PAGES.
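The resulting rule is small enough to sketch in C. The helper names align_up() and segment_align() are illustrative only; in the linker script the same effect comes from ALIGN(SEGMENT_ALIGN).

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K UINT64_C(0x10000)
#define SZ_2M  UINT64_C(0x200000)

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Illustrative stand-in: 2 MB with DEBUG_ALIGN_RODATA, else 64 KB. */
static uint64_t segment_align(int debug_align_rodata)
{
	return debug_align_rodata ? SZ_2M : SZ_64K;
}
```

For example, a segment ending at 0x12345 would have its successor start at 0x20000 by default, or at 0x200000 with CONFIG_DEBUG_ALIGN_RODATA=y.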
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-30 17:43:09 +02:00
. = ALIGN(SEGMENT_ALIGN);
arm64: mm: fix location of _etext
As Kees Cook notes in the ARM counterpart of this patch [0]:
The _etext position is defined to be the end of the kernel text code,
and should not include any part of the data segments. This interferes
with things that might check memory ranges and expect executable code
up to _etext.
In particular, Kees is referring to the HARDENED_USERCOPY patch set [1],
which rejects attempts to call copy_to_user() on kernel ranges containing
executable code, but does allow access to the .rodata segment. Regardless
of whether one may or may not agree with the distinction, it makes sense
for _etext to have the same meaning across architectures.
So let's put _etext where it belongs, between .text and .rodata, and fix
up existing references to use __init_begin instead, which unlike _end_rodata
includes the exception and notes sections as well.
The _etext references in kaslr.c are left untouched, since its references
to [_stext, _etext) are meant to capture potential jump instruction targets,
and so disregarding .rodata is actually an improvement here.
[0] http://article.gmane.org/gmane.linux.kernel/2245084
[1] http://thread.gmane.org/gmane.linux.kernel.hardened.devel/2502
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-23 15:53:17 +02:00
_etext = .; /* End of text section */
2019-10-29 14:13:40 -07:00
/* everything from this point to __init_begin will be marked RO NX */
RO_DATA(PAGE_SIZE)
2012-04-20 14:45:54 +01:00
2021-08-02 13:38:29 +01:00
HYPERVISOR_DATA_SECTIONS
arm64: lds: move .got section out of .text
Currently, the .got section is placed within the output section .text.
However, when .got is non-empty, the SHF_WRITE flag is set for .text
when linked by lld. GNU ld recognizes .text as a special section and
ignores the SHF_WRITE flag. By renaming .text, we can also get the
SHF_WRITE flag.
The kernel has performed R_AARCH64_RELATIVE resolving very early, and can
then assume that .got is read-only. Let's move .got to the vmlinux_rodata
pseudo-segment.
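For readers unfamiliar with relative relocations, the following C sketch shows the general technique: each R_AARCH64_RELATIVE entry stores the load displacement plus its addend at the relocated offset. The function apply_relative_relocs() and its parameters are hypothetical and only illustrate the idea; this is not the kernel's early relocation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_AARCH64_RELATIVE 1027u

/* Same layout as an ELF64 RELA entry; defined locally to stay portable. */
struct rela64 {
	uint64_t r_offset;
	uint64_t r_info;
	int64_t  r_addend;
};

/*
 * Hypothetical sketch: patch every R_AARCH64_RELATIVE entry in `image`,
 * which was linked at link_base and now runs at load_base.
 * Returns the number of entries applied.
 */
static size_t apply_relative_relocs(uint8_t *image, size_t image_size,
				    uint64_t link_base, uint64_t load_base,
				    const struct rela64 *rela, size_t count)
{
	size_t applied = 0;

	assert(image_size >= sizeof(uint64_t));
	for (size_t i = 0; i < count; i++) {
		uint64_t pos, value;

		/* The relocation type is the low 32 bits of r_info. */
		if ((uint32_t)rela[i].r_info != R_AARCH64_RELATIVE)
			continue;
		pos = rela[i].r_offset - link_base;
		assert(pos <= image_size - sizeof(uint64_t));
		value = (uint64_t)rela[i].r_addend + (load_base - link_base);
		memcpy(image + pos, &value, sizeof(value));
		applied++;
	}
	return applied;
}
```

Once all such entries (including those covering GOT slots) have been applied, nothing writes to the GOT again, which is what allows it to live in a read-only segment.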
As Ard Biesheuvel notes:
"This matters to consumers of the vmlinux ELF representation of the
kernel image, such as syzkaller, which disregards writable PT_LOAD
segments when resolving code symbols. The kernel itself does not care
about this distinction, but given that the GOT contains data and not
code, it does not require executable permissions, and therefore does
not belong in .text to begin with."
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20230502074105.1541926-1-maskray@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-05-02 07:41:05 +00:00
.got : { *(.got) }
/*
 * Make sure that the .got.plt is either completely empty or it
 * contains only the lazy dispatch entries.
 */
.got.plt : { *(.got.plt) }
ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18,
	"Unexpected GOT/PLT entries detected!")
2022-04-29 15:13:46 +02:00
/* code sections that are never executed via the kernel mapping */
.rodata.text : {
TRAMP_TEXT
HIBERNATE_TEXT
KEXEC_TEXT
2023-01-11 11:22:32 +01:00
IDMAP_TEXT
2022-04-29 15:13:46 +02:00
. = ALIGN(PAGE_SIZE);
}
2018-09-24 17:56:18 +01:00
idmap_pg_dir = .;
2022-06-24 17:06:42 +02:00
. += PAGE_SIZE;
2018-09-24 17:56:18 +01:00
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
tramp_pg_dir = .;
. += PAGE_SIZE;
#endif
2020-11-03 10:22:29 +00:00
reserved_pg_dir = .;
. += PAGE_SIZE;
2018-09-24 17:56:18 +01:00
swapper_pg_dir = .;
. += PAGE_SIZE;
2016-03-30 17:43:09 +02:00
. = ALIGN(SEGMENT_ALIGN);
2012-04-20 14:45:54 +01:00
__init_begin = .;
2017-03-09 21:52:03 +01:00
__inittext_begin = .;
2012-04-20 14:45:54 +01:00
INIT_TEXT_SECTION(8)
arm64: insn: consistently handle exit text
A kernel built with KASAN && FTRACE_WITH_REGS && !MODULES, produces a
boot-time splat in the bowels of ftrace:
| [ 0.000000] ftrace: allocating 32281 entries in 127 pages
| [ 0.000000] ------------[ cut here ]------------
| [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2019 ftrace_bug+0x27c/0x328
| [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc3-00008-g7f08ae53a7e3 #13
| [ 0.000000] Hardware name: linux,dummy-virt (DT)
| [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
| [ 0.000000] pc : ftrace_bug+0x27c/0x328
| [ 0.000000] lr : ftrace_init+0x640/0x6cc
| [ 0.000000] sp : ffffa000120e7e00
| [ 0.000000] x29: ffffa000120e7e00 x28: ffff00006ac01b10
| [ 0.000000] x27: ffff00006ac898c0 x26: dfffa00000000000
| [ 0.000000] x25: ffffa000120ef290 x24: ffffa0001216df40
| [ 0.000000] x23: 000000000000018d x22: ffffa0001244c700
| [ 0.000000] x21: ffffa00011bf393c x20: ffff00006ac898c0
| [ 0.000000] x19: 00000000ffffffff x18: 0000000000001584
| [ 0.000000] x17: 0000000000001540 x16: 0000000000000007
| [ 0.000000] x15: 0000000000000000 x14: ffffa00010432770
| [ 0.000000] x13: ffff940002483519 x12: 1ffff40002483518
| [ 0.000000] x11: 1ffff40002483518 x10: ffff940002483518
| [ 0.000000] x9 : dfffa00000000000 x8 : 0000000000000001
| [ 0.000000] x7 : ffff940002483519 x6 : ffffa0001241a8c0
| [ 0.000000] x5 : ffff940002483519 x4 : ffff940002483519
| [ 0.000000] x3 : ffffa00011780870 x2 : 0000000000000001
| [ 0.000000] x1 : 1fffe0000d591318 x0 : 0000000000000000
| [ 0.000000] Call trace:
| [ 0.000000] ftrace_bug+0x27c/0x328
| [ 0.000000] ftrace_init+0x640/0x6cc
| [ 0.000000] start_kernel+0x27c/0x654
| [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x30/0x60 with crng_init=0
| [ 0.000000] ---[ end trace 0000000000000000 ]---
| [ 0.000000] ftrace faulted on writing
| [ 0.000000] [<ffffa00011bf393c>] _GLOBAL__sub_D_65535_0___tracepoint_initcall_level+0x4/0x28
| [ 0.000000] Initializing ftrace call sites
| [ 0.000000] ftrace record flags: 0
| [ 0.000000] (0)
| [ 0.000000] expected tramp: ffffa000100b3344
This is due to an unfortunate combination of several factors.
Building with KASAN results in the compiler generating anonymous
functions to register/unregister global variables against the shadow
memory. These functions are placed in .text.startup/.text.exit, and
given mangled names like _GLOBAL__sub_{I,D}_65535_0_$OTHER_SYMBOL. The
kernel linker script places these in .init.text and .exit.text
respectively, which are both discarded at runtime as part of initmem.
Building with FTRACE_WITH_REGS uses -fpatchable-function-entry=2, which
also instruments KASAN's anonymous functions. When these are discarded
with the rest of initmem, ftrace removes dangling references to these
call sites.
Building without MODULES implicitly disables STRICT_MODULE_RWX, and
causes arm64's patch_map() function to treat any !core_kernel_text()
symbol as something that can be modified in-place. As core_kernel_text()
is only true for .text and .init.text, with the latter depending on
system_state < SYSTEM_RUNNING, we'll treat .exit.text as something that
can be patched in-place. However, .exit.text is mapped read-only.
Hence in this configuration the ftrace init code blows up while trying
to patch one of the functions generated by KASAN.
We could try to filter out the call sites in .exit.text rather than
initializing them, but this would be inconsistent with how we handle
.init.text, and requires hooking into core bits of ftrace. The behaviour
of patch_map() is also inconsistent today, so instead let's clean that
up and have it consistently handle .exit.text.
This patch teaches patch_map() to handle .exit.text at init time,
preventing the boot-time splat above. The flow of patch_map() is
reworked to make the logic clearer and minimize redundant
conditionality.
Fixes: 3b23e4991fb66f6d ("arm64: implement ftrace with regs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-12-02 16:11:07 +00:00
__exittext_begin = .;
2012-04-20 14:45:54 +01:00
.exit.text : {
2020-04-16 15:27:30 +02:00
EXIT_TEXT
2012-04-20 14:45:54 +01:00
}
2019-12-02 16:11:07 +00:00
__exittext_end = .;
2015-01-21 17:36:06 -08:00
2017-03-09 21:52:03 +01:00
. = ALIGN(4);
.altinstructions : {
__alt_instructions = .;
*(.altinstructions)
__alt_instructions_end = .;
}
2022-10-27 17:59:06 +02:00
UNWIND_DATA_SECTIONS
2020-04-13 17:55:18 +02:00
. = ALIGN(SEGMENT_ALIGN);
2017-03-09 21:52:03 +01:00
__inittext_end = .;
__initdata_begin = .;
2022-06-24 17:06:42 +02:00
init_idmap_pg_dir = .;
. += INIT_IDMAP_DIR_SIZE;
init_idmap_pg_end = .;
2012-04-20 14:45:54 +01:00
.init.data : {
INIT_DATA
INIT_SETUP(16)
INIT_CALLS
CON_INITCALL
INIT_RAM_FS
2020-12-09 18:04:48 +00:00
*(.init.altinstructions .init.bss) /* from the EFI stub */
2012-04-20 14:45:54 +01:00
}
.exit.data : {
2020-04-16 15:27:30 +02:00
EXIT_DATA
2012-04-20 14:45:54 +01:00
}
2015-12-01 13:20:40 +01:00
PERCPU_SECTION(L1_CACHE_BYTES)
2020-09-22 21:49:09 +01:00
HYPERVISOR_PERCPU_SECTION
2012-04-20 14:45:54 +01:00
2021-01-05 18:05:37 +00:00
HYPERVISOR_RELOC_SECTION
arm64: relocatable: fix inconsistencies in linker script and options
readelf complains about the section layout of vmlinux when building
with CONFIG_RELOCATABLE=y (for KASLR):
readelf: Warning: [21]: Link field (0) should index a symtab section.
readelf: Warning: [21]: Info field (0) should index a relocatable section.
Also, it seems that our use of '-pie -shared' is contradictory, and
thus ambiguous. In general, the way KASLR is wired up at the moment
is highly tailored to how ld.bfd happens to implement (and conflate)
PIE executables and shared libraries, so given the current effort to
support other toolchains, let's fix some of these issues as well.
- Drop the -pie linker argument and just leave -shared. In ld.bfd,
the differences between them are unclear (except for the ELF type
of the produced image [0]) but lld chokes on seeing both at the
same time.
- Rename the .rela output section to .rela.dyn, as is customary for
shared libraries and PIE executables, so that it is not misidentified
by readelf as a static relocation section (producing the warnings
above).
- Pass the -z notext and -z norelro options to explicitly instruct the
linker to permit text relocations, and to omit the RELRO program
header (which requires a certain section layout that we don't adhere
to in the kernel). These are the defaults for current versions of
ld.bfd.
- Discard .eh_frame and .gnu.hash sections to prevent them from being
emitted between .head.text and .text, screwing up the section layout.
These changes only affect the ELF image, and produce the same binary
image.
[0] b9dce7f1ba01 ("arm64: kernel: force ET_DYN ELF type for ...")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Smith <peter.smith@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-03 20:58:05 +01:00
.rela.dyn : ALIGN(8) {
2022-06-24 17:06:43 +02:00
__rela_start = .;
2016-01-26 09:13:44 +01:00
*(.rela .rela*)
2022-06-24 17:06:43 +02:00
__rela_end = .;
2016-01-26 09:13:44 +01:00
}
2014-11-14 15:54:08 +00:00
2019-07-31 18:18:42 -07:00
.relr.dyn : ALIGN(8) {
2022-06-24 17:06:43 +02:00
__relr_start = .;
2019-07-31 18:18:42 -07:00
*(.relr.dyn)
2022-06-24 17:06:43 +02:00
__relr_end = .;
2019-07-31 18:18:42 -07:00
}
2016-03-30 17:43:09 +02:00
. = ALIGN(SEGMENT_ALIGN);
2017-03-09 21:52:03 +01:00
__initdata_end = .;
2015-12-09 12:44:38 +00:00
__init_end = .;
2013-11-04 16:38:47 +00:00
_data = .;
_sdata = .;
2019-10-29 14:13:35 -07:00
RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
2016-08-24 18:27:29 +01:00
/*
 * Data written with the MMU off but read with the MMU on requires
 * cache lines to be invalidated, discarding up to a Cache Writeback
 * Granule (CWG) of data from the cache. Keep the section that
 * requires this type of maintenance to be in its own Cache Writeback
 * Granule (CWG) area so the cache maintenance operations don't
 * interfere with adjacent data.
 */
.mmuoff.data.write : ALIGN(SZ_2K) {
__mmuoff_data_start = .;
*(.mmuoff.data.write)
}
. = ALIGN(SZ_2K);
.mmuoff.data.read : {
*(.mmuoff.data.read)
__mmuoff_data_end = .;
}
2014-10-10 18:42:55 +02:00
PECOFF_EDATA_PADDING
2017-03-23 19:00:51 +00:00
__pecoff_data_rawsize = ABSOLUTE(. - __initdata_begin);
2013-11-04 16:38:47 +00:00
_edata = .;
2012-04-20 14:45:54 +01:00
2021-03-19 10:01:15 +00:00
BSS_SECTION(SBSS_ALIGN, 0, 0)
2014-06-24 16:51:35 +01:00
. = ALIGN(PAGE_SIZE);
arm64/mm: Separate boot-time page tables from swapper_pg_dir
Since the address of swapper_pg_dir is fixed for a given kernel image,
it is an attractive target for manipulation via an arbitrary write. To
mitigate this we'd like to make it read-only by moving it into the
rodata section.
We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir
and reserved_ttbr0, so these will also need to move into rodata.
However, swapper_pg_dir is allocated along with some transient page
tables used for boot which we do not want to move into rodata.
As a step towards this, this patch separates the boot-time page tables
into a new init_pg_dir, and reduces swapper_pg_dir to the single page it
needs to be. This allows us to retain the relationship between
swapper_pg_dir, tramp_pg_dir, and reserved_ttbr0, while cleanly
separating these from the boot-time page tables.
The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during
boot, and all of these levels will be freed when we switch to the
swapper_pg_dir, which is initialized by the existing code in
paging_init(). Since we start off on the init_pg_dir, we no longer need
to allocate a transient page table in paging_init() in order to ensure
that swapper_pg_dir isn't live while we initialize it.
There should be no functional change as a result of this patch.
Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
[Mark: place init_pg_dir after BSS, fold mm changes, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-24 15:47:49 +01:00
init_pg_dir = .;
. += INIT_DIR_SIZE;
init_pg_end = .;
2020-04-13 17:55:18 +02:00
. = ALIGN(SEGMENT_ALIGN);
2017-03-23 19:00:51 +00:00
__pecoff_data_size = ABSOLUTE(. - __initdata_begin);
2012-04-20 14:45:54 +01:00
_end = .;
STABS_DEBUG
2020-08-21 12:42:53 -07:00
DWARF_DEBUG
2020-08-21 12:42:45 -07:00
ELF_DETAILS
arm64: Update the Image header
Currently the kernel Image is stripped of everything past the initial
stack, and at runtime the memory is initialised and used by the kernel.
This makes the effective minimum memory footprint of the kernel larger
than the size of the loaded binary, though bootloaders have no mechanism
to identify how large this minimum memory footprint is. This makes it
difficult to choose safe locations to place both the kernel and other
binaries required at boot (DTB, initrd, etc), such that the kernel won't
clobber said binaries or other reserved memory during initialisation.
Additionally when big endian support was added the image load offset was
overlooked, and is currently of an arbitrary endianness, which makes it
difficult for bootloaders to make use of it. It seems that bootloaders
aren't respecting the image load offset at present anyway, and are
assuming that offset 0x80000 will always be correct.
This patch adds an effective image size to the kernel header which
describes the amount of memory from the start of the kernel Image binary
which the kernel expects to use before detecting memory and handling any
memory reservations. This can be used by bootloaders to choose suitable
locations to load the kernel and/or other binaries such that the kernel
will not clobber any memory unexpectedly. As before, memory reservations
are required to prevent the kernel from clobbering these locations
later.
Both the image load offset and the effective image size are forced to be
little-endian regardless of the native endianness of the kernel to
enable bootloaders to load a kernel of arbitrary endianness. Bootloaders
which wish to make use of the load offset can inspect the effective
image size field for a non-zero value to determine if the offset is of a
known endianness. To enable software to determine the endianness of the
kernel as may be required for certain use-cases, a new flags field (also
little-endian) is added to the kernel header to export this information.
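A bootloader-side reading of these fields could look like the C sketch below. The byte offsets (text_offset at 8, image_size at 16, flags at 24) are an assumption taken from the arm64 boot protocol document rather than from this message, and the names parse_image_header() and struct image_info are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed byte offsets of the header fields (see the lead-in). */
#define HDR_TEXT_OFFSET  8
#define HDR_IMAGE_SIZE   16
#define HDR_FLAGS        24

/* Decode a little-endian 64-bit value on a host of any endianness. */
static uint64_t read_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

struct image_info {
	uint64_t text_offset;
	uint64_t image_size;
	uint64_t flags;
	int offset_trusted;	/* non-zero image_size: offset is little-endian */
};

static struct image_info parse_image_header(const uint8_t *hdr, size_t len)
{
	struct image_info info;

	assert(len >= HDR_FLAGS + sizeof(uint64_t));
	info.text_offset = read_le64(hdr + HDR_TEXT_OFFSET);
	info.image_size = read_le64(hdr + HDR_IMAGE_SIZE);
	info.flags = read_le64(hdr + HDR_FLAGS);
	info.offset_trusted = info.image_size != 0;
	return info;
}
```

Decoding byte by byte avoids any dependence on the bootloader's own endianness, and the offset_trusted flag mirrors the rule above that a non-zero image size marks the load offset as being of a known endianness.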
The documentation is updated to clarify these details. To discourage
future assumptions regarding the value of text_offset, the value at this
point in time is removed from the main flow of the documentation (though
kept as a compatibility note). Some minor formatting issues in the
documentation are also corrected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Tom Rini <trini@ti.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Kevin Hilman <kevin.hilman@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-06-24 16:51:36 +01:00
HEAD_SYMBOLS
2020-08-21 12:42:54 -07:00
/*
 * Sections that should stay zero sized, which is safer to
 * explicitly check instead of blindly discarding.
 */
.plt : {
2020-10-28 14:33:32 +01:00
*(.plt) *(.plt.*) *(.iplt) *(.igot .igot.plt)
2020-08-21 12:42:54 -07:00
}
ASSERT(SIZEOF(.plt) == 0, "Unexpected run-time procedure linkages detected!")
.data.rel.ro : { *(.data.rel.ro) }
ASSERT(SIZEOF(.data.rel.ro) == 0, "Unexpected RELRO detected!")
2012-04-20 14:45:54 +01:00
}
2012-12-07 18:40:43 +00:00
2019-08-13 16:04:50 -07:00
#include "image-vars.h"
2012-12-07 18:40:43 +00:00
/*
2021-03-19 10:01:44 +00:00
 * The HYP init code and ID map text can't be longer than a page each. The
 * former is page-aligned, but the latter may not be with 16K or 64K pages, so
 * it should also not cross a page boundary.
2012-12-07 18:40:43 +00:00
 */
2021-03-19 10:01:44 +00:00
ASSERT(__hyp_idmap_text_end - __hyp_idmap_text_start <= PAGE_SIZE,
	"HYP init code too big")
2015-06-01 13:40:33 +02:00
ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
	"ID map text too big or misaligned")
2016-04-27 17:47:12 +01:00
#ifdef CONFIG_HIBERNATION
2022-04-29 15:13:46 +02:00
ASSERT(__hibernate_exit_text_end - __hibernate_exit_text_start <= SZ_4K,
	"Hibernate exit text is bigger than 4 KiB")
arm64: avoid executing padding bytes during kexec / hibernation
Currently we rely on the HIBERNATE_TEXT section starting with the entry
point to swsusp_arch_suspend_exit, and the KEXEC_TEXT section starting
with the entry point to arm64_relocate_new_kernel. In both cases we copy
the entire section into a dynamically-allocated page, and then later
branch to the start of this page.
SYM_FUNC_START() will align the function entry points to
CONFIG_FUNCTION_ALIGNMENT, and when the linker later processes the
assembled code it will place padding bytes before the function entry
point if the location counter was not already sufficiently aligned. The
linker happens to use the value zero for these padding bytes.
This padding may end up being applied whenever CONFIG_FUNCTION_ALIGNMENT
is greater than 4, which can be the case with
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y or
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y.
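The amount of padding follows directly from the location counter and the requested alignment. A small C sketch, with entry_padding() as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of padding bytes placed before a function entry point when the
 * location counter is `location` and the function must be aligned to
 * `func_align` bytes (a power of two).
 */
static uint64_t entry_padding(uint64_t location, uint64_t func_align)
{
	assert(func_align != 0 && (func_align & (func_align - 1)) == 0);
	return (func_align - (location & (func_align - 1))) & (func_align - 1);
}
```

With 4-byte alignment a word-aligned location counter never needs padding, whereas with 64-byte alignment a section starting at 0x1004 gets 60 padding bytes, so the entry point is no longer the first byte of the section.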
When such padding is applied, attempting to kexec or resume from
hibernate will result in a crash: the kernel will branch to the padding
bytes as the start of the dynamically-allocated page, and as those bytes
are zero they will decode as UDF #0, which reliably triggers an
UNDEFINED exception. For example:
| # ./kexec --reuse-cmdline -f Image
| [ 46.965800] kexec_core: Starting new kernel
| [ 47.143641] psci: CPU1 killed (polled 0 ms)
| [ 47.233653] psci: CPU2 killed (polled 0 ms)
| [ 47.323465] psci: CPU3 killed (polled 0 ms)
| [ 47.324776] Bye!
| [ 47.327072] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
| [ 47.328510] Modules linked in:
| [ 47.329086] CPU: 0 PID: 259 Comm: kexec Not tainted 6.2.0-rc5+ #3
| [ 47.330223] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| [ 47.331497] pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| [ 47.332782] pc : 0x43a95000
| [ 47.333338] lr : machine_kexec+0x190/0x1e0
| [ 47.334169] sp : ffff80000d293b70
| [ 47.334845] x29: ffff80000d293b70 x28: ffff000002cc0000 x27: 0000000000000000
| [ 47.336292] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| [ 47.337744] x23: ffff80000a837858 x22: 0000000048ec9000 x21: 0000000000000010
| [ 47.339192] x20: 00000000adc83000 x19: ffff000000827000 x18: 0000000000000006
| [ 47.340638] x17: ffff800075a61000 x16: ffff800008000000 x15: ffff80000d293658
| [ 47.342085] x14: 0000000000000000 x13: ffff80000d2937f7 x12: ffff80000a7ff6e0
| [ 47.343530] x11: 00000000ffffdfff x10: ffff80000a8ef8e0 x9 : ffff80000813ef00
| [ 47.344976] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
| [ 47.346431] x5 : 0000000000001fff x4 : 0000000000000001 x3 : ffff80000a0a3008
| [ 47.347877] x2 : ffff80000a8220f8 x1 : 0000000043a95000 x0 : ffff000000827000
| [ 47.349334] Call trace:
| [ 47.349834] 0x43a95000
| [ 47.350338] kernel_kexec+0x88/0x100
| [ 47.351070] __do_sys_reboot+0x108/0x268
| [ 47.351873] __arm64_sys_reboot+0x2c/0x40
| [ 47.352689] invoke_syscall+0x78/0x108
| [ 47.353458] el0_svc_common.constprop.0+0x4c/0x100
| [ 47.354426] do_el0_svc+0x34/0x50
| [ 47.355102] el0_svc+0x34/0x108
| [ 47.355747] el0t_64_sync_handler+0xf4/0x120
| [ 47.356617] el0t_64_sync+0x194/0x198
| [ 47.357374] Code: bad PC value
| [ 47.357999] ---[ end trace 0000000000000000 ]---
| [ 47.358937] Kernel panic - not syncing: Oops - Undefined instruction: Fatal exception
| [ 47.360515] Kernel Offset: disabled
| [ 47.361230] CPU features: 0x002000,00050108,c8004203
| [ 47.362232] Memory Limit: none
Note: Unfortunately the code dump reports "bad PC value" as it attempts
to dump some instructions prior to the UDF (i.e. before the start of the
page), and terminates early upon a fault, obscuring the problem.
This patch fixes this issue by aligning the section start markers to
CONFIG_FUNCTION_ALIGNMENT using the ALIGN_FUNCTION() helper, which
ensures that the linker never needs to place padding bytes within the
section. Assertions are added to verify each section begins with the
function we expect, making our implicit requirement explicit.
In future it might be nice to rework the kexec and hibernation code to
decouple the section start from the entry point, but that involves much
more significant changes that come with a higher risk of error, so I've
tried to keep this fix as simple as possible for now.
Fixes: 47a15aa54427 ("arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT")
Reported-by: CKI Project <cki-project@redhat.com>
Link: https://lore.kernel.org/linux-arm-kernel/29992.123012504212600261@us-mta-139.us.mimecast.lan/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-27 11:08:20 +00:00
ASSERT(__hibernate_exit_text_start == swsusp_arch_suspend_exit,
	"Hibernate exit text does not start with swsusp_arch_suspend_exit")
2016-04-27 17:47:12 +01:00
#endif
2017-12-06 11:24:02 +00:00
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
2021-11-18 15:04:32 +00:00
ASSERT((__entry_tramp_text_end - __entry_tramp_text_start) <= 3 * PAGE_SIZE,
2017-12-06 11:24:02 +00:00
	"Entry trampoline text too big")
#endif
2021-03-19 10:01:15 +00:00
#ifdef CONFIG_KVM
ASSERT(__hyp_bss_start == __bss_start, "HYP and Host BSS are misaligned")
#endif
2014-06-24 16:51:37 +01:00
/*
 * If padding is applied before .head.text, virt<->phys conversions will fail.
 */
2020-08-25 15:54:40 +02:00
ASSERT(_text == KIMAGE_VADDR, "HEAD is misaligned")
2021-02-02 12:36:57 +00:00
ASSERT(swapper_pg_dir - reserved_pg_dir == RESERVED_SWAPPER_OFFSET,
	"RESERVED_SWAPPER_OFFSET is wrong!")
2021-02-02 12:36:58 +00:00
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
ASSERT(swapper_pg_dir - tramp_pg_dir == TRAMP_SWAPPER_OFFSET,
	"TRAMP_SWAPPER_OFFSET is wrong!")
#endif
2021-09-30 14:31:08 +00:00
#ifdef CONFIG_KEXEC_CORE
/* kexec relocation code should fit into one KEXEC_CONTROL_PAGE_SIZE */
2022-04-29 15:13:46 +02:00
ASSERT(__relocate_new_kernel_end - __relocate_new_kernel_start <= SZ_4K,
	"kexec relocation code is bigger than 4 KiB")
2021-09-30 14:31:08 +00:00
ASSERT(KEXEC_CONTROL_PAGE_SIZE >= SZ_4K, "KEXEC_CONTROL_PAGE_SIZE is broken")
arm64: avoid executing padding bytes during kexec / hibernation
Currently we rely on the HIBERNATE_TEXT section starting with the entry
point to swsusp_arch_suspend_exit, and the KEXEC_TEXT section starting
with the entry point to arm64_relocate_new_kernel. In both cases we copy
the entire section into a dynamically-allocated page, and then later
branch to the start of this page.
SYM_FUNC_START() will align the function entry points to
CONFIG_FUNCTION_ALIGNMENT, and when the linker later processes the
assembled code it will place padding bytes before the function entry
point if the location counter was not already sufficiently aligned. The
linker happens to use the value zero for these padding bytes.
This padding may end up being applied whenever CONFIG_FUNCTION_ALIGNMENT
is greater than 4, which can be the case with
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y or
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y.
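As a rough illustration of the arithmetic described above, the sketch below
computes how many padding bytes would be needed to bring a location counter
up to a given power-of-two alignment. The helper name padding_before() and
the example address 0x1004 are assumptions made for this illustration; this
is not kernel or linker code.

```c
#include <stdint.h>

/*
 * Bytes of padding needed to raise `loc` to the next multiple of `align`.
 * `align` must be a power of two. Hypothetical helper, for illustration only.
 */
static uint64_t padding_before(uint64_t loc, uint64_t align)
{
	return (align - (loc & (align - 1))) & (align - 1);
}
```

With a location counter of 0x1004, padding_before(0x1004, 4) yields 0,
whereas padding_before(0x1004, 64) yields 60: an instruction-aligned section
start needs no padding at 4-byte function alignment, but may need some once
the function alignment is larger.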
When such padding is applied, attempting to kexec or resume from
hibernate will result in a crash: the kernel will branch to the padding
bytes as the start of the dynamically-allocated page, and as those bytes
are zero they will decode as UDF #0, which reliably triggers an
UNDEFINED exception. For example:
| # ./kexec --reuse-cmdline -f Image
| [ 46.965800] kexec_core: Starting new kernel
| [ 47.143641] psci: CPU1 killed (polled 0 ms)
| [ 47.233653] psci: CPU2 killed (polled 0 ms)
| [ 47.323465] psci: CPU3 killed (polled 0 ms)
| [ 47.324776] Bye!
| [ 47.327072] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
| [ 47.328510] Modules linked in:
| [ 47.329086] CPU: 0 PID: 259 Comm: kexec Not tainted 6.2.0-rc5+ #3
| [ 47.330223] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| [ 47.331497] pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| [ 47.332782] pc : 0x43a95000
| [ 47.333338] lr : machine_kexec+0x190/0x1e0
| [ 47.334169] sp : ffff80000d293b70
| [ 47.334845] x29: ffff80000d293b70 x28: ffff000002cc0000 x27: 0000000000000000
| [ 47.336292] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| [ 47.337744] x23: ffff80000a837858 x22: 0000000048ec9000 x21: 0000000000000010
| [ 47.339192] x20: 00000000adc83000 x19: ffff000000827000 x18: 0000000000000006
| [ 47.340638] x17: ffff800075a61000 x16: ffff800008000000 x15: ffff80000d293658
| [ 47.342085] x14: 0000000000000000 x13: ffff80000d2937f7 x12: ffff80000a7ff6e0
| [ 47.343530] x11: 00000000ffffdfff x10: ffff80000a8ef8e0 x9 : ffff80000813ef00
| [ 47.344976] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
| [ 47.346431] x5 : 0000000000001fff x4 : 0000000000000001 x3 : ffff80000a0a3008
| [ 47.347877] x2 : ffff80000a8220f8 x1 : 0000000043a95000 x0 : ffff000000827000
| [ 47.349334] Call trace:
| [ 47.349834] 0x43a95000
| [ 47.350338] kernel_kexec+0x88/0x100
| [ 47.351070] __do_sys_reboot+0x108/0x268
| [ 47.351873] __arm64_sys_reboot+0x2c/0x40
| [ 47.352689] invoke_syscall+0x78/0x108
| [ 47.353458] el0_svc_common.constprop.0+0x4c/0x100
| [ 47.354426] do_el0_svc+0x34/0x50
| [ 47.355102] el0_svc+0x34/0x108
| [ 47.355747] el0t_64_sync_handler+0xf4/0x120
| [ 47.356617] el0t_64_sync+0x194/0x198
| [ 47.357374] Code: bad PC value
| [ 47.357999] ---[ end trace 0000000000000000 ]---
| [ 47.358937] Kernel panic - not syncing: Oops - Undefined instruction: Fatal exception
| [ 47.360515] Kernel Offset: disabled
| [ 47.361230] CPU features: 0x002000,00050108,c8004203
| [ 47.362232] Memory Limit: none
Note: Unfortunately the code dump reports "bad PC value" as it attempts
to dump some instructions prior to the UDF (i.e. before the start of the
page), and terminates early upon a fault, obscuring the problem.
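The reason zero padding decodes as UDF #0 can be sketched as follows: in the
A64 encoding, UDF has bits [31:16] clear and carries a 16-bit immediate in
bits [15:0], so an all-zero word is UDF with an immediate of 0. The helper
names a64_is_udf() and a64_udf_imm() are assumptions made for this
illustration and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* A64 UDF: bits [31:16] are zero, bits [15:0] hold the immediate. */
static bool a64_is_udf(uint32_t insn)
{
	return (insn >> 16) == 0;
}

/* Extract the 16-bit immediate of a UDF instruction word. */
static uint16_t a64_udf_imm(uint32_t insn)
{
	return (uint16_t)(insn & 0xffff);
}
```

For a zero padding word, a64_is_udf(0) is true and a64_udf_imm(0) is 0,
whereas a NOP (0xd503201f) is not classified as UDF.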
This patch fixes this issue by aligning the section start markers to
CONFIG_FUNCTION_ALIGNMENT using the ALIGN_FUNCTION() helper, which
ensures that the linker never needs to place padding bytes within the
section. Assertions are added to verify each section begins with the
function we expect, making our implicit requirement explicit.
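The effect of aligning the section start marker can be modelled with a small
sketch. It computes the distance between the section start marker and the
function entry point, with and without aligning the marker first. The names
align_up() and entry_offset() are hypothetical, and the model only mirrors
the arithmetic; it is not the linker script change itself.

```c
#include <stdint.h>

/* Round `loc` up to the next multiple of `align` (a power of two). */
static uint64_t align_up(uint64_t loc, uint64_t align)
{
	return (loc + align - 1) & ~(align - 1);
}

/*
 * Offset of the entry point from the section start marker.
 * If `align_marker` is non-zero, the marker itself is aligned first,
 * which models the approach taken by this patch.
 */
static uint64_t entry_offset(uint64_t loc, uint64_t align, int align_marker)
{
	uint64_t start = align_marker ? align_up(loc, align) : loc;
	uint64_t entry = align_up(start, align);

	return entry - start;
}
```

With loc = 0x1004 and a 64-byte function alignment, entry_offset() gives 60
when the marker is left unaligned and 0 once it is aligned. The added
assertions effectively check for the latter: the section start marker and
the function entry point must be the same address.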
In future it might be nice to rework the kexec and hibernation code to
decouple the section start from the entry point, but that involves much
more significant changes that come with a higher risk of error, so I've
tried to keep this fix as simple as possible for now.
Fixes: 47a15aa54427 ("arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT")
Reported-by: CKI Project <cki-project@redhat.com>
Link: https://lore.kernel.org/linux-arm-kernel/29992.123012504212600261@us-mta-139.us.mimecast.lan/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-27 11:08:20 +00:00
ASSERT(__relocate_new_kernel_start == arm64_relocate_new_kernel,
	"kexec control page does not start with arm64_relocate_new_kernel")
2021-09-30 14:31:08 +00:00
#endif