License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was in a */uapi/* path, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
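The scanner-reconciliation heuristics above can be sketched roughly as follows. This is an illustration only, not the tooling used for this series: the function name `conclude_license` and its arguments are hypothetical, and it models just the "no license found" and "both scanners agree" cases. The extra rule that adds the Linux-syscall-note to */uapi/* files with an existing GPL-family license is left out, and every disagreement is deferred to manual review.

```python
from typing import Optional


def conclude_license(path: str,
                     scan_a: Optional[str],
                     scan_b: Optional[str]) -> Optional[str]:
    """Return a concluded SPDX identifier, or None when manual review is needed.

    scan_a / scan_b are the identifiers reported by two independent
    scanners (None means "no license traces found"). Hypothetical helper.
    """
    is_uapi = "/uapi/" in path
    if scan_a is None and scan_b is None:
        # No license traces at all: the top-level COPYING license applies.
        if is_uapi:
            return "GPL-2.0 WITH Linux-syscall-note"
        return "GPL-2.0"
    if scan_a == scan_b:
        # Both scanners agree: that becomes the concluded license.
        return scan_a
    # Disagreement: a person has to inspect the file.
    return None
```

Returning `None` for the disagreement case keeps the automated path conservative, which mirrors the manual-inspection step described in the list.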
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
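As a rough sketch of the "different comment types" point above: the script has to pick a comment style per file type before prepending the tag. The helper below is hypothetical and is not the script Thomas and Greg used. It assumes the style the kernel's license rules describe, with `//` for C source files, `/* */` for headers and assembly, and `#` for everything else such as Makefiles and scripts.

```python
import os


def spdx_line(filename: str, license_id: str = "GPL-2.0") -> str:
    """Build the SPDX tag line in the comment style assumed for this file type."""
    ext = os.path.splitext(filename)[1]
    if ext == ".c":
        # C source files: C++ style comment (assumed convention).
        return f"// SPDX-License-Identifier: {license_id}"
    if ext in (".h", ".S"):
        # Headers and assembly: classic C block comment.
        return f"/* SPDX-License-Identifier: {license_id} */"
    # Makefiles, Kconfig files, scripts: hash comment.
    return f"# SPDX-License-Identifier: {license_id}"


print(spdx_line("arch/x86/kernel/vmlinux.lds.S"))
```

For a `.S` file this yields the same form as the tag line at the top of this linker script.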
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
2009-04-29 09:47:18 +02:00
/*
 * ld script for the x86 kernel
 *
 * Historic 32-bit version written by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
 *
2009-04-29 10:58:38 +02:00
 * Modernisation, unification and other changes and fixes:
 *   Copyright (C) 2007-2009  Sam Ravnborg <sam@ravnborg.org>
2009-04-29 09:47:18 +02:00
 *
 *
 * Don't define absolute symbols until and unless you know that symbol
 * value should remain constant even if kernel image is relocated
 * at run time. Absolute symbols are not relocated. If symbol value should
 * change if kernel is relocated, make the symbol section relative and
 * put it inside the section definition.
 */
#ifdef CONFIG_X86_32
#define LOAD_OFFSET __PAGE_OFFSET
#else
#define LOAD_OFFSET __START_KERNEL_map
#endif
2020-03-26 12:30:20 -07:00
#define RUNTIME_DISCARD_EXIT
2019-10-29 14:13:30 -07:00
#define EMITS_PT_NOTE
2019-10-29 14:13:38 -07:00
#define RO_EXCEPTION_TABLE_ALIGN	16
2019-10-29 14:13:30 -07:00
2009-04-29 09:47:18 +02:00
#include <asm-generic/vmlinux.lds.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/page_types.h>
2017-07-24 18:36:57 -05:00
#include <asm/orc_lookup.h>
2009-04-29 09:47:18 +02:00
#include <asm/cache.h>
#include <asm/boot.h>
#undef i386	/* in case the preprocessor is a 32bit one */
2019-01-09 17:32:10 +01:00
OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
2009-04-29 09:47:18 +02:00
#ifdef CONFIG_X86_32
OUTPUT_ARCH(i386)
ENTRY(phys_startup_32)
#else
OUTPUT_ARCH(i386:x86-64)
ENTRY(phys_startup_64)
#endif
x86_64: Fix jiffies ODR violation
'jiffies' and 'jiffies_64' are meant to alias (two different symbols that
share the same address). Most architectures make the symbols alias to the
same address via a linker script assignment in their
arch/<arch>/kernel/vmlinux.lds.S:
jiffies = jiffies_64;
which is effectively a definition of jiffies.
jiffies and jiffies_64 are both forward declared for all architectures in
include/linux/jiffies.h. jiffies_64 is defined in kernel/time/timer.c.
x86_64 was peculiar in that it wasn't doing the above linker script
assignment, but rather was:
1. defining jiffies in arch/x86/kernel/time.c instead of via the linker script.
2. overriding the symbol jiffies_64 from kernel/time/timer.c in
arch/x86/kernel/vmlinux.lds.s via 'jiffies_64 = jiffies;'.
As Fangrui notes:
In LLD, symbol assignments in linker scripts override definitions in
object files. GNU ld appears to have the same behavior. It would
probably make sense for LLD to error "duplicate symbol" but GNU ld
is unlikely to adopt for compatibility reasons.
This results in an ODR violation (UB), which seems to have survived
thus far. Where it becomes harmful is when:
1. -fno-semantic-interposition is used:
As Fangrui notes:
Clang after LLVM commit 5b22bcc2b70d
("[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local")
defaults to -fno-semantic-interposition similar semantics which help
-fpic/-fPIC code avoid GOT/PLT when the referenced symbol is defined
within the same translation unit. Unlike GCC
-fno-semantic-interposition, Clang emits such relocations referencing
local symbols for non-pic code as well.
This causes references to jiffies to refer to '.Ljiffies$local' when
jiffies is defined in the same translation unit. Likewise, references to
jiffies_64 become references to '.Ljiffies_64$local' in translation units
that define jiffies_64. Because these differ from the names used in the
linker script, they will not be rewritten to alias one another.
2. Full LTO
Full LTO effectively treats all source files as one translation
unit, causing these local references to be produced everywhere. When
the linker processes the linker script, there are no longer any
references to 'jiffies_64' anywhere to replace with 'jiffies'. And
thus '.Ljiffies$local' and '.Ljiffies_64$local' no longer alias
at all.
In the process of porting patches enabling Full LTO from arm64 to x86_64,
spooky bugs have been observed where the kernel appeared to boot, but init
doesn't get scheduled.
Avoid the ODR violation by matching other architectures and defining jiffies
only by linker script. For -fno-semantic-interposition + Full LTO, there
is then no global definition of jiffies from which the compiler could produce
a local symbol that the linker script cannot alias to jiffies_64.
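The aliasing problem can be illustrated with a toy model of symbol resolution. This is only a sketch under simplifying assumptions, not how a real linker works: the `link` helper and all addresses are hypothetical, and the only rule it models is the one quoted above, that a linker script assignment overrides a definition from an object file.

```python
def link(object_defs, script_assignments):
    """Toy symbol table: script assignments override object-file definitions."""
    symbols = dict(object_defs)
    for lhs, rhs in script_assignments:
        symbols[lhs] = symbols[rhs]
    return symbols


# Old x86_64 layout: both symbols are defined in object files, each with a
# compiler-generated local alias, and the script overrides jiffies_64.
old_objects = {
    "jiffies": 0x2000,
    ".Ljiffies$local": 0x2000,
    "jiffies_64": 0x1000,
    ".Ljiffies_64$local": 0x1000,
}
old = link(old_objects, [("jiffies_64", "jiffies")])
# The global names alias, but the local names still point at two objects.
print(old["jiffies"] == old["jiffies_64"])                   # True
print(old[".Ljiffies$local"] == old[".Ljiffies_64$local"])   # False

# Fixed layout: only jiffies_64 is defined in an object file.
fixed = link({"jiffies_64": 0x1000, ".Ljiffies_64$local": 0x1000},
             [("jiffies", "jiffies_64")])
print(fixed["jiffies"] == fixed[".Ljiffies_64$local"])       # True
```

In the fixed layout there is no second object behind `jiffies`, so no local symbol can refer to a different address.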
Fixes: 40747ffa5aa8 ("asmlinkage: Make jiffies visible")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Alistair Delva <adelva@google.com>
Debugged-by: Nick Desaulniers <ndesaulniers@google.com>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Bob Haarman <inglorion@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot on
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/852
Link: https://lkml.kernel.org/r/20200602193100.229287-1-inglorion@google.com
2020-06-02 12:30:59 -07:00
jiffies = jiffies_64;
2016-02-17 14:41:14 -08:00
#if defined(CONFIG_X86_64)
2009-10-19 06:12:04 -07:00
/*
2016-02-17 14:41:14 -08:00
 * On 64-bit, align RODATA to 2MB so we retain large page mappings for
 * boundaries spanning kernel text, rodata and data sections.
2009-10-19 06:12:04 -07:00
 *
 * However, kernel identity mappings will have different RWX permissions
 * to the pages mapping to text and to the pages padding (which are freed) the
 * text section. Hence kernel identity mappings will be broken to smaller
 * pages. For 64-bit, kernel text and kernel identity mappings are different,
2016-02-17 14:41:14 -08:00
 * so we can enable protection checks as well as retain 2MB large page
 * mappings for kernel text.
2009-10-19 06:12:04 -07:00
 */
2018-07-18 11:41:04 +02:00
#define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
2009-10-14 14:46:56 -07:00
2018-07-18 11:41:04 +02:00
#define X86_ALIGN_RODATA_END					\
2009-10-14 14:46:56 -07:00
		. = ALIGN(HPAGE_SIZE);				\
2018-07-18 11:41:04 +02:00
		__end_rodata_hpage_align = .;			\
		__end_rodata_aligned = .;
2009-10-14 14:46:56 -07:00
2017-12-04 15:07:46 +01:00
#define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
#define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
2018-09-14 08:45:58 -05:00
/*
 * This section contains data which will be mapped as decrypted. Memory
 * encryption operates on a page basis. Make this section PMD-aligned
 * to avoid splitting the pages while mapping the section early.
 *
 * Note: We use a separate section so that only this section gets
 * decrypted to avoid exposing more than we wish.
 */
#define BSS_DECRYPTED						\
	. = ALIGN(PMD_SIZE);					\
	__start_bss_decrypted = .;				\
	*(.bss..decrypted);					\
	. = ALIGN(PAGE_SIZE);					\
	__start_bss_decrypted_unused = .;			\
	. = ALIGN(PMD_SIZE);					\
	__end_bss_decrypted = .;				\
2009-10-14 14:46:56 -07:00
#else
2018-07-18 11:41:04 +02:00
#define X86_ALIGN_RODATA_BEGIN
#define X86_ALIGN_RODATA_END					\
		. = ALIGN(PAGE_SIZE);				\
		__end_rodata_aligned = .;
2009-10-14 14:46:56 -07:00
2017-12-04 15:07:46 +01:00
#define ALIGN_ENTRY_TEXT_BEGIN
#define ALIGN_ENTRY_TEXT_END
2018-09-14 08:45:58 -05:00
#define BSS_DECRYPTED
2017-12-04 15:07:46 +01:00
2009-10-14 14:46:56 -07:00
#endif
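The `. = ALIGN(...)` statements in the macros above round the location counter up to the next multiple of a power-of-two size. A minimal sketch of that arithmetic follows. The helper name `align_up` is hypothetical, and the 2 MB and 4 KB constants are assumptions taken from the "2MB" wording in the comment above and the usual x86 page size.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


HPAGE_SIZE = 2 * 1024 * 1024  # assumed 2 MB large page
PAGE_SIZE = 4096              # assumed 4 KB base page

print(hex(align_up(0x1234567, HPAGE_SIZE)))
print(hex(align_up(0x1001, PAGE_SIZE)))
```

An address that is already aligned is returned unchanged, so applying the rounding twice in a row is harmless.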
2009-04-29 09:47:19 +02:00
PHDRS {
	text PT_LOAD FLAGS(5);          /* R_E */
2010-11-16 22:31:26 +01:00
	data PT_LOAD FLAGS(6);          /* RW_ */
2009-04-29 09:47:19 +02:00
#ifdef CONFIG_X86_64
#ifdef CONFIG_SMP
2009-09-04 09:18:07 +01:00
	percpu PT_LOAD FLAGS(6);        /* RW_ */
2009-04-29 09:47:19 +02:00
#endif
2009-08-25 14:50:53 +01:00
	init PT_LOAD FLAGS(7);          /* RWE */
2009-04-29 09:47:19 +02:00
#endif
	note PT_NOTE FLAGS(0);          /* ___ */
}
2009-04-29 09:47:18 +02:00
2009-04-29 09:47:20 +02:00
SECTIONS
{
#ifdef CONFIG_X86_32
x86/kallsyms: fix GOLD link failure with new relative kallsyms table format
Commit 2213e9a66bb8 ("kallsyms: add support for relative offsets in
kallsyms address table") changed the default kallsyms symbol table
format to use relative references rather than absolute addresses.
This reduces the size of the kallsyms symbol table by 50% on 64-bit
architectures, and further reduces the size of the relocation tables
used by relocatable kernels. Since the memory footprint of the static
kernel image is always much smaller than 4 GB, these relative references
are assumed to be representable in 32 bits, even when the native word
size is 64 bits.
On 64-bit architectures, this obviously only works if the distance
between each relative reference and the chosen anchor point is
representable in 32 bits, and so the table generation code in
scripts/kallsyms.c scans the table for the lowest value that is covered
by the kernel text, and selects it as the anchor point.
However, when using the GOLD linker rather than the default BFD linker
to build the x86_64 kernel, the symbol phys_startup_64, which is the
result of arithmetic defined in the linker script, is emitted as a 'T'
rather than an 'A' type symbol, causing scripts/kallsyms.c to
mistake it for a suitable anchor point, even though it is far away from
the actual kernel image in the virtual address space. This results in
out-of-range warnings from scripts/kallsyms.c and a broken build.
So let's align with the BFD linker, and emit the phys_startup_[32|64]
symbols as absolute symbols explicitly. Note that the out of range
issue does not exist on 32-bit x86, but this patch changes both symbols
for symmetry.
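The failure mode can be sketched numerically. The code below is not scripts/kallsyms.c: the function name and all addresses are hypothetical, and it assumes offsets are stored as unsigned 32-bit distances from the lowest text symbol. It shows how a single low-address symbol that is wrongly treated as text drags the anchor far away from the kernel image.

```python
UINT32_MAX = 0xFFFFFFFF


def relative_offsets(text_symbols):
    """Pick the lowest text address as anchor and encode 32-bit offsets."""
    anchor = min(text_symbols.values())
    offsets = {}
    for name, addr in text_symbols.items():
        offset = addr - anchor
        if offset > UINT32_MAX:
            raise OverflowError(f"{name}: offset {offset:#x} out of range")
        offsets[name] = offset
    return anchor, offsets


# Hypothetical kernel text addresses in the high virtual range.
good = {"_text": 0xFFFFFFFF81000000, "schedule": 0xFFFFFFFF81A00000}
anchor, offsets = relative_offsets(good)
print(hex(anchor), hex(offsets["schedule"]))

# A physical-address symbol emitted as 'T' becomes the anchor and breaks it.
bad = dict(good, phys_startup_64=0x1000000)
try:
    relative_offsets(bad)
except OverflowError as err:
    print("out of range:", err)
```

Marking the symbol as absolute keeps it out of the candidate set, so the anchor stays inside the kernel text.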
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 10:04:37 +01:00
	. = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
	phys_startup_32 = ABSOLUTE(startup_32 - LOAD_OFFSET);
2009-04-29 09:47:20 +02:00
#else
2016-03-18 10:04:37 +01:00
	. = __START_KERNEL;
	phys_startup_64 = ABSOLUTE(startup_64 - LOAD_OFFSET);
2009-04-29 09:47:20 +02:00
#endif
2009-04-29 09:47:21 +02:00
	/* Text and read-only data */
	.text :  AT(ADDR(.text) - LOAD_OFFSET) {
2009-09-16 16:44:28 -04:00
		_text = .;
2016-09-21 16:04:07 -05:00
		_stext = .;
2009-09-16 16:44:28 -04:00
		/* bootstrapping code */
		HEAD_TEXT
2009-04-29 09:47:21 +02:00
		TEXT_TEXT
		SCHED_TEXT
2016-10-07 17:02:55 -07:00
		CPUIDLE_TEXT
2009-04-29 09:47:21 +02:00
		LOCK_TEXT
		KPROBES_TEXT
2017-12-04 15:07:46 +01:00
		ALIGN_ENTRY_TEXT_BEGIN
2011-03-07 19:10:39 +01:00
		ENTRY_TEXT
2017-12-04 15:07:46 +01:00
		ALIGN_ENTRY_TEXT_END
2016-03-25 14:22:05 -07:00
		SOFTIRQENTRY_TEXT
2020-08-18 15:57:45 +02:00
		STATIC_CALL_TEXT
2009-04-29 09:47:21 +02:00
		*(.gnu.warning)
x86/entry/64: Create a per-CPU SYSCALL entry trampoline
Handling SYSCALL is tricky: the SYSCALL handler is entered with every
single register (except FLAGS), including RSP, live. It somehow needs
to set RSP to point to a valid stack, which means it needs to save the
user RSP somewhere and find its own stack pointer. The canonical way
to do this is with SWAPGS, which lets us access percpu data using the
%gs prefix.
With PAGE_TABLE_ISOLATION-like pagetable switching, this is
problematic. Without a scratch register, switching CR3 is impossible, so
%gs-based percpu memory would need to be mapped in the user pagetables.
Doing that without information leaks is difficult or impossible.
Instead, use a different sneaky trick. Map a copy of the first part
of the SYSCALL asm at a different address for each CPU. Now RIP
varies depending on the CPU, so we can use RIP-relative memory access
to access percpu memory. By putting the relevant information (one
scratch slot and the stack address) at a constant offset relative to
RIP, we can make SYSCALL work without relying on %gs.
A nice thing about this approach is that we can easily switch it on
and off if we want pagetable switching to be configurable.
The compat variant of SYSCALL doesn't have this problem in the first
place -- there are plenty of scratch registers, since we don't care
about preserving r8-r15. This patch therefore doesn't touch SYSCALL32
at all.
This patch actually seems to be a small speedup. With this patch,
SYSCALL touches an extra cache line and an extra virtual page, but
the pipeline no longer stalls waiting for SWAPGS. It seems that, at
least in a tight loop, the latter outweighs the former.
Thanks to David Laight for an optimization tip.
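The RIP-relative trick described above can be pictured with a toy address calculation. Everything here is hypothetical (the offset, the per-CPU trampoline addresses, and the helper name), and it models only the idea that identical code mapped at a different address on each CPU reaches a different data slot at the same constant offset.

```python
# Hypothetical constant distance from the trampoline code to its scratch slot.
SCRATCH_OFFSET = 0x100


def scratch_slot(rip: int) -> int:
    """Address reached by a RIP-relative access at a fixed offset."""
    return rip + SCRATCH_OFFSET


# Hypothetical addresses at which each CPU's trampoline copy is mapped.
trampoline_rip = {0: 0x10000, 1: 0x20000}
slots = {cpu: scratch_slot(rip) for cpu, rip in trampoline_rip.items()}

for cpu, slot in slots.items():
    print(f"cpu {cpu}: scratch slot at {slot:#x}")
```

Because the slot address depends only on where the code runs, no %gs access is needed to find per-CPU data.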
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.403607157@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-04 15:07:25 +01:00
2018-01-19 01:14:21 +09:00
#ifdef CONFIG_RETPOLINE
		__indirect_thunk_start = .;
		*(.text.__x86.indirect_thunk)
		__indirect_thunk_end = .;
#endif
2019-10-29 14:13:51 -07:00
	} :text =0xcccc
2019-04-23 11:38:27 -07:00
2019-10-29 14:13:37 -07:00
	/* End of text section, which should occupy whole number of pages */
	_etext = .;
2010-11-16 22:31:26 +01:00
	. = ALIGN(PAGE_SIZE);
2019-10-29 14:13:37 -07:00
2018-07-18 11:41:04 +02:00
	X86_ALIGN_RODATA_BEGIN
2009-08-25 14:50:53 +01:00
	RO_DATA(PAGE_SIZE)
2018-07-18 11:41:04 +02:00
	X86_ALIGN_RODATA_END
2009-04-29 09:47:22 +02:00
2009-04-29 09:47:23 +02:00
	/* Data */
	.data : AT(ADDR(.data) - LOAD_OFFSET) {
2009-05-11 13:22:00 +01:00
		/* Start of data section */
		_sdata = .;
2009-08-25 14:50:53 +01:00
		/* init_task */
		INIT_TASK_DATA(THREAD_SIZE)
2009-04-29 09:47:23 +02:00
#ifdef CONFIG_X86_32
2009-08-25 14:50:53 +01:00
		/* 32 bit has nosave before _edata */
		NOSAVE_DATA
2009-04-29 09:47:23 +02:00
#endif
2009-08-25 14:50:53 +01:00
		PAGE_ALIGNED_DATA(PAGE_SIZE)
2009-04-29 09:47:23 +02:00
2009-11-13 11:54:40 +00:00
		CACHELINE_ALIGNED_DATA(L1_CACHE_BYTES)
2009-04-29 09:47:23 +02:00
2009-08-25 14:50:53 +01:00
		DATA_DATA
		CONSTRUCTORS
		/* rarely changed data like cpu maps */
2009-11-13 11:54:40 +00:00
		READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
2009-04-29 09:47:23 +02:00
		/* End of data section */
		_edata = .;
2009-08-25 14:50:53 +01:00
	} :data
2009-04-29 09:47:23 +02:00
2017-03-30 17:49:27 +02:00
	BUG_TABLE
2009-04-29 09:47:24 +02:00
2017-07-24 18:36:57 -05:00
	ORC_UNWIND_TABLE
2011-08-03 09:31:50 -04:00
	. = ALIGN(PAGE_SIZE);
	__vvar_page = .;
	.vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
2011-08-03 09:31:51 -04:00
		/* work around gold bug 13023 */
		__vvar_beginning_hack = .;
2011-08-03 09:31:50 -04:00
2011-08-03 09:31:51 -04:00
		/* Place all vvars at the offsets in asm/vvar.h. */
2019-11-12 01:27:10 +00:00
#define EMIT_VVAR(name, offset)				\
2011-08-03 09:31:51 -04:00
		. = __vvar_beginning_hack + offset;	\
2011-08-03 09:31:50 -04:00
		*(.vvar_ ## name)
#include <asm/vvar.h>
#undef EMIT_VVAR
2014-03-17 23:22:11 +01:00
		/*
		 * Pad the rest of the page with zeros.  Otherwise the loader
		 * can leave garbage here.
		 */
		. = __vvar_beginning_hack + PAGE_SIZE;
2011-08-03 09:31:50 -04:00
	} :data
2018-02-08 14:38:57 +08:00
	. = ALIGN(__vvar_page + PAGE_SIZE, PAGE_SIZE);
2011-08-03 09:31:50 -04:00
2009-08-25 14:50:53 +01:00
	/* Init code and data - will be freed after init */
	. = ALIGN(PAGE_SIZE);
	.init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) {
		__init_begin = .; /* paired with __init_end */
2009-04-29 09:47:25 +02:00
	}
2009-08-25 14:50:53 +01:00
#if defined(CONFIG_X86_64) && defined(CONFIG_SMP)
2009-04-29 09:47:25 +02:00
	/*
2009-08-25 14:50:53 +01:00
	 * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
	 * output PHDR, so the next output section - .init.text - should
	 * start another segment - init.
2009-04-29 09:47:25 +02:00
	 */
2011-01-25 14:26:50 +01:00
	PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
2014-11-04 08:50:48 +00:00
	ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
	       "per-CPU data too large - increase CONFIG_PHYSICAL_START")
2009-08-25 14:50:53 +01:00
#endif
2009-04-29 09:47:25 +02:00
2009-09-16 16:44:30 -04:00
	INIT_TEXT_SECTION(PAGE_SIZE)
2009-08-25 14:50:53 +01:00
#ifdef CONFIG_X86_64
	:init
#endif
2009-04-29 09:47:25 +02:00
2016-01-26 22:12:07 +01:00
	/*
	 * Section for code used exclusively before alternatives are run. All
	 * references to such code must be patched out by alternatives, normally
	 * by using X86_FEATURE_ALWAYS CPU feature bit.
	 *
	 * See static_cpu_has() for an example.
	 */
	.altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
		*(.altinstr_aux)
	}
2009-09-16 16:44:30 -04:00
	INIT_DATA_SECTION(16)
2009-04-29 09:47:25 +02:00
	.x86_cpu_dev.init : AT(ADDR(.x86_cpu_dev.init) - LOAD_OFFSET) {
		__x86_cpu_dev_start = .;
		*(.x86_cpu_dev.init)
		__x86_cpu_dev_end = .;
	}
2013-10-17 15:35:35 -07:00
#ifdef CONFIG_X86_INTEL_MID
	.x86_intel_mid_dev.init : AT(ADDR(.x86_intel_mid_dev.init) - \
								LOAD_OFFSET) {
		__x86_intel_mid_dev_start = .;
		*(.x86_intel_mid_dev.init)
		__x86_intel_mid_dev_end = .;
	}
#endif
2010-08-27 14:19:33 -04:00
	/*
	 * start address and size of operations which during runtime
	 * can be patched with virtualization friendly instructions or
	 * baremetal native ones. Think page table operations.
	 * Details in paravirt_types.h
	 */
2009-04-29 09:47:26 +02:00
	. = ALIGN(8);
	.parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
		__parainstructions = .;
		*(.parainstructions)
		__parainstructions_end = .;
	}
2021-10-26 14:01:36 +02:00
#ifdef CONFIG_RETPOLINE
	/*
	 * List of instructions that call/jmp/jcc to retpoline thunks
	 * __x86_indirect_thunk_*(). These instructions can be patched along
	 * with alternatives, after which the section can be freed.
	 */
	. = ALIGN(8);
	.retpoline_sites : AT(ADDR(.retpoline_sites) - LOAD_OFFSET) {
		__retpoline_sites = .;
		*(.retpoline_sites)
		__retpoline_sites_end = .;
	}
#endif
2010-08-27 14:19:33 -04:00
	/*
	 * struct alt_inst entries. From the header (alternative.h):
	 * "Alternative instructions for different CPU types or capabilities"
	 * Think locking instructions on spinlocks.
	 */
2009-04-29 09:47:26 +02:00
	. = ALIGN(8);
	.altinstructions : AT(ADDR(.altinstructions) - LOAD_OFFSET) {
		__alt_instructions = .;
		*(.altinstructions)
		__alt_instructions_end = .;
	}
2010-08-27 14:19:33 -04:00
	/*
	 * And here are the replacement instructions. The linker sticks
	 * them as binary blobs. The .altinstructions has enough data to
	 * get the address and the length of them to patch the kernel safely.
	 */
2009-04-29 09:47:26 +02:00
	.altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
		*(.altinstr_replacement)
	}
2010-08-27 14:19:33 -04:00
	/*
	 * struct iommu_table_entry entries are injected in this section.
	 * It is an array of IOMMUs which during run time gets sorted depending
	 * on its dependency order. After rootfs_initcall is complete
	 * this section can be safely removed.
	 */
x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure
This patch set adds a mechanism to "modularize" the IOMMUs we have
on X86. Currently the count of IOMMUs is up to six and they have a complex
relationship that requires careful execution order. 'pci_iommu_alloc'
does that today, but most folks are unhappy with how it does it.
This patch set addresses this and also paves a mechanism to jettison
unused IOMMUs during run-time. For details that sparked this, please
refer to: http://lkml.org/lkml/2010/8/2/282
The first solution that comes to mind is to convert wholesale
the IOMMU detection routines to be called during initcall
time frame. Unfortunately that misses the dependency relationship
that some of the IOMMUs have (for example: for AMD-Vi IOMMU to work,
GART detection MUST run first, and before all of that SWIOTLB MUST run).
The second solution would be to introduce a registration call wherein
the IOMMU would provide its detection/init routines as well as what
MUST run before it. That would work, except that 'pci_iommu_alloc',
which would run through this list, is called during mem_init. This means we
don't have any memory allocator, and it is so early that we haven't yet
started running through the initcall_t list.
This solution borrows concepts from the 2nd idea and from how
MODULE_INIT works. A macro is provided that each IOMMU uses to define
its detect function and early_init (run before the memory allocator is
active), as well as which other IOMMU MUST run before it. Since most IOMMUs
depend on having SWIOTLB run first ("pci_swiotlb_detect"), a convenience macro
that depends on it is also provided.
This macro is similar in design to the MODULE_PARAM macro: we set up
an .iommu_table section and populate it with values
that match a struct iommu_table_entry. During bootup we sort
the array so that the IOMMUs that MUST run first come first.
Then we iterate through the entries, calling each
detection routine and, if appropriate, the init routines.
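The ordering step described above can be sketched in C. This is a minimal illustration, not the kernel's implementation: `struct iommu_entry`, `find_entry()`, `sort_by_dependency()` and the three stub detect routines are hypothetical names, each entry is assumed to have at most one dependency, and the dependencies are assumed to contain no cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct iommu_table_entry. */
struct iommu_entry {
	int (*detect)(void);	/* detection routine of this IOMMU */
	int (*depend)(void);	/* detect routine that MUST run first, or NULL */
};

/* Stub detect routines, for illustration only. */
int detect_swiotlb(void) { return 1; }
int detect_gart(void)    { return 2; }
int detect_amd(void)     { return 3; }

/* Index of the entry whose detect routine is fn, or n if there is none. */
static size_t find_entry(const struct iommu_entry *tbl, size_t n,
			 int (*fn)(void))
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].detect == fn)
			break;
	return i;
}

/*
 * Reorder tbl[0..n) in place so that each entry comes after the entry
 * it depends on. Assumes the dependencies contain no cycle.
 */
void sort_by_dependency(struct iommu_entry *tbl, size_t n)
{
	size_t i = 0, swaps = 0;

	while (i < n) {
		size_t dep = tbl[i].depend ?
			     find_entry(tbl, n, tbl[i].depend) : n;

		if (dep < n && dep > i) {
			/* Dependency sits later: swap, then re-check slot i. */
			struct iommu_entry tmp = tbl[i];

			tbl[i] = tbl[dep];
			tbl[dep] = tmp;
			swaps++;
			assert(swaps <= n * n);	/* trips on a cycle */
			continue;
		}
		i++;
	}
}
```

With a table listed as AMD-Vi (depends on GART), GART (depends on SWIOTLB), SWIOTLB, the sketch reorders it to SWIOTLB, GART, AMD-Vi, which matches the ordering constraint given earlier in this message.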
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-2-git-send-email-konrad.wilk@oracle.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 13:57:56 -04:00
	.iommu_table : AT(ADDR(.iommu_table) - LOAD_OFFSET) {
		__iommu_table = .;
		*(.iommu_table)
		__iommu_table_end = .;
	}
2011-02-14 15:34:57 -08:00
2011-05-20 17:51:17 -07:00
	. = ALIGN(8);
	.apicdrivers : AT(ADDR(.apicdrivers) - LOAD_OFFSET) {
		__apicdrivers = .;
		*(.apicdrivers);
		__apicdrivers_end = .;
	}
2010-08-30 14:10:02 -04:00
	. = ALIGN(8);
2009-04-29 09:47:27 +02:00
	/*
2020-02-24 18:21:29 -05:00
	 * .exit.text is discarded at runtime, not link time, to deal with
	 * references from .altinstructions
2009-04-29 09:47:27 +02:00
	 */
	.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
		EXIT_TEXT
	}
	.exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) {
		EXIT_DATA
	}
2009-08-25 14:50:53 +01:00
#if !defined(CONFIG_X86_64) || !defined(CONFIG_SMP)
2011-03-24 18:50:09 +01:00
	PERCPU_SECTION(INTERNODE_CACHE_BYTES)
2009-04-29 09:47:28 +02:00
#endif
	. = ALIGN(PAGE_SIZE);
2009-04-29 12:56:58 +02:00
2009-04-29 09:47:28 +02:00
	/* freed after init ends here */
2009-04-29 12:56:58 +02:00
	.init.end : AT(ADDR(.init.end) - LOAD_OFFSET) {
		__init_end = .;
	}
2009-04-29 09:47:28 +02:00
2009-08-25 14:50:53 +01:00
	/*
	 * smp_locks might be freed after init
	 * start/end must be page aligned
	 */
	. = ALIGN(PAGE_SIZE);
	.smp_locks : AT(ADDR(.smp_locks) - LOAD_OFFSET) {
		__smp_locks = .;
		*(.smp_locks)
		. = ALIGN(PAGE_SIZE);
2010-03-28 19:42:54 -07:00
		__smp_locks_end = .;
2009-08-25 14:50:53 +01:00
	}
2009-04-29 09:47:28 +02:00
#ifdef CONFIG_X86_64
	.data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) {
2009-08-25 14:50:53 +01:00
		NOSAVE_DATA
	}
2009-04-29 09:47:28 +02:00
#endif
2009-04-29 09:47:29 +02:00
	/* BSS */
	. = ALIGN(PAGE_SIZE);
	.bss : AT(ADDR(.bss) - LOAD_OFFSET) {
		__bss_start = .;
2010-02-20 01:03:38 +01:00
		*(.bss..page_aligned)
2020-07-21 11:34:48 +02:00
		. = ALIGN(PAGE_SIZE);
2019-04-15 09:49:56 -07:00
		*(BSS_MAIN)
2018-09-14 08:45:58 -05:00
		BSS_DECRYPTED
2010-11-16 22:31:26 +01:00
		. = ALIGN(PAGE_SIZE);
2009-04-29 09:47:29 +02:00
		__bss_stop = .;
	}
2009-04-29 09:47:28 +02:00
2019-06-19 18:40:57 +00:00
	/*
	 * The memory occupied from _text to here, __end_of_kernel_reserve, is
	 * automatically reserved in setup_arch(). Anything after here must be
	 * explicitly reserved using memblock_reserve() or it will be discarded
	 * and treated as available memory.
	 */
	__end_of_kernel_reserve = .;
2009-04-29 09:47:29 +02:00
	. = ALIGN(PAGE_SIZE);
	.brk : AT(ADDR(.brk) - LOAD_OFFSET) {
		__brk_base = .;
		. += 64 * 1024;		/* 64k alignment slop space */
		*(.brk_reservation)	/* areas brk users have reserved */
		__brk_limit = .;
	}
x86/boot: Move compressed kernel to the end of the decompression buffer
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:
77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
Specifically:
All names prefixed with 'VO_':
  - relate to the uncompressed kernel image
  - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
All names prefixed with 'ZO_':
  - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
    which is composed of the following memory areas:
      - head text
      - compressed kernel (VO image and relocs table)
      - decompressor code
  - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
The 'INIT_SIZE' value is used to find the larger of the two image sizes:
#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif
The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)
Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.
To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed at the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.
z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)
When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
the decompression buffer. It overlaps with the soon-to-be-uncompressed
kernel like this:
                            |-----compressed kernel image------|
                            V                                  V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          VO__end    ZO__end
            ^                                         ^
            |-------uncompressed kernel image---------|
When INIT_SIZE is equal to VO_INIT_SIZE (likely), there is still space
left between the end of ZO and the end of the decompression buffer, as shown below:
                            |-compressed kernel image-|
                            V                         V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          ZO__end    VO__end
            ^                                                  ^
            |------------uncompressed kernel image-------------|
To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing it at
the start of extract_offset as is currently done.
This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.
Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).
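The placement rule described above reduces to a small address calculation. The following is a minimal sketch under stated assumptions, not the boot code itself: `zo_copy_start()`, `align_up()` and `SKETCH_PAGE_SIZE` are hypothetical names, 4 KiB pages are assumed, and the real computation is done in the boot assembly using BP_init_size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Round x up to the next multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Start address for the copied ZO such that its page-aligned end
 * coincides with the end of the decompression buffer, i.e. with
 * buf_start + init_size.
 */
uint64_t zo_copy_start(uint64_t buf_start, uint64_t init_size,
		       uint64_t zo_run_size)
{
	uint64_t zo_size = align_up(zo_run_size, SKETCH_PAGE_SIZE);

	assert(zo_size <= init_size);	/* ZO must fit in the buffer */
	return buf_start + init_size - zo_size;
}
```

For example, with a buffer at 0x1000000, an init size of 0x2000000 and a ZO run size of 0x700123 (all made-up values), the ZO is rounded up to 0x701000 bytes and copied to 0x28ff000, so it ends exactly at 0x3000000 regardless of which image is larger.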
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 17:09:04 -07:00
	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
2009-12-14 13:55:20 -08:00
	_end = .;
2009-04-29 09:47:29 +02:00
x86/mm: Create a workarea in the kernel for SME early encryption
In order for the kernel to be encrypted "in place" during boot, a workarea
outside of the kernel must be used. This SME workarea used during early
encryption of the kernel is situated on a 2MB boundary after the end of
the kernel text, data, etc. sections (_end).
This works well during initial boot of a compressed kernel because of
the relocation used for decompression of the kernel. But when performing
a kexec boot, there's a chance that the SME workarea may not be mapped
by the kexec pagetables or that some of the other data used by kexec
could exist in this range.
Create a section for SME in vmlinux.lds.S. Position it after "_end", which
is after "__end_of_kernel_reserve", so that the memory will be reclaimed
during boot and since this area is all zeroes, it compresses well. This
new section will be part of the kernel image, so kexec will account for it
in pagetable mappings and placement of data after the kernel.
Here's an example of a kernel size without and with the SME section:
  without:
    vmlinux:  36,501,616
    bzImage:   6,497,344
    100000000-47f37ffff : System RAM
      1e4000000-1e47677d4 : Kernel code   (0x7677d4)
      1e47677d5-1e4e2e0bf : Kernel data   (0x6c68ea)
      1e5074000-1e5372fff : Kernel bss    (0x2fefff)
  with:
    vmlinux:  44,419,408
    bzImage:   6,503,136
    880000000-c7ff7ffff : System RAM
      8cf000000-8cf7677d4 : Kernel code   (0x7677d4)
      8cf7677d5-8cfe2e0bf : Kernel data   (0x6c68ea)
      8d0074000-8d0372fff : Kernel bss    (0x2fefff)
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Lianbo Jiang <lijiang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael Ávila de Espíndola" <rafael@espindo.la>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/3c483262eb4077b1654b2052bd14a8d011bffde3.1560969363.git.thomas.lendacky@amd.com
2019-06-19 18:40:59 +00:00
#ifdef CONFIG_AMD_MEM_ENCRYPT
	/*
	 * Early scratch/workarea section: Lives outside of the kernel proper
	 * (_text - _end).
	 *
	 * Resides after _end because even though the .brk section is after
	 * __end_of_kernel_reserve, the .brk section is later reserved as a
	 * part of the kernel. Since it is located after __end_of_kernel_reserve
	 * it will be discarded and become part of the available memory. As
	 * such, it can only be used by very early boot code and must not be
	 * needed afterwards.
	 *
	 * Currently used by SME for performing in-place encryption of the
	 * kernel during boot. Resides on a 2MB boundary to simplify the
	 * pagetable setup used for SME in-place encryption.
	 */
	. = ALIGN(HPAGE_SIZE);
	.init.scratch : AT(ADDR(.init.scratch) - LOAD_OFFSET) {
		__init_scratch_begin = .;
		*(.init.scratch)
		. = ALIGN(HPAGE_SIZE);
		__init_scratch_end = .;
	}
#endif
2018-02-08 14:38:57 +08:00
	STABS_DEBUG
	DWARF_DEBUG
2020-08-21 12:42:45 -07:00
	ELF_DETAILS
linker script: unify usage of discard definition
Discarded sections in different archs share some commonality but have
considerable differences. This led to linker script for each arch
implementing its own /DISCARD/ definition, which makes maintaining
tedious and adding new entries error-prone.
This patch makes all linker scripts move discard definitions to the
end of the linker script and use the common DISCARDS macro. As ld
uses the first matching section definition, archs can include default
discarded sections by including them earlier in the linker script.
ia64 is notable because it first throws away some ia64 specific
subsections and then includes the rest of the sections in the final
image, so those sections must be discarded before the inclusion.
defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
alpha, sparc, sparc64 and s390. Michal Simek tested microblaze.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: linux-arch@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Tony Luck <tony.luck@intel.com>
2009-07-09 11:27:40 +09:00
DISCARDS
2009-04-29 09:47:20 +02:00
2020-08-21 12:43:04 -07:00
	/*
	 * Make sure that the .got.plt is either completely empty or it
	 * contains only the lazy dispatch entries.
	 */
	.got.plt (INFO) : { *(.got.plt) }
	ASSERT(SIZEOF(.got.plt) == 0 ||
#ifdef CONFIG_X86_64
	       SIZEOF(.got.plt) == 0x18,
#else
	       SIZEOF(.got.plt) == 0xc,
#endif
	       "Unexpected GOT/PLT entries detected!")
2020-08-21 12:43:05 -07:00
	/*
	 * Sections that should stay zero sized, which is safer to
	 * explicitly check instead of blindly discarding.
	 */
	.got : {
		*(.got) *(.igot.*)
	}
	ASSERT(SIZEOF(.got) == 0, "Unexpected GOT entries detected!")
	.plt : {
		*(.plt) *(.plt.*) *(.iplt)
	}
	ASSERT(SIZEOF(.plt) == 0, "Unexpected run-time procedure linkages detected!")
	.rel.dyn : {
		*(.rel.*) *(.rel_*)
	}
	ASSERT(SIZEOF(.rel.dyn) == 0, "Unexpected run-time relocations (.rel) detected!")
	.rela.dyn : {
		*(.rela.*) *(.rela_*)
	}
	ASSERT(SIZEOF(.rela.dyn) == 0, "Unexpected run-time relocations (.rela) detected!")
2020-08-21 12:43:04 -07:00
}
2009-04-29 09:47:18 +02:00
2009-10-16 07:18:46 +02:00
/*
 * The ASSERT() sink to . is intentional, for binutils 2.14 compatibility:
 */
2009-08-03 14:44:54 -07:00
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
	   "kernel image bigger than KERNEL_IMAGE_SIZE");
x86/build: Fix vmlinux size check on 64-bit
Commit
b4e0409a36f4 ("x86: check vmlinux limits, 64-bit")
added a check that the size of the 64-bit kernel is less than
KERNEL_IMAGE_SIZE.
The check uses (_end - _text), but this is not enough. The initial
PMD used in startup_64() (level2_kernel_pgt) can only map up to
KERNEL_IMAGE_SIZE from __START_KERNEL_map, not from _text, and the
modules area (MODULES_VADDR) starts at KERNEL_IMAGE_SIZE.
The correct check is what is currently done for 32-bit, since
LOAD_OFFSET is defined appropriately for the two architectures. Just
check (_end - LOAD_OFFSET) against KERNEL_IMAGE_SIZE unconditionally.
Note that on 32-bit, the limit is not strict: KERNEL_IMAGE_SIZE is not
really used by the main kernel. The higher the kernel is located, the
less the space available for the vmalloc area. However, it is used by
KASLR in the compressed stub to limit the maximum address of the kernel
to a safe value.
Clean up various comments to clarify that despite the name,
KERNEL_IMAGE_SIZE is not a limit on the size of the kernel image, but a
limit on the maximum virtual address that the image can occupy.
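The difference between the two checks can be illustrated with a small C sketch. It is not kernel code: `old_size_check()` and `new_size_check()` are hypothetical helpers, and the addresses used in the usage note are made-up values chosen only to show a case where the old check passes and the new one fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old 64-bit check: only the extent of the image, (_end - _text). */
bool old_size_check(uint64_t text, uint64_t end, uint64_t image_size)
{
	assert(end >= text);
	return end - text <= image_size;
}

/*
 * New check: the highest address the image occupies, measured from the
 * start of the kernel mapping, (_end - LOAD_OFFSET).
 */
bool new_size_check(uint64_t load_offset, uint64_t end, uint64_t image_size)
{
	assert(end >= load_offset);
	return end - load_offset <= image_size;
}
```

Take a mapping base of 0xffffffff80000000, a 512 MiB limit (0x20000000), and a 32 MiB image whose _text sits 0x1f000000 above the base. The old check sees only 32 MiB and passes, while the new check sees _end at 0x21000000 above the base and correctly fails, because the image extends past the range the initial page tables can map.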
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201029161903.2553528-1-nivedita@alum.mit.edu
2020-10-29 12:19:03 -04:00
#ifdef CONFIG_X86_64
2009-04-29 09:47:18 +02:00
/*
 * Per-cpu symbols which need to be offset from __per_cpu_load
 * for the boot processor.
 */
2018-12-19 11:01:43 -08:00
#define INIT_PER_CPU(x) init_per_cpu__##x = ABSOLUTE(x) + __per_cpu_load
2009-04-29 09:47:18 +02:00
INIT_PER_CPU(gdt_page);
2019-04-14 18:00:06 +02:00
INIT_PER_CPU(fixed_percpu_data);
INIT_PER_CPU(irq_stack_backing_store);
2009-04-29 09:47:18 +02:00
#ifdef CONFIG_SMP
2019-04-14 18:00:06 +02:00
. = ASSERT((fixed_percpu_data == 0),
           "fixed_percpu_data is not at start of per-cpu area");
2009-04-29 09:47:18 +02:00
#endif
2020-10-29 12:19:03 -04:00
#endif /* CONFIG_X86_64 */
2009-04-29 09:47:18 +02:00
2015-09-09 15:38:55 -07:00
#ifdef CONFIG_KEXEC_CORE
2009-04-29 09:47:18 +02:00
#include <asm/kexec.h>
2009-08-03 14:44:54 -07:00
. = ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
           "kexec control code size is too big");
2009-04-29 09:47:18 +02:00
#endif