/*
 * linux/boot/head.S
 *
 * Copyright (C) 1991, 1992, 1993  Linus Torvalds
 */
/*
 * head.S contains the 32-bit startup code.
 *
 * NOTE!!! Startup happens at absolute address 0x00001000, which is also where
 * the page directory will exist. The startup code will be overwritten by
 * the page directory. [According to comments etc elsewhere on a compressed
 * kernel it will end up at 0x1000 + 1Mb I hope so as I assume this. - AC]
 *
 * Page 0 is deliberately kept safe, since System Management Mode code in
 * laptops may need to access the BIOS data stored there.  This is also
 * useful for future device drivers that need to access the BIOS via VM86
 * mode.
 */
/*
 * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
 */
.code32
.text
#include <linux/init.h>
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/boot.h>
#include <asm/msr.h>
#include <asm/processor-flags.h>
#include <asm/asm-offsets.h>
#include <asm/bootparam.h>
x86/build: Build compressed x86 kernels as PIE
The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
relocation to get the symbol address in PIC. When the compressed x86
kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
to their fixed symbol addresses. However, when the compressed x86
kernel is loaded at a different address, it leads to the following
load failure:
Failed to allocate space for phdrs
during the decompression stage.
If the compressed x86 kernel is relocatable at run-time, it should be
compiled with -fPIE rather than -fPIC, if possible, and should be built as
a Position Independent Executable (PIE) so that the linker won't optimize
R_386_GOT32X relocations to their fixed symbol addresses.
Older linkers generate R_386_32 relocations against locally defined
symbols, _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less
optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle
R_386_32 relocations when relocating the kernel. To generate
R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
hidden in both 32-bit and 64-bit x86 kernels.
To build a 64-bit compressed x86 kernel as PIE, we need to disable the
relocation overflow check to avoid relocation overflow errors. We do
this with a new linker command-line option, -z noreloc-overflow, which
got added recently:
commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Mar 15 11:07:06 2016 -0700
Add -z noreloc-overflow option to x86-64 ld
Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
disable relocation overflow check. This can be used to avoid relocation
overflow check if there will be no dynamic relocation overflow at
run-time.
The 64-bit compressed x86 kernel is built as PIE only if the linker supports
-z noreloc-overflow. So far the 64-bit relocatable compressed x86 kernel
boots fine even when it is built as a normal executable.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Edited the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
/*
 * Locally defined symbols should be marked hidden:
 */
.hidden _bss
.hidden _ebss
.hidden _got
.hidden _egot
__HEAD
.code32
ENTRY(startup_32)
/*
 * 32bit entry is 0 and it is ABI so immutable!
 * If we come here directly from a bootloader,
 * kernel(text+data+bss+brk), ramdisk, zero_page, command line
 * all need to be under the 4G limit.
 */
cld
/*
 * Test KEEP_SEGMENTS flag to see if the bootloader is asking
 * us to not reload segments
 */
testb $KEEP_SEGMENTS, BP_loadflags(%esi)
jnz 1f
cli
movl $(__BOOT_DS), %eax
movl %eax, %ds
movl %eax, %es
movl %eax, %ss
1:
/*
 * Calculate the delta between where we were compiled to run
 * at and where we were actually loaded at.  This can only be done
 * with a short local call on x86.  Nothing else will tell us what
 * address we are running at.  The reserved chunk of the real-mode
 * data at 0x1e4 (defined as a scratch field) is used as the stack
 * for this calculation.  Only 4 bytes are needed.
 */
leal (BP_scratch+4)(%esi), %esp
call 1f
1: popl %ebp
subl $1b, %ebp
/* Set up a stack and make sure the CPU supports long mode. */
movl $boot_stack_end, %eax
addl %ebp, %eax
movl %eax, %esp
call verify_cpu
testl %eax, %eax
jnz no_longmode
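verify_cpu returns zero in %eax only when long mode is available. For
reference, the core of that test expressed in C is the CPUID
extended-feature check below (a hedged sketch of the idea only; the real
verify_cpu.S, included at the bottom of this file, performs additional
checks beyond long mode):

#include <cpuid.h>

/* Sketch: does the CPU advertise long mode (CPUID.80000001H:EDX, bit 29)? */
static int cpu_has_long_mode(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* Bail out if the 0x80000001 extended leaf does not exist. */
        if (!__get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx) ||
            eax < 0x80000001)
                return 0;
        __get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
        return (edx >> 29) & 1;
}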
/*
 * Compute the delta between where we were compiled to run at
 * and where the code will actually run at.
 *
 * %ebp contains the address we are loaded at by the bootloader and %ebx
 * contains the address where we should move the kernel image temporarily
 * for safe in-place decompression.
 */
#ifdef CONFIG_RELOCATABLE
movl %ebp, %ebx
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
notl %eax
andl %eax, %ebx
cmpl $LOAD_PHYSICAL_ADDR, %ebx
jge 1f
#endif
movl $LOAD_PHYSICAL_ADDR, %ebx
1:
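The decl/addl/notl/andl sequence above is the standard power-of-two
round-up. A C sketch of the same computation (assuming, as the boot
protocol requires, that kernel_alignment is a power of two):

/* Sketch: round addr up to the next multiple of align (align == 2^n). */
static unsigned long round_up(unsigned long addr, unsigned long align)
{
        unsigned long mask = align - 1; /* decl %eax          */

        return (addr + mask) & ~mask;   /* addl; notl; andl   */
}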
/* Target address to relocate to for decompression */
x86/boot: Move compressed kernel to the end of the decompression buffer
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:
77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
Specifically:
All names prefixed with 'VO_':
- relate to the uncompressed kernel image
- the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
All names prefixed with 'ZO_':
- relate to the bootable compressed kernel image (boot/compressed/vmlinux),
which is composed of the following memory areas:
- head text
- compressed kernel (VO image and relocs table)
- decompressor code
- the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
The 'INIT_SIZE' value is used to find the larger of the two image sizes:
#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif
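A toy worked example of how that maximum resolves (all numbers invented
purely for illustration):

unsigned long vo_init_size = 24 << 20;  /* 24 MB uncompressed image     */
unsigned long zo_init_size = (9 << 20)  /* 9 MB compressed image ...    */
                           + (5 << 20); /* ... plus z_extract_offset    */
/* ZO_INIT_SIZE (14 MB) < VO_INIT_SIZE (24 MB), so INIT_SIZE is 24 MB.  */
unsigned long init_size = zo_init_size > vo_init_size ? zo_init_size
                                                      : vo_init_size;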
The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)
Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.
To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.
z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)
When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:
|-----compressed kernel image------|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO VO__end ZO__end
^ ^
|-------uncompressed kernel image---------|
When INIT_SIZE is equal to VO_INIT_SIZE (likely), there's still space
left from the end of ZO to the end of the decompression buffer, like below.
|-compressed kernel image-|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO ZO__end VO__end
^ ^
|------------uncompressed kernel image-------------|
To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing it at
extract_offset as is currently done.
This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.
Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
movl BP_init_size(%esi), %eax
subl $_end, %eax
addl %eax, %ebx
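These three instructions implement the placement the changelog above
argues for: the copy destination is chosen so that the ZO image ends
exactly at the end of the init_size buffer. In C terms (a sketch; _end
stands for the ZO image size, as in the asm):

/* Sketch: destination for the ZO copy so that ZO__end hits buffer end. */
static unsigned long zo_copy_target(unsigned long buf_start,  /* %ebx         */
                                    unsigned long init_size,  /* BP_init_size */
                                    unsigned long zo_size)    /* $_end        */
{
        return buf_start + (init_size - zo_size);
}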
/*
 * Prepare for entering 64 bit mode
 */
/* Load new GDT with the 64bit segments using 32bit descriptor */
addl %ebp, gdt+2(%ebp)
lgdt gdt(%ebp)
/* Enable PAE mode */
movl %cr4, %eax
orl $X86_CR4_PAE, %eax
movl %eax, %cr4
/*
 * Build early 4G boot pagetable
 */
/* Initialize Page tables to 0 */
leal pgtable(%ebx), %edi
xorl %eax, %eax
x86/KASLR: Build identity mappings on demand
Currently KASLR only supports relocation in a small physical range (from
16M to 1G), due to using the initial kernel page table identity mapping.
To support ranges above this, we need to have an identity mapping for the
desired memory range before we can decompress (and later run) the kernel.
32-bit kernels already have the needed identity mapping. This patch adds
identity mappings for the needed memory ranges on 64-bit kernels. This
happens in two possible boot paths:
If loaded via startup_32(), we need to set up the needed identity map.
If loaded from a 64-bit bootloader, the bootloader will have already
set up an identity mapping, and we'll start via the compressed kernel's
startup_64(). In this case, the bootloader's page tables need to be
avoided while selecting the new uncompressed kernel location. If not,
the decompressor could overwrite them during decompression.
To accomplish this, we could walk the pagetable and find every page
that is used, and add them to mem_avoid, but this needs extra code and
will require increasing the size of the mem_avoid array.
Instead, we can create a new set of page tables for our own identity
mapping. The pages for the new page tables will come from the _pgtable
section of the compressed kernel, which means they are already contained
in the mem_avoid array. To do this, we reuse the code from the
uncompressed kernel's identity mapping routines.
The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce
init_size, as now the compressed kernel's _rodata to _end will contribute
to init_size.
To handle the possible mappings, we need to increase the existing page
table buffer size:
When booting via startup_64(), we need to cover the old VO, params,
cmdline and uncompressed kernel. In an extreme case we could have them
all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings.
We'll also need 2 pages for the first 2M (VGA RAM), and one more for the
level-4 table. This gets us to 19 pages total.
When booting via startup_32(), KASLR could move the uncompressed kernel
above 4G, so we need to create extra identity mappings, which should only
need (2+2) pages at most when it is beyond the 512G boundary. So 19
pages is sufficient for this case as well.
The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their
names to maintain logical consistency with the existing BOOT_HEAP_SIZE
and BOOT_STACK_SIZE defines.
This patch is based on earlier patches from Yinghai Lu and Baoquan He.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
movl $(BOOT_INIT_PGT_SIZE/4), %ecx
rep stosl
/* Build Level 4 */
leal pgtable + 0(%ebx), %edi
leal 0x1007(%edi), %eax
movl %eax, 0(%edi)
/* Build Level 3 */
leal pgtable + 0x1000(%ebx), %edi
leal 0x1007(%edi), %eax
movl $4, %ecx
1: movl %eax, 0x00(%edi)
addl $0x00001000, %eax
addl $8, %edi
decl %ecx
jnz 1b
/* Build Level 2 */
leal pgtable + 0x2000(%ebx), %edi
movl $0x00000183, %eax
movl $2048, %ecx
1: movl %eax, 0(%edi)
addl $0x00200000, %eax
addl $8, %edi
decl %ecx
jnz 1b
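The three steps above build a 4G identity mapping out of 2M pages: one
level-4 entry, four level-3 entries, and 4*512 level-2 entries. A C
rendering of the same construction (a sketch; 0x1007 is the next table's
offset plus the present/write/user bits, 0x183 is present/write plus the
PS and global bits of a 2M page; in this boot context virtual and
physical addresses coincide):

#include <stdint.h>

/* Sketch: the same 4G/2M identity map; pgtable is 6 zeroed 4K pages. */
static void build_boot_pgtable(uint64_t *pgtable, uint64_t pgtable_phys)
{
        uint64_t *pml4 = pgtable;        /* pgtable + 0x0000, level 4 */
        uint64_t *pdpt = pgtable + 512;  /* pgtable + 0x1000, level 3 */
        uint64_t *pd   = pgtable + 1024; /* pgtable + 0x2000, level 2 */
        int i;

        pml4[0] = (pgtable_phys + 0x1000) | 0x7;   /* leal 0x1007(%edi) */
        for (i = 0; i < 4; i++)                    /* four 1G regions   */
                pdpt[i] = (pgtable_phys + 0x2000 + 0x1000 * i) | 0x7;
        for (i = 0; i < 2048; i++)                 /* 2048 * 2M = 4G    */
                pd[i] = ((uint64_t)i << 21) | 0x183;
}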
/* Enable the boot page tables */
leal pgtable(%ebx), %eax
movl %eax, %cr3
/* Enable Long mode in EFER (Extended Feature Enable Register) */
movl $MSR_EFER, %ecx
rdmsr
btsl $_EFER_LME, %eax
wrmsr
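rdmsr/wrmsr move the MSR selected by %ecx through %edx:%eax, and btsl
sets bit _EFER_LME (bit 8) in the low half. A hedged C equivalent of
this read-modify-write, using inline asm:

/* Sketch: set EFER.LME (EFER is MSR 0xc0000080, LME is bit 8). */
static void efer_set_lme(void)
{
        unsigned int lo, hi;

        asm volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (0xc0000080u));
        lo |= 1u << 8;                  /* btsl $_EFER_LME, %eax */
        asm volatile("wrmsr" : : "a" (lo), "d" (hi), "c" (0xc0000080u));
}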
/* After gdt is loaded */
xorl %eax, %eax
lldt %ax
movl $__BOOT_TSS, %eax
ltr %ax
/*
 * Setup for the jump to 64bit mode
 *
 * When the jump is performed we will be in long mode but
 * in 32bit compatibility mode with EFER.LME = 1, CS.L = 0, CS.D = 1
 * (and in turn EFER.LMA = 1). To jump into 64bit mode we use
 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
 * We place all of the values on our mini stack so lret can
 * be used to perform that far jump.
 */
pushl $__KERNEL_CS
leal startup_64(%ebp), %eax
x86/efi: Firmware agnostic handover entry points
The EFI handover code only works if the "bitness" of the firmware and
the kernel match, i.e. 64-bit firmware and 64-bit kernel - it is not
possible to mix the two. This goes against the tradition that a 32-bit
kernel can be loaded on a 64-bit BIOS platform without having to do
anything special in the boot loader. Linux distributions, for one thing,
regularly run only 32-bit kernels on their live media.
Despite having only one 'handover_offset' field in the kernel header,
EFI boot loaders use two separate entry points to enter the kernel based
on the architecture the boot loader was compiled for,
(1) 32-bit loader: handover_offset
(2) 64-bit loader: handover_offset + 512
Since we already have two entry points, we can leverage them to infer
the bitness of the firmware we're running on, without requiring any boot
loader modifications, by making (1) and (2) valid entry points for both
CONFIG_X86_32 and CONFIG_X86_64 kernels.
To be clear, a 32-bit boot loader will always use (1) and a 64-bit boot
loader will always use (2). It's just that, if a single kernel image
supports (1) and (2) that image can be used with both 32-bit and 64-bit
boot loaders, and hence both 32-bit and 64-bit EFI.
(1) and (2) must be 512 bytes apart at all times, but that is already
part of the boot ABI and we could never change that delta without
breaking existing boot loaders anyhow.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
#ifdef CONFIG_EFI_MIXED
movl efi32_config(%ebp), %ebx
cmp $0, %ebx
jz 1f
leal handover_entry(%ebp), %eax
1:
#endif
pushl %eax
/* Enter paged protected Mode, activating Long Mode */
movl $(X86_CR0_PG | X86_CR0_PE), %eax /* Enable Paging and Protected mode */
movl %eax, %cr0
/* Jump from 32bit compatibility mode into 64bit mode. */
lret
ENDPROC(startup_32)
#ifdef CONFIG_EFI_MIXED
.org 0x190
ENTRY(efi32_stub_entry)
add $0x4, %esp /* Discard return address */
popl %ecx
popl %edx
popl %esi

leal (BP_scratch+4)(%esi), %esp
call 1f
1: pop %ebp
subl $1b, %ebp

movl %ecx, efi32_config(%ebp)
movl %edx, efi32_config+8(%ebp)
sgdtl efi32_boot_gdt(%ebp)

leal efi32_config(%ebp), %eax
movl %eax, efi_config(%ebp)

jmp startup_32
ENDPROC(efi32_stub_entry)
#endif
.code64
.org 0x200
ENTRY(startup_64)
/*
 * 64bit entry is 0x200 and it is ABI so immutable!
 * We come here either from startup_32 or directly from a
 * 64bit bootloader.
 * If we come here from a bootloader, kernel(text+data+bss+brk),
 * ramdisk, zero_page, command line could be above 4G.
 * We depend on an identity mapped page table being provided
 * that maps our entire kernel(text+data+bss+brk), zero page
 * and command line.
 */
x86, efi: EFI boot stub support
There is currently a large divide between kernel development and the
development of EFI boot loaders. The idea behind this patch is to give
the kernel developers full control over the EFI boot process. As
H. Peter Anvin put it,
"The 'kernel carries its own stub' approach been very successful in
dealing with BIOS, and would make a lot of sense to me for EFI as
well."
This patch introduces an EFI boot stub that allows an x86 bzImage to
be loaded and executed by EFI firmware. The bzImage appears to the
firmware as an EFI application. Luckily there are enough free bits
within the bzImage header so that it can masquerade as an EFI
application, thereby coercing the EFI firmware into loading it and
jumping to its entry point. The beauty of this masquerading approach
is that both BIOS and EFI boot loaders can still load and run the same
bzImage, thereby allowing a single kernel image to work in any boot
environment.
The EFI boot stub supports multiple initrds, but they must exist on
the same partition as the bzImage. Command-line arguments for the
kernel can be appended after the bzImage name when run from the EFI
shell, e.g.
Shell> bzImage console=ttyS0 root=/dev/sdb initrd=initrd.img
v7:
- Fix checkpatch warnings.
v6:
- Try to allocate initrd memory just below hdr->initrd_addr_max.
v5:
- load_options_size is UTF-16, which needs dividing by 2 to convert
to the corresponding ASCII size.
v4:
- Don't read more than image->load_options_size
v3:
- Fix following warnings when compiling CONFIG_EFI_STUB=n
arch/x86/boot/tools/build.c: In function ‘main’:
arch/x86/boot/tools/build.c:138:24: warning: unused variable ‘pe_header’
arch/x86/boot/tools/build.c:138:15: warning: unused variable ‘file_sz’
- As reported by Matthew Garrett, some Apple machines have GOPs that
don't have hardware attached. We need to weed these out by
searching for ones that handle the PCIIO protocol.
- Don't allocate memory if no initrds are on cmdline
- Don't trust image->load_options_size
Maarten Lankhorst noted:
- Don't strip first argument when booted from efibootmgr
- Don't allocate too much memory for cmdline
- Don't update cmdline_size, the kernel considers it read-only
- Don't accept '\n' for initrd names
v2:
- File alignment was too large, was 8192 should be 512. Reported by
Maarten Lankhorst on LKML.
- Added UGA support for graphics
- Use VIDEO_TYPE_EFI instead of hard-coded number.
- Move linelength assignment until after we've assigned depth
- Dynamically fill out AddressOfEntryPoint in tools/build.c
- Don't use magic number for GDT/TSS stuff. Requested by Andi Kleen
- The bzImage may need to be relocated as it may have been loaded at
a high address by the firmware. This was required to get my
macbook booting because the firmware loaded it at 0x7cxxxxxx, which
triggers this error in decompress_kernel(),
if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
error("Destination address too large");
Cc: Mike Waychison <mikew@google.com>
Cc: Matthew Garrett <mjg@redhat.com>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1321383097.2657.9.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
#ifdef CONFIG_EFI_STUB
/*
x86, build: Dynamically find entry points in compressed startup code
We have historically hard-coded entry points in head.S just so it's easy
to build the executable/bzImage headers with references to them.
Unfortunately, this leads to boot loaders abusing these "known" addresses
even when they are *explicitly* told that they "should look at the ELF
header to find this address, as it may change in the future". And even
when the address in question *has* actually been changed in the past,
without fanfare or thought to compatibility.
Thus we have bootloaders doing stunningly broken things like jumping
to offset 0x200 in the kernel startup code in 64-bit mode, *hoping*
that startup_64 is still there (it has moved at least once
before). And hoping that it's actually a 64-bit kernel despite the
fact that we don't give them any indication of that fact.
This patch should hopefully remove the temptation to abuse internal
addresses in future, where sternly worded comments have not sufficed.
Instead of having hard-coded addresses and saying "please don't abuse
these", we actually pull the addresses out of the ELF payload into
zoffset.h, and make build.c shove them back into the right places in
the bzImage header.
Rather than including zoffset.h into build.c and thus having to rebuild
the tool for every kernel build, we parse it instead. The parsing code
is small and simple.
This patch doesn't actually move any of the interesting entry points, so
any offending bootloader will still continue to "work" after this patch
is applied. For some version of "work" which includes jumping into the
compressed payload and crashing, if the bzImage it's given is a 32-bit
kernel. No change there then.
[ hpa: some of the issues in the description are addressed or
retconned by the 2.12 boot protocol. This patch has been edited to
only remove fixed addresses that were *not* thus retconned. ]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-10 14:31:59 +00:00
 * The entry point for the PE/COFF executable is efi_pe_entry, so
 * only legacy boot loaders will execute this jmp.
 */
jmp preferred_addr
ENTRY(efi_pe_entry)
movq %rcx, efi64_config(%rip) /* Handle */
movq %rdx, efi64_config+8(%rip) /* EFI System table pointer */
leaq efi64_config(%rip), %rax
movq %rax, efi_config(%rip)

call 1f
1: popq %rbp
subq $1b, %rbp
/*
 * Relocate efi_config->call().
 */
x86/efi: Allow invocation of arbitrary boot services
We currently allow invocation of 8 boot services with efi_call_early().
Not included are LocateHandleBuffer and LocateProtocol in particular.
For graphics output or to retrieve PCI ROMs and Apple device properties,
we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
combo, which is cumbersome and needs more code.
The ARM folks allow invocation of the full set of boot services but are
restricted to our 8 boot services in functions shared across arches.
Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
struct efi_config, let's rework efi_call_early() to allow invocation of
arbitrary boot services by selecting the 64 bit vs 32 bit code path in
the macro itself.
When compiling for 32 bit or for 64 bit without mixed mode, the unused
code path is optimized away and the binary code is the same as before.
But on 64 bit with mixed mode enabled, this commit adds one compare
instruction to each invocation of a boot service and, depending on the
code path selected, two jump instructions. (Most of the time gcc
arranges the jumps in the 32 bit code path.) The result is a minuscule
performance penalty and the binary code becomes slightly larger and more
difficult to read when disassembled. This isn't a hot path, so these
drawbacks are arguably outweighed by the attainable simplification of
the C code. We have some overhead anyway for thunking or conversion
between calling conventions.
The 8 boot services can consequently be removed from struct efi_config.
No functional change intended (for now).
Example -- invocation of free_pool before (64 bit code path):
0x2d4 movq %ds:efi_early, %rdx ; efi_early
0x2db movq %ss:arg_0-0x20(%rsp), %rsi
0x2e0 xorl %eax, %eax
0x2e2 movq %ds:0x28(%rdx), %rdi ; efi_early->free_pool
0x2e6 callq *%ds:0x58(%rdx) ; efi_early->call()
Example -- invocation of free_pool after (64 / 32 bit mixed code path):
0x0dc movq %ds:efi_early, %rax ; efi_early
0x0e3 cmpb $0, %ds:0x28(%rax) ; !efi_early->is64 ?
0x0e7 movq %ds:0x20(%rax), %rdx ; efi_early->call()
0x0eb movq %ds:0x10(%rax), %rax ; efi_early->boot_services
0x0ef je $0x150
0x0f1 movq %ds:0x48(%rax), %rdi ; free_pool (64 bit)
0x0f5 xorl %eax, %eax
0x0f7 callq *%rdx
...
0x150 movl %ds:0x30(%rax), %edi ; free_pool (32 bit)
0x153 jmp $0x0f5
Size of eboot.o text section:
CONFIG_X86_32: 6464 before, 6318 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED: 7670 before, 7573 after
CONFIG_X86_64 && CONFIG_EFI_MIXED: 7670 before, 8319 after
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
addq %rbp, efi64_config+32(%rip)
movq %rax, %rdi
call make_boot_params
cmpq $0, %rax
je fail
mov %rax, %rsi
leaq startup_32(%rip), %rax
movl %eax, BP_code32_start(%rsi)
jmp 2f /* Skip the relocation */
handover_entry:
call 1f
1: popq %rbp
subq $1b, %rbp
/*
 * Relocate efi_config->call().
 */
movq efi_config(%rip), %rax
addq %rbp, 32(%rax)
2:
movq efi_config(%rip), %rdi
call efi_main
movq %rax, %rsi
cmpq $0, %rax
jne 2f
fail:
/* EFI init failed, so hang. */
hlt
jmp fail
2:
movl BP_code32_start(%esi), %eax
leaq preferred_addr(%rax), %rax
jmp *%rax
preferred_addr:
#endif
/* Setup data segments. */
xorl %eax, %eax
movl %eax, %ds
movl %eax, %es
movl %eax, %ss
movl %eax, %fs
movl %eax, %gs
/*
 * Compute the decompressed kernel start address.  It is where
 * we were loaded at aligned to a 2M boundary. %rbp contains the
 * decompressed kernel start address.
 *
 * If it is a relocatable kernel then decompress and run the kernel
 * from load address aligned to 2MB addr, otherwise decompress and
 * run the kernel from LOAD_PHYSICAL_ADDR
 *
 * We cannot rely on the calculation done in 32-bit mode, since we
 * may have been invoked via the 64-bit entry point.
 */
/* Start with the delta to where the kernel will run at. */
#ifdef CONFIG_RELOCATABLE
leaq startup_32(%rip) /* - $startup_32 */, %rbp
movl BP_kernel_alignment(%rsi), %eax
decl %eax
addq %rax, %rbp
notq %rax
andq %rax, %rbp
cmpq $LOAD_PHYSICAL_ADDR, %rbp
jge 1f
#endif
movq $LOAD_PHYSICAL_ADDR, %rbp
1:
/* Target address to relocate to for decompression */
movl BP_init_size(%rsi), %ebx
subl $_end, %ebx
addq %rbp, %rbx
/* Set up the stack */
leaq boot_stack_end(%rbx), %rsp
/* Zero EFLAGS */
pushq $0
popfq
/*
 * Copy the compressed kernel to the end of our buffer
 * where decompression in place becomes safe.
 */
pushq %rsi
leaq (_bss-8)(%rip), %rsi
leaq (_bss-8)(%rbx), %rdi
movq $_bss /* - $startup_32 */, %rcx
shrq $3, %rcx
std
rep movsq
cld
popq %rsi
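Source and destination overlap here (the image is moved higher in
memory), so the copy must run from the top down; std makes rep movsq
decrement %rsi/%rdi after each quadword. The C analogue is a
memmove-style backward copy (sketch):

#include <stdint.h>
#include <stddef.h>

/* Sketch: overlap-safe upward relocation, copying the highest qword first. */
static void copy_backwards(uint64_t *dst_last, const uint64_t *src_last,
                           size_t nqwords)
{
        size_t i;

        /* dst_last/src_last mirror %rdi/%rsi, both preloaded with _bss - 8. */
        for (i = 0; i < nqwords; i++)
                dst_last[-(ptrdiff_t)i] = src_last[-(ptrdiff_t)i];
}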
/*
 * Jump to the relocated address.
 */
leaq relocated(%rbx), %rax
jmp *%rax
#ifdef CONFIG_EFI_STUB
.org 0x390
ENTRY(efi64_stub_entry)
movq %rdi, efi64_config(%rip) /* Handle */
movq %rsi, efi64_config+8(%rip) /* EFI System table pointer */
leaq efi64_config(%rip), %rax
movq %rax, efi_config(%rip)
movq %rdx, %rsi
jmp handover_entry
ENDPROC(efi64_stub_entry)
#endif
.text
2007-05-02 19:27:07 +02:00
relocated:
2005-04-16 15:20:36 -07:00
	/*
2009-05-08 16:27:41 -07:00
	 * Clear BSS (stack is currently empty)
2005-04-16 15:20:36 -07:00
	 */
2009-05-08 16:45:15 -07:00
	xorl	%eax, %eax
	leaq	_bss(%rip), %rdi
	leaq	_ebss(%rip), %rcx
2007-05-02 19:27:07 +02:00
	subq	%rdi, %rcx
2009-05-08 16:45:15 -07:00
	shrq	$3, %rcx
	rep	stosq
2007-05-02 19:27:07 +02:00
2014-09-22 23:05:49 -07:00
	/*
	 * Adjust our own GOT
	 */
	leaq	_got(%rip), %rdx
	leaq	_egot(%rip), %rcx
1:
	cmpq	%rcx, %rdx
	jae	2f
	addq	%rbx, (%rdx)
	addq	$8, %rdx
	jmp	1b
2:
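In C, the fixup loop above amounts to the following sketch (`delta`
again standing in for the load offset held in %rbx):

	/* Add the load offset to every 8-byte GOT slot so the
	 * absolute addresses stored there match where we run. */
	for (unsigned long *slot = (unsigned long *)_got;
	     slot < (unsigned long *)_egot; slot++)
		*slot += delta;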
2005-04-16 15:20:36 -07:00
	/*
2016-04-18 09:42:13 -07:00
	 * Do the extraction, and jump to the new kernel..
2005-04-16 15:20:36 -07:00
	 */
2009-05-08 17:42:16 -07:00
	pushq	%rsi			/* Save the real mode argument */
	movq	%rsi, %rdi		/* real mode address */
	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
	leaq	input_data(%rip), %rdx	/* input_data */
	movl	$z_input_len, %ecx	/* input_len */
	movq	%rbp, %r8		/* output target address */
2014-10-31 21:40:38 +08:00
	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
2016-04-18 09:42:13 -07:00
	call	extract_kernel		/* returns kernel location in %rax */
2007-05-02 19:27:07 +02:00
	popq	%rsi
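The register setup follows the System V AMD64 calling convention
(%rdi, %rsi, %rdx, %rcx, %r8, %r9), so the C prototype presumably looks
something like this (a sketch, not quoted from misc.c):

	void *extract_kernel(void *rmode,		/* %rdi */
			     unsigned char *heap,	/* %rsi */
			     unsigned char *input_data,	/* %rdx */
			     unsigned long input_len,	/* %rcx */
			     unsigned char *output,	/* %r8  */
			     unsigned long output_len);	/* %r9  */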
2005-04-16 15:20:36 -07:00
	/*
2007-05-02 19:27:07 +02:00
	 * Jump to the decompressed kernel.
2005-04-16 15:20:36 -07:00
	 */
2013-10-10 17:18:14 -07:00
	jmp	*%rax
2005-04-16 15:20:36 -07:00
2013-01-24 12:20:00 -08:00
.code32
no_longmode:
	/* This isn't an x86-64 CPU so hang */
1:
	hlt
	jmp	1b
#include "../../kernel/verify_cpu.S"
2007-05-02 19:27:07 +02:00
.data
gdt:
.word gdt_end - gdt
.long gdt
.word 0
.quad 0x0000000000000000 /* NULL descriptor */
.quad 0x00af9a000000ffff /* __KERNEL_CS */
.quad 0x00cf92000000ffff /* __KERNEL_DS */
2007-08-10 22:31:05 +02:00
.quad 0x0080890000000000 /* TS descriptor */
.quad 0x0000000000000000 /* TS continued */
2007-05-02 19:27:07 +02:00
gdt_end:
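Decoding __KERNEL_CS as a worked example (per the SDM segment-descriptor
layout): 0x00af9a000000ffff encodes base 0, limit 0xfffff with G=1, an
access byte of 0x9a (present, DPL 0, execute/read code) and a flags
nibble of 0xa, whose L=1 bit is what marks the segment as 64-bit code.
__KERNEL_DS differs only in the access byte (0x92, read/write data) and
in having D/B=1 where the code segment has L=1.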
2008-04-08 12:54:30 +02:00
2014-03-05 10:15:55 +00:00
#ifdef CONFIG_EFI_STUB
2014-01-10 15:27:14 +00:00
efi_config:
.quad 0
2014-01-10 15:54:31 +00:00
#ifdef CONFIG_EFI_MIXED
.global efi32_config
efi32_config:
x86/efi: Allow invocation of arbitrary boot services
We currently allow invocation of 8 boot services with efi_call_early().
Not included are LocateHandleBuffer and LocateProtocol in particular.
For graphics output or to retrieve PCI ROMs and Apple device properties,
we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
combo, which is cumbersome and needs more code.
The ARM folks allow invocation of the full set of boot services but are
restricted to our 8 boot services in functions shared across arches.
Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
struct efi_config, let's rework efi_call_early() to allow invocation of
arbitrary boot services by selecting the 64 bit vs 32 bit code path in
the macro itself.
When compiling for 32 bit or for 64 bit without mixed mode, the unused
code path is optimized away and the binary code is the same as before.
But on 64 bit with mixed mode enabled, this commit adds one compare
instruction to each invocation of a boot service and, depending on the
code path selected, two jump instructions. (Most of the time gcc
arranges the jumps in the 32 bit code path.) The result is a minuscule
performance penalty and the binary code becomes slightly larger and more
difficult to read when disassembled. This isn't a hot path, so these
drawbacks are arguably outweighed by the attainable simplification of
the C code. We have some overhead anyway for thunking or conversion
between calling conventions.
The 8 boot services can consequently be removed from struct efi_config.
No functional change intended (for now).
Example -- invocation of free_pool before (64 bit code path):
0x2d4 movq %ds:efi_early, %rdx ; efi_early
0x2db movq %ss:arg_0-0x20(%rsp), %rsi
0x2e0 xorl %eax, %eax
0x2e2 movq %ds:0x28(%rdx), %rdi ; efi_early->free_pool
0x2e6 callq *%ds:0x58(%rdx) ; efi_early->call()
Example -- invocation of free_pool after (64 / 32 bit mixed code path):
0x0dc movq %ds:efi_early, %rax ; efi_early
0x0e3 cmpb $0, %ds:0x28(%rax) ; !efi_early->is64 ?
0x0e7 movq %ds:0x20(%rax), %rdx ; efi_early->call()
0x0eb movq %ds:0x10(%rax), %rax ; efi_early->boot_services
0x0ef je $0x150
0x0f1 movq %ds:0x48(%rax), %rdi ; free_pool (64 bit)
0x0f5 xorl %eax, %eax
0x0f7 callq *%rdx
...
0x150 movl %ds:0x30(%rax), %edi ; free_pool (32 bit)
0x153 jmp $0x0f5
Size of eboot.o text section:
CONFIG_X86_32: 6464 before, 6318 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED: 7670 before, 7573 after
CONFIG_X86_64 && CONFIG_EFI_MIXED: 7670 before, 8319 after
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-08-22 12:01:21 +02:00
	.fill	4,8,0
2014-01-10 15:54:31 +00:00
.quad efi64_thunk
.byte 0
#endif
2014-01-10 15:27:14 +00:00
.global efi64_config
efi64_config:
2016-08-22 12:01:21 +02:00
	.fill	4,8,0
2014-03-27 15:10:39 -07:00
.quad efi_call
2014-01-10 15:27:14 +00:00
.byte 1
2014-03-05 10:15:55 +00:00
#endif /* CONFIG_EFI_STUB */
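The efi_config blocks above mirror a C structure shared with the EFI
stub; a hypothetical view of the layout, with only the sizes taken from
the directives themselves (field names are illustrative):

	struct efi_config {
		u64 slots[4];	/* image handle, system table, ... */
		u64 call;	/* efi_call (native) or efi64_thunk */
		u8  is64;	/* the trailing .byte: 1 = native 64-bit */
	} __attribute__((packed));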
2009-05-08 15:59:13 -07:00
/*
 * Stack and heap for uncompression
 */
.bss
.balign 4
2008-04-08 12:54:30 +02:00
boot_heap:
	.fill BOOT_HEAP_SIZE, 1, 0
boot_stack:
	.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
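BOOT_HEAP_SIZE and BOOT_STACK_SIZE come from asm/boot.h; representative
(assumed, version-dependent) values look like the following, with the
heap grown for bzip2 because its decompressor needs far more malloc
space:

	#ifdef CONFIG_KERNEL_BZIP2
	# define BOOT_HEAP_SIZE	0x400000
	#else
	# define BOOT_HEAP_SIZE	0x10000
	#endif
	#define BOOT_STACK_SIZE	0x4000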
2009-05-08 16:20:34 -07:00
/*
 * Space for page tables (not in .bss so not zeroed)
 */
.section ".pgtable","a",@nobits
.balign 4096
pgtable:
x86/KASLR: Build identity mappings on demand
Currently KASLR only supports relocation in a small physical range (from
16M to 1G), due to using the initial kernel page table identity mapping.
To support ranges above this, we need to have an identity mapping for the
desired memory range before we can decompress (and later run) the kernel.
32-bit kernels already have the needed identity mapping. This patch adds
identity mappings for the needed memory ranges on 64-bit kernels. This
happens in two possible boot paths:
If loaded via startup_32(), we need to set up the needed identity map.
If loaded from a 64-bit bootloader, the bootloader will have already
set up an identity mapping, and we'll start via the compressed kernel's
startup_64(). In this case, the bootloader's page tables need to be
avoided while selecting the new uncompressed kernel location. If not,
the decompressor could overwrite them during decompression.
To accomplish this, we could walk the pagetable and find every page
that is used, and add them to mem_avoid, but this needs extra code and
will require increasing the size of the mem_avoid array.
Instead, we can create a new set of page tables for our own identity
mapping. The pages for the new page table will come from the
_pagetable section of the compressed kernel, which means they are
already covered by the mem_avoid array. To do this, we reuse the code
from the uncompressed kernel's identity mapping routines.
The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce
init_size, as now the compressed kernel's _rodata to _end will contribute
to init_size.
To handle the possible mappings, we need to increase the existing page
table buffer size:
When booting via startup_64(), we need to cover the old VO, params,
cmdline and uncompressed kernel. In an extreme case we could have them
all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings.
And we'll need 2 for the first 2M (VGA RAM). One more is needed for the level-4 table.
This gets us to 19 pages total.
When booting via startup_32(), KASLR could move the uncompressed kernel
above 4G, so we need to create extra identity mappings, which should only
need (2+2) pages at most when it is beyond the 512G boundary. So 19
pages is sufficient for this case as well.
The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their
names to maintain logical consistency with the existing BOOT_HEAP_SIZE
and BOOT_STACK_SIZE defines.
This patch is based on earlier patches from Yinghai Lu and Baoquan He.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-06 15:01:35 -07:00
	.fill BOOT_PGT_SIZE, 1, 0
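The 19-page budget derived in the commit message above presumably lands
in asm/boot.h as something like the following sketch (not the verbatim
define):

	/* 4 mappings x (2+2) pages, + 2 pages for the first 2M
	 * (VGA RAM), + 1 page for the level-4 table = 19 pages. */
	#define BOOT_PGT_SIZE	(19*4096)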