1229 Commits

Author SHA1 Message Date
Baoquan He
69f8ca8d36 x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC
With the current ifdeffery CONFIG_KEXEC, get_cmdline_acpi_rsdp() is only
available when kexec_load interface is taken, while kexec_file_load
interface can't make use of it.

Now change it to CONFIG_KEXEC_CORE.

Link: https://lkml.kernel.org/r/20231208073036.7884-6-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Ignat Korchagin <ignat@cloudflare.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:18 -08:00
Linus Torvalds
8999ad99f4 * Refactor and clean up TDX hypercall/module call infrastructure
* Handle retrying/resuming page conversion hypercalls
  * Make sure to use the (shockingly) reliable TSC in TDX guests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmVBlqMACgkQaDWVMHDJ
 krBrhBAArKay0MvzmdzS4IQs8JqkmuMEHI6WabYv2POPjJNXrn5MelLH972pLuX9
 NJ3+yeOLmNMYwqu5qwLCxyeO5CtqEyT2lNumUrxAtHQG4+oS2RYJYUalxMuoGxt8
 fAHxbItFg0TobBSUtwcnN2R2WdXwPuUW0Co+pJfLlZV4umVM7QANO1nf1g8YmlDD
 sVtpDaeKJRdylmwgWgAyGow0tDKd6oZB9j/vOHvZRrEQ+DMjEtG75fjwbjbu43Cl
 tI/fbxKjzAkOFcZ7PEPsQ8jE1h9DXU+JzTML9Nu/cPMalxMeBg3Dol/JOEbqgreI
 4W8Lg7g071EkUcQDxpwfe4aS6rsfsbwUIV4gJVkg9ZhlT7RayWsFik2CfBpJ4IMQ
 TM8BxtCEGCz3cxVvg3mstX9rRA7eNlXOzcKE/8Y7cpSsp94bA9jtf2GgUSUoi9St
 y+fIEei8mgeHutdiFh8psrmR7hp6iX/ldMFqHtjNo6xatf2KjdVHhVSU13Jz544z
 43ATNi1gZeHOgfwlAlIxLPDVDJidHuux3f6g2vfMkAqItyEqFauC1HA1pIDgckoY
 9FpBPp9vNUToSPp6reB6z/PkEBIrG2XtQh82JLt2CnCb6aTUtnPds+psjtT4sSE/
 a9SQvZLWWmpj+BlI2yrtfJzhy7SwhltgdjItQHidmCNEn0PYfTc=
 =FJ1Y
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 TDX updates from Dave Hansen:
 "The majority of this is a rework of the assembly and C wrappers that
  are used to talk to the TDX module and VMM. This is a nice cleanup in
  general but is also clearing the way for using this code when Linux is
  the TDX VMM.

  There are also some tidbits to make TDX guests play nicer with Hyper-V
  and to take advantage the hardware TSC.

  Summary:

   - Refactor and clean up TDX hypercall/module call infrastructure

   - Handle retrying/resuming page conversion hypercalls

   - Make sure to use the (shockingly) reliable TSC in TDX guests"

[ TLA reminder: TDX is "Trust Domain Extensions", Intel's guest VM
  confidentiality technology ]

* tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Mark TSC reliable
  x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()
  x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP
  x86/virt/tdx: Wire up basic SEAMCALL functions
  x86/tdx: Remove 'struct tdx_hypercall_args'
  x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm
  x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL
  x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs
  x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
  x86/tdx: Rename __tdx_module_call() to __tdcall()
  x86/tdx: Make macros of TDCALLs consistent with the spec
  x86/tdx: Skip saving output regs when SEAMCALL fails with VMFailInvalid
  x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro
  x86/tdx: Retry partially-completed page conversion hypercalls
2023-11-01 10:28:32 -10:00
Linus Torvalds
f0d25b5d0f x86 MM handling code changes for v6.7:
- Add new NX-stack self-test
  - Improve NUMA partial-CFMWS handling
  - Fix #VC handler bugs resulting in SEV-SNP boot failures
  - Drop the 4MB memory size restriction on minimal NUMA nodes
  - Reorganize headers a bit, in preparation to header dependency reduction efforts
  - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9Ek4RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gIJQ/+Mg6mzMaThyNXqhJszeZJBmDaBv2sqjAB
 5tcferg1nJBdNBzX8bJ95UFt9fIqeYAcgH00qlQCYSmyzbC1TQTk9U2Pre1zbOw4
 042ONK8sygKSje1zdYleHoBeqwnxD2VNM0NwBElhGjumwHRng/tbLiI9wx6qiz+C
 VsFXavkBszHGA1pjy9wZLGixYIH5jCygMpH134Wp+CIhpS+C4nftcGdIL1D5Oil1
 6Tm2XeI6uyfiQhm9IOwDjfoYeC7gUjx1rp8rHseGUMJxyO/BX9q5j1ixbsVriqfW
 97ucYuRL9mza7ic516C9v7OlAA3AGH2xWV+SYOGK88i9Co4kYzP4WnamxXqOsD8+
 popxG55oa6QelhaouTBZvgERpZ4fWupSDs/UccsDaE9leMCerNEbGHEzt/Mm/2sw
 xopjMQ0y5Kn6/fS0dLv8U+XHu4ANkvXJkFd6Ny0h/WfgGefuQOOTG9ruYgfeqqB8
 dViQ4R7CO8ySjD45KawAZl/EqL86x1M/CI1nlt0YY4vNwUuOJbebL7Jn8w3Fjxm5
 FVfUlDmcPdhZfL9Vnrsi6MIou1cU1yJPw4D6sXJ4sg4s7A4ebBcRRrjayVQ4msjv
 Q7cvBOMnWEHhOV11pvP50FmQuj74XW3bUqiuWrnK1SypvnhHavF6kc1XYpBLs1xZ
 y8nueJW2qPw=
 =tT5F
 -----END PGP SIGNATURE-----

Merge tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm handling updates from Ingo Molnar:

 - Add new NX-stack self-test

 - Improve NUMA partial-CFMWS handling

 - Fix #VC handler bugs resulting in SEV-SNP boot failures

 - Drop the 4MB memory size restriction on minimal NUMA nodes

 - Reorganize headers a bit, in preparation to header dependency
   reduction efforts

 - Misc cleanups & fixes

* tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
  selftests/x86/lam: Zero out buffer for readlink()
  x86/sev: Drop unneeded #include
  x86/sev: Move sev_setup_arch() to mem_encrypt.c
  x86/tdx: Replace deprecated strncpy() with strtomem_pad()
  selftests/x86/mm: Add new test that userspace stack is in fact NX
  x86/sev: Make boot_ghcb_page[] static
  x86/boot: Move x86_cache_alignment initialization to correct spot
  x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
  x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot
  x86_64: Show CR4.PSE on auxiliaries like on BSP
  x86/iommu/docs: Update AMD IOMMU specification document URL
  x86/sev/docs: Update document URL in amd-memory-encryption.rst
  x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h>
  ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window
  x86/numa: Introduce numa_fill_memblks()
2023-10-30 15:40:57 -10:00
Linus Torvalds
2b95bb0526 Changes to the x86 boot code in v6.7:
- Rework PE header generation, primarily to generate a modern, 4k aligned
    kernel image view with narrower W^X permissions.
 
  - Further refine init-lifetime annotations
 
  - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9B6ERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jXOg/+NAOQKhIYK0uFqAM+CEhZX4cqsJ9Ck0ze
 bqQ8pf5iCkbVZ+6ByiMSOszScTgVTSalRfKMYR+Fa9PVkLK4SNAeYPnGYugmLRoj
 U3lZYFpNDEwsZOmFwvqn7p+bGBQcBYKZuVI6bQh5U7Go4v6ujPjK4zTAK8SWDdTp
 DtEzhj9tELcYlm1NSV2OYu/k0IWAFV3Fc++G3WAm85xOK7oXVOYeMIlaVkpOyAXu
 th3yCw+Q0u1tuBS++77FwsEPt1KTzKGcTL7HpPrb4e4e4snOhmri+KAM/Noef7Vm
 lWqo8fTAeYwpYQ80oFsXVDhuI5LsfsuQgQid20sy1cWwswe1o1A73/AeP4pRogWl
 zLJuRcuNg2/VhPvMLdBWn5QdgJjH7CngeH+r/YkZPssPo6tfwa5UW7HOTCQvLsO9
 a+xy098qkk9d+8Za0sYMuv8/4+Ev5II2haP8edLgNWQ8S5qKIUQaY+r6268pIN/F
 0fGP9B3wblBjiNWCnd8UBh6T271g1O4vaMUt2URdcW3QObEq2EGnNiTc5tx9OPnP
 ZxQdAIl6pB0H0HIe9/7PABF40biKn84zmSl+KuXrhvh1f5FjYjJWVNyKlAKdSpSR
 wjvzg1KbhLiAHV05oQSHR7txMHJxfjpxAKmus0Hpqo6qVQ9FgrKiru9VHKocIpKU
 z66g+wEKUuY=
 =sxZJ
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:

 - Rework PE header generation, primarily to generate a modern, 4k
   aligned kernel image view with narrower W^X permissions.

 - Further refine init-lifetime annotations

 - Misc cleanups & fixes

* tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/boot: efistub: Assign global boot_params variable
  x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'
  x86/head/64: Move the __head definition to <asm/init.h>
  x86/head/64: Add missing __head annotation to startup_64_load_idt()
  x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdata
  x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls
  x86/boot: Fix incorrect startup_gdt_descr.size
  x86/boot: Compile boot code with -std=gnu11 too
  x86/boot: Increase section and file alignment to 4k/512
  x86/boot: Split off PE/COFF .data section
  x86/boot: Drop PE/COFF .reloc section
  x86/boot: Construct PE/COFF .text section from assembler
  x86/boot: Derive file size from _edata symbol
  x86/boot: Define setup size in linker script
  x86/boot: Set EFI handover offset directly in header asm
  x86/boot: Grab kernel_info offset from zoffset header directly
  x86/boot: Drop references to startup_64
  x86/boot: Drop redundant code setting the root device
  x86/boot: Omit compression buffer from PE/COFF image memory footprint
  x86/boot: Remove the 'bugger off' message
  ...
2023-10-30 14:11:57 -10:00
Ard Biesheuvel
d55d5bc5d9 x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'
The x86 decompressor is built and linked as a separate executable, but
it shares components with the kernel proper, which are either #include'd
as C files, or linked into the decompresor as a static library (e.g, the
EFI stub)

Both the kernel itself and the decompressor define a global symbol
'boot_params' to refer to the boot_params struct, but in the former
case, it refers to the struct directly, whereas in the decompressor, it
refers to a global pointer variable referring to the struct boot_params
passed by the bootloader or constructed from scratch.

This ambiguity is unfortunate, and makes it impossible to assign this
decompressor variable from the x86 EFI stub, given that declaring it as
extern results in a clash. So rename the decompressor version (whose
scope is limited) to boot_params_ptr.

[ mingo: Renamed 'boot_params_p' to 'boot_params_ptr' for clarity ]

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2023-10-18 12:03:03 +02:00
Joerg Roedel
63e44bc520 x86/sev: Check for user-space IOIO pointing to kernel space
Check the memory operand of INS/OUTS before emulating the instruction.
The #VC exception can get raised from user-space, but the memory operand
can be manipulated to access kernel memory before the emulation actually
begins and after the exception handler has run.

  [ bp: Massage commit message. ]

Fixes: 597cfe48212a ("x86/boot/compressed/64: Setup a GHCB-based VC Exception handler")
Reported-by: Tom Dohrmann <erbse.13@gmx.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
2023-10-17 10:58:16 +02:00
Joerg Roedel
b9cb9c4558 x86/sev: Check IOBM for IOIO exceptions from user-space
Check the IO permission bitmap (if present) before emulating IOIO #VC
exceptions for user-space. These permissions are checked by hardware
already before the #VC is raised, but due to the VC-handler decoding
race it needs to be checked again in software.

Fixes: 25189d08e516 ("x86/sev-es: Add support for handling IOIO exceptions")
Reported-by: Tom Dohrmann <erbse.13@gmx.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Dohrmann <erbse.13@gmx.de>
Cc: <stable@kernel.org>
2023-10-09 15:47:57 +02:00
GUO Zihua
bfb32e2008 x86/sev: Make boot_ghcb_page[] static
boot_ghcb_page is not used by any other file, so make it static.

This also resolves sparse warning:

  arch/x86/boot/compressed/sev.c:28:13: warning: symbol 'boot_ghcb_page' was not declared. Should it be static?

Signed-off-by: GUO Zihua <guozihua@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2023-10-03 15:31:27 +02:00
Ard Biesheuvel
3e3eabe26d x86/boot: Increase section and file alignment to 4k/512
Align x86 with other EFI architectures, and increase the section
alignment to the EFI page size (4k), so that firmware is able to honour
the section permission attributes and map code read-only and data
non-executable.

There are a number of requirements that have to be taken into account:
- the sign tools get cranky when there are gaps between sections in the
  file view of the image
- the virtual offset of each section must be aligned to the image's
  section alignment
- the file offset *and size* of each section must be aligned to the
  image's file alignment
- the image size must be aligned to the section alignment
- each section's virtual offset must be greater than or equal to the
  size of the headers.

In order to meet all these requirements, while avoiding the need for
lots of padding to accommodate the .compat section, the latter is placed
at an arbitrary offset towards the end of the image, but aligned to the
minimum file alignment (512 bytes). The space before the .text section
is therefore distributed between the PE header, the .setup section and
the .compat section, leaving no gaps in the file coverage, making the
signing tools happy.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-18-ardb@google.com
2023-09-17 19:48:44 +02:00
Ard Biesheuvel
34951f3c28 x86/boot: Split off PE/COFF .data section
Describe the code and data of the decompressor binary using separate
.text and .data PE/COFF sections, so that we will be able to map them
using restricted permissions once we increase the section and file
alignment sufficiently. This avoids the need for memory mappings that
are writable and executable at the same time, which is something that
is best avoided for security reasons.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-17-ardb@google.com
2023-09-17 19:48:43 +02:00
Ard Biesheuvel
fa5750521e x86/boot: Drop PE/COFF .reloc section
Ancient buggy EFI loaders may have required a .reloc section to be
present at some point in time, but this has not been true for a long
time so the .reloc section can just be dropped.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-16-ardb@google.com
2023-09-17 19:48:43 +02:00
Ard Biesheuvel
efa089e63b x86/boot: Construct PE/COFF .text section from assembler
Now that the size of the setup block is visible to the assembler, it is
possible to populate the PE/COFF header fields from the asm code
directly, instead of poking the values into the binary using the build
tool. This will make it easier to reorganize the section layout without
having to tweak the build tool in lockstep.

This change has no impact on the resulting bzImage binary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-15-ardb@google.com
2023-09-17 19:48:43 +02:00
Ard Biesheuvel
aeb92067f6 x86/boot: Derive file size from _edata symbol
Tweak the linker script so that the value of _edata represents the
decompressor binary's file size rounded up to the appropriate alignment.
This removes the need to calculate it in the build tool, and will make
it easier to refer to the file size from the header directly in
subsequent changes to the PE header layout.

While adding _edata to the sed regex that parses the compressed
vmlinux's symbol list, tweak the regex a bit for conciseness.

This change has no impact on the resulting bzImage binary when
configured with CONFIG_EFI_STUB=y.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-14-ardb@google.com
2023-09-17 19:48:42 +02:00
Ard Biesheuvel
093ab258e3 x86/boot: Define setup size in linker script
The setup block contains the real mode startup code that is used when
booting from a legacy BIOS, along with the boot_params/setup_data that
is used by legacy x86 bootloaders to pass the command line and initial
ramdisk parameters, among other things.

The setup block also contains the PE/COFF header of the entire combined
image, which includes the compressed kernel image, the decompressor and
the EFI stub.

This PE header describes the layout of the executable image in memory,
and currently, the fact that the setup block precedes it makes it rather
fiddly to get the right values into the right place in the final image.

Let's make things a bit easier by defining the setup_size in the linker
script so it can be referenced from the asm code directly, rather than
having to rely on the build tool to calculate it. For the time being,
add 64 bytes of fixed padding for the .reloc and .compat sections - this
will be removed in a subsequent patch after the PE/COFF header has been
reorganized.

This change has no impact on the resulting bzImage binary when
configured with CONFIG_EFI_MIXED=y.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-13-ardb@google.com
2023-09-17 19:48:42 +02:00
Ard Biesheuvel
eac956345f x86/boot: Set EFI handover offset directly in header asm
The offsets of the EFI handover entrypoints are available to the
assembler when constructing the header, so there is no need to set them
from the build tool afterwards.

This change has no impact on the resulting bzImage binary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-12-ardb@google.com
2023-09-17 19:48:42 +02:00
Ard Biesheuvel
2e765c02dc x86/boot: Grab kernel_info offset from zoffset header directly
Instead of parsing zoffset.h and poking the kernel_info offset value
into the header from the build tool, just grab the value directly in the
asm file that describes this header.

This change has no impact on the resulting bzImage binary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-11-ardb@google.com
2023-09-17 19:48:42 +02:00
Kirill A. Shutemov
f530ee95b7 x86/boot/compressed: Reserve more memory for page tables
The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.

The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.

In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.

Update wost-case calculations to include 5-level paging.

To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.

[ Also add a warning, should this memory run low - by Dave Hansen. ]

Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
2023-09-17 09:48:57 +02:00
Ard Biesheuvel
b618d31f11 x86/boot: Drop references to startup_64
The x86 boot image generation tool assign a default value to startup_64
and subsequently parses the actual value from zoffset.h but it never
actually uses the value anywhere. So remove this code.

This change has no impact on the resulting bzImage binary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-25-ardb@google.com
2023-09-15 11:18:42 +02:00
Ard Biesheuvel
7448e8e5d1 x86/boot: Drop redundant code setting the root device
The root device defaults to 0,0 and is no longer configurable at build
time [0], so there is no need for the build tool to ever write to this
field.

[0] 079f85e624189292 ("x86, build: Do not set the root_dev field in bzImage")

This change has no impact on the resulting bzImage binary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-23-ardb@google.com
2023-09-15 11:18:42 +02:00
Ard Biesheuvel
8eace5b355 x86/boot: Omit compression buffer from PE/COFF image memory footprint
Now that the EFI stub decompresses the kernel and hands over to the
decompressed image directly, there is no longer a need to provide a
decompression buffer as part of the .BSS allocation of the PE/COFF
image. It also means the PE/COFF image can be loaded anywhere in memory,
and setting the preferred image base is unnecessary. So drop the
handling of this from the header and from the build tool.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-22-ardb@google.com
2023-09-15 11:18:41 +02:00
Ard Biesheuvel
768171d7eb x86/boot: Remove the 'bugger off' message
Ancient (pre-2003) x86 kernels could boot from a floppy disk straight from
the BIOS, using a small real mode boot stub at the start of the image
where the BIOS would expect the boot record (or boot block) to appear.

Due to its limitations (kernel size < 1 MiB, no support for IDE, USB or
El Torito floppy emulation), this support was dropped, and a Linux aware
bootloader is now always required to boot the kernel from a legacy BIOS.

To smoothen this transition, the boot stub was not removed entirely, but
replaced with one that just prints an error message telling the user to
install a bootloader.

As it is unlikely that anyone doing direct floppy boot with such an
ancient kernel is going to upgrade to v6.5+ and expect that this boot
method still works, printing this message is kind of pointless, and so
it should be possible to remove the logic that emits it.

Let's free up this space so it can be used to expand the PE header in a
subsequent patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
2023-09-15 11:18:41 +02:00
Ard Biesheuvel
bfab35f552 x86/efi: Drop alignment flags from PE section headers
The section header flags for alignment are documented in the PE/COFF
spec as being applicable to PE object files only, not to PE executables
such as the Linux bzImage, so let's drop them from the PE header.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-20-ardb@google.com
2023-09-15 11:18:41 +02:00
Ard Biesheuvel
5f51c5d0e9 x86/efi: Drop EFI stub .bss from .data section
Now that the EFI stub always zero inits its BSS section upon entry,
there is no longer a need to place the BSS symbols carried by the stub
into the .data section.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-18-ardb@google.com
2023-09-15 11:18:40 +02:00
Kai Huang
8a8544bde8 x86/tdx: Remove 'struct tdx_hypercall_args'
Now 'struct tdx_hypercall_args' is basically 'struct tdx_module_args'
minus RCX.  Although from __tdx_hypercall()'s perspective RCX isn't
used as shared register thus not part of input/output registers, it's
not worth to have a separate structure just due to one register.

Remove the 'struct tdx_hypercall_args' and use 'struct tdx_module_args'
instead in __tdx_hypercall() related code.  This also saves the memory
copy between the two structures within __tdx_hypercall().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/798dad5ce24e9d745cf0e16825b75ccc433ad065.1692096753.git.kai.huang%40intel.com
2023-09-12 16:30:14 -07:00
Kai Huang
c641cfb5c1 x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL
Now the 'struct tdx_hypercall_args' and 'struct tdx_module_args' are
almost the same, and the TDX_HYPERCALL and TDX_MODULE_CALL asm macro
share similar code pattern too.  The __tdx_hypercall() and __tdcall()
should be unified to use the same assembly code.

As a preparation to unify them, simplify the TDX_HYPERCALL to make it
more like the TDX_MODULE_CALL.

The TDX_HYPERCALL takes the pointer of 'struct tdx_hypercall_args' as
function call argument, and does below extra things comparing to the
TDX_MODULE_CALL:

1) It sets RAX to 0 (TDG.VP.VMCALL leaf) internally;
2) It sets RCX to the (fixed) bitmap of shared registers internally;
3) It calls __tdx_hypercall_failed() internally (and panics) when the
   TDCALL instruction itself fails;
4) After TDCALL, it moves R10 to RAX to return the return code of the
   VMCALL leaf, regardless the '\ret' asm macro argument;

Firstly, change the TDX_HYPERCALL to take the same function call
arguments as the TDX_MODULE_CALL does: TDCALL leaf ID, and the pointer
to 'struct tdx_module_args'.  Then 1) and 2) can be moved to the
caller:

 - TDG.VP.VMCALL leaf ID can be passed via the function call argument;
 - 'struct tdx_module_args' is 'struct tdx_hypercall_args' + RCX, thus
   the bitmap of shared registers can be passed via RCX in the
   structure.

Secondly, to move 3) and 4) out of assembly, make the TDX_HYPERCALL
always save output registers to the structure.  The caller then can:

 - Call __tdx_hypercall_failed() when TDX_HYPERCALL returns error;
 - Return R10 in the structure as the return code of the VMCALL leaf;

With above changes, change the asm function from __tdx_hypercall() to
__tdcall_hypercall(), and reimplement __tdx_hypercall() as the C wrapper
of it.  This avoids having to add another wrapper of __tdx_hypercall()
(_tdx_hypercall() is already taken).

The __tdcall_hypercall() will be replaced with a __tdcall() variant
using TDX_MODULE_CALL in a later commit as the final goal is to have one
assembly to handle both TDCALL and TDVMCALL.

Currently, the __tdx_hypercall() asm is in '.noinstr.text'.  To keep
this unchanged, annotate __tdx_hypercall(), which is a C function now,
as 'noinstr'.

Remove the __tdx_hypercall_ret() as __tdx_hypercall() already does so.

Implement __tdx_hypercall() in tdx-shared.c so it can be shared with the
compressed code.

Opportunistically fix a checkpatch error complaining using space around
parenthesis '(' and ')' while moving the bitmap of shared registers to
<asm/shared/tdx.h>.

[ dhansen: quash new calls of __tdx_hypercall_ret() that showed up ]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/0cbf25e7aee3256288045023a31f65f0cef90af4.1692096753.git.kai.huang%40intel.com
2023-09-12 16:28:13 -07:00
Linus Torvalds
97efd28334 Misc x86 cleanups.
The following commit deserves special mention:
 
    22dc02f81cddd Revert "sched/fair: Move unused stub functions to header"
 
 This is in x86/cleanups, because the revert is a re-application of a
 number of cleanups that got removed inadvertedly.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTtDkoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jCMw//UvQGM8yxsTa57r0/ZpJHS2++P5pJxOsz
 45kBb3aBiDV6idArce4EHpthp3MvF3Pycibp9w0qg//NOtIHTKeagXv52abxsu1W
 hmS6gXJZDXZvjO1BFaUlmv97iYtzGfKnQppj32C4tMr9SaP49h3KvOHH1Z8CR3mP
 1nZaJJwYIi2qBh7msnmLGG+F0drb85O/dfHdoLX6iVJw9UP4n5nu9u8u1E0iC7J7
 2GC6AwP60A0EBRTK9EHQQEYwy9uvdS/TG5f2Qk1VP87KA9TTocs8MyapMG4DQu79
 hZKVEGuVQAlV3rYe9cJBNpDx1mTu3rmuMH0G71KEe3T6UcG5QRUiAPm8UfA9prPD
 uWjY4zm5o0W3tUio4V1MqqiLFIaBU63WmTY9RyM0QH8Ms8r8GugWKmnrTIuHfEC3
 9D+Uhyb5d8ID6qFGLTOvPm0g+v64lnH71qq83PcVJgsmZvUb2XvFA3d/A0h9JzLT
 2In/yfU10UsLUFTiNRyAgcLccjaGhliDB2oke9Kp0OyOTSQRcWmiq8kByVxCPImP
 auOWWcNXjcuOgjlnziEkMTDuRY12MgUB2If4zhELvdEFibIaaNW5sNCbY2msWaN1
 CUD7fcj0L3HZvzujUm72l5hxL2brJMuPwVNJfuOe4T8wzy569d6VJULrd1URBM1B
 vfaPs1Dz46Q=
 =kiAA
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 cleanups from Ingo Molnar:
 "The following commit deserves special mention:

   22dc02f81cddd Revert "sched/fair: Move unused stub functions to header"

  This is in x86/cleanups, because the revert is a re-application of a
  number of cleanups that got removed inadvertedly"

[ This also effectively undoes the amd_check_microcode() microcode
  declaration change I had done in my microcode loader merge in commit
  42a7f6e3ffe0 ("Merge tag 'x86_microcode_for_v6.6_rc1' [...]").

  I picked the declaration change by Arnd from this branch instead,
  which put it in <asm/processor.h> instead of <asm/microcode.h> like I
  had done in my merge resolution   - Linus ]

* tag 'x86-cleanups-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Refactor code using deprecated strncpy() interface to use strscpy()
  x86/hpet: Refactor code using deprecated strncpy() interface to use strscpy()
  x86/platform/uv: Refactor code using deprecated strcpy()/strncpy() interfaces to use strscpy()
  x86/qspinlock-paravirt: Fix missing-prototype warning
  x86/paravirt: Silence unused native_pv_lock_init() function warning
  x86/alternative: Add a __alt_reloc_selftest() prototype
  x86/purgatory: Include header for warn() declaration
  x86/asm: Avoid unneeded __div64_32 function definition
  Revert "sched/fair: Move unused stub functions to header"
  x86/apic: Hide unused safe_smp_processor_id() on 32-bit UP
  x86/cpu: Fix amd_check_microcode() declaration
2023-08-28 17:05:58 -07:00
Linus Torvalds
f31f663fa9 - Handle the case where the beginning virtual address of the address
range whose SEV encryption status needs to change, is not page aligned
   so that callers which round up the number of pages to be decrypted,
   would mark a trailing page as decrypted and thus cause corruption
   during live migration.
 
 - Return an error from the #VC handler on AMD SEV-* guests when the debug
   registers swapping is enabled as a DR7 access should not happen then
   - that register is guest/host switched.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmTsje4ACgkQEsHwGGHe
 VUrKWxAAqiTpQSjJCB32ReioSsLv3kl7vtLO3xtE42VpF0F7pAAPzRsh+bgDjGSM
 uqcEgbX1YtPlb8wK6yh5dyNLLvtzxhaAQkUfbfuEN2oqbvIEcJmhWAm/xw1yCsh2
 GDphFPtvqgT4KUCkEHj8tC9eQzG+L0bwymPzqXooVDnm4rL0ulEl6ONffhHfJFVg
 bmL8UjmJNodFcO6YBfosQIDDfc4ayuwm9f/rGltNFl+jwCi62kMJaVdU1112agsV
 LE73DRoRpfHKLslj9o9ubRcvaHKS24y2Amflnj1tas0h8I2uXBRwIgxjQXl5vtXV
 pu5/5VHM9X13x8XKpKVkEohXkBzFRigs8yfHq+JlpyWXXB/ymW8Acbqqnvll12r4
 JSy+XfBNa6V5Y/NDS/1faJiX6XSi5ZyZHZG70sf52XVoBYhzoms5kxqTJnHHisnY
 X50677/tQF3V9WsmKD0aj0Um2ztiq0/TNMI7FT3lzYRDNJb1ln3ZK9f04i8L5jA4
 bsrSV5oCVpLkW4eQaAJwxttTB+dRb5MwwkeS7D/eTuJ1pgUmJMIbZp2YbJH7NP2F
 6FShQdwHi8KYN7mxUM+WwOk7goaBm5L61w5UtRlt6aDE7LdEbMAeSSdmD3HlEZHR
 ntBqcEx4SkAT+Ru0izVXjsoWmtkn8+DY44oUC2X6eZxUSAT4Cm4=
 =td9F
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Handle the case where the beginning virtual address of the address
   range whose SEV encryption status needs to change, is not page
   aligned so that callers which round up the number of pages to be
   decrypted, would mark a trailing page as decrypted and thus cause
   corruption during live migration.

 - Return an error from the #VC handler on AMD SEV-* guests when the
   debug registers swapping is enabled as a DR7 access should not happen
   then - that register is guest/host switched.

* tag 'x86_sev_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Make enc_dec_hypercall() accept a size instead of npages
  x86/sev: Do not handle #VC for DR7 read/write
2023-08-28 15:28:54 -07:00
Alexey Kardashevskiy
e221804dad x86/sev: Do not handle #VC for DR7 read/write
With MSR_AMD64_SEV_DEBUG_SWAP enabled, the guest is not expected to
receive a #VC for reads or writes of DR7.

Update the SNP_FEATURES_PRESENT mask with MSR_AMD64_SNP_DEBUG_SWAP so
an SNP guest doesn't gracefully terminate during SNP feature negotiation
if MSR_AMD64_SEV_DEBUG_SWAP is enabled.

Since a guest is not expected to receive a #VC on DR7 accesses when
MSR_AMD64_SEV_DEBUG_SWAP is enabled, return an error from the #VC
handler in this situation.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20230816022122.981998-1-aik@amd.com
2023-08-16 10:13:42 +02:00
Ard Biesheuvel
a1b87d54f4 x86/efistub: Avoid legacy decompressor when doing EFI boot
The bare metal decompressor code was never really intended to run in a
hosted environment such as the EFI boot services, and does a few things
that are becoming problematic in the context of EFI boot now that the
logo requirements are getting tighter: EFI executables will no longer be
allowed to consist of a single executable section that is mapped with
read, write and execute permissions if they are intended for use in a
context where Secure Boot is enabled (and where Microsoft's set of
certificates is used, i.e., every x86 PC built to run Windows).

To avoid stepping on reserved memory before having inspected the E820
tables, and to ensure the correct placement when running a kernel build
that is non-relocatable, the bare metal decompressor moves its own
executable image to the end of the allocation that was reserved for it,
in order to perform the decompression in place. This means the region in
question requires both write and execute permissions, which either need
to be given upfront (which EFI will no longer permit), or need to be
applied on demand using the existing page fault handling framework.

However, the physical placement of the kernel is usually randomized
anyway, and even if it isn't, a dedicated decompression output buffer
can be allocated anywhere in memory using EFI APIs when still running in
the boot services, given that EFI support already implies a relocatable
kernel. This means that decompression in place is never necessary, nor
is moving the compressed image from one end to the other.

Since EFI already maps all of memory 1:1, it is also unnecessary to
create new page tables or handle page faults when decompressing the
kernel. That means there is also no need to replace the special
exception handlers for SEV. Generally, there is little need to do
any of the things that the decompressor does beyond

- initialize SEV encryption, if needed,
- perform the 4/5 level paging switch, if needed,
- decompress the kernel
- relocate the kernel

So do all of this from the EFI stub code, and avoid the bare metal
decompressor altogether.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-24-ardb@kernel.org
2023-08-07 21:07:43 +02:00
Ard Biesheuvel
31c77a5099 x86/efistub: Perform SNP feature test while running in the firmware
Before refactoring the EFI stub boot flow to avoid the legacy bare metal
decompressor, duplicate the SNP feature check in the EFI stub before
handing over to the kernel proper.

The SNP feature check can be performed while running under the EFI boot
services, which means it can force the boot to fail gracefully and
return an error to the bootloader if the loaded kernel does not
implement support for all the features that the hypervisor enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-23-ardb@kernel.org
2023-08-07 21:03:53 +02:00
Ard Biesheuvel
8338151935 x86/decompressor: Factor out kernel decompression and relocation
Factor out the decompressor sequence that invokes the decompressor,
parses the ELF and applies the relocations so that it can be called
directly from the EFI stub.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-21-ardb@kernel.org
2023-08-07 20:59:13 +02:00
Ard Biesheuvel
24388292e2 x86/decompressor: Move global symbol references to C code
It is no longer necessary to be cautious when referring to global
variables in the position independent decompressor code, now that it is
built using PIE codegen and makes an assertion in the linker script that
no GOT entries exist (which would require adjustment for the actual
runtime load address of the decompressor binary).

This means global variables can be referenced directly from C code,
instead of having to pass their runtime addresses into C routines from
asm code, which needs to happen at each call site. Do so for the code
that will be called directly from the EFI stub after a subsequent patch,
and avoid the need to duplicate this logic a third time.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-20-ardb@kernel.org
2023-08-07 20:58:02 +02:00
Ard Biesheuvel
03dda95137 x86/decompressor: Merge trampoline cleanup with switching code
Now that the trampoline setup code and the actual invocation of it are
all done from the C routine, the trampoline cleanup can be merged into
it as well, instead of returning to asm just to call another C function.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20230807162720.545787-16-ardb@kernel.org
2023-08-07 20:51:17 +02:00
Ard Biesheuvel
cb83cece57 x86/decompressor: Pass pgtable address to trampoline directly
The only remaining use of the trampoline address by the trampoline
itself is deriving the page table address from it, and this involves
adding an offset of 0x0. So simplify this, and pass the new CR3 value
directly.

This makes the fact that the page table happens to be at the start of
the trampoline allocation an implementation detail of the caller.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-15-ardb@kernel.org
2023-08-07 20:49:26 +02:00
Ard Biesheuvel
f97b67a773 x86/decompressor: Only call the trampoline when changing paging levels
Since the current and desired number of paging levels are known when the
trampoline is being prepared, avoid calling the trampoline at all if it
is clear that calling it is not going to result in a change to the
number of paging levels.

Given that the CPU is already running in long mode, the PAE and LA57
settings are necessarily consistent with the currently active page
tables, and other fields in CR4 will be initialized by the startup code
in the kernel proper. So limit the manipulation of CR4 to toggling the
LA57 bit, which is the only thing that really needs doing at this point
in the boot. This also means that there is no need to pass the value of
l5_required to toggle_la57(), as it will not be called unless CR4.LA57
needs to toggle.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20230807162720.545787-14-ardb@kernel.org
2023-08-07 20:48:09 +02:00
Ard Biesheuvel
64ef578b6b x86/decompressor: Call trampoline directly from C code
Instead of returning to the asm calling code to invoke the trampoline,
call it straight from the C code that sets it up. That way, the struct
return type is no longer needed for returning two values, and the call
can be made conditional more cleanly in a subsequent patch.

This means that all callee save 64-bit registers need to be preserved
and restored, as their contents may not survive the legacy mode switch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20230807162720.545787-13-ardb@kernel.org
2023-08-07 20:46:57 +02:00
Ard Biesheuvel
bd328aa01f x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
The 32-bit trampoline no longer uses the stack for anything except
performing a far return back to long mode, and preserving the caller's
stack pointer value. Currently, the trampoline stack is placed in the
same page that carries the trampoline code, which means this page must
be mapped writable and executable, and the stack is therefore executable
as well.

Replace the far return with a far jump, so that the return address can
be pre-calculated and patched into the code before it is called. This
removes the need for a 32-bit addressable stack entirely, and in a later
patch, this will be taken advantage of by removing writable permissions
from (and adding executable permissions to) the trampoline code page
when booting via the EFI stub.

Note that the value of RSP still needs to be preserved explicitly across
the switch into 32-bit mode, as the register may get truncated to 32
bits.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20230807162720.545787-12-ardb@kernel.org
2023-08-07 20:45:55 +02:00
Ard Biesheuvel
918a7a04e7 x86/decompressor: Use standard calling convention for trampoline
Update the trampoline code so its arguments are passed via RDI and RSI,
which matches the ordinary SysV calling convention for x86_64. This will
allow this code to be called directly from C.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20230807162720.545787-11-ardb@kernel.org
2023-08-07 20:43:59 +02:00
Ard Biesheuvel
e8972a76aa x86/decompressor: Call trampoline as a normal function
Move the long return to switch to 32-bit mode into the trampoline code
so it can be called as an ordinary function. This will allow it to be
called directly from C code in a subsequent patch.

While at it, reorganize the code somewhat to keep the prologue and
epilogue of the function together, making the code a bit easier to
follow. Also, given that the trampoline is now entered in 64-bit mode, a
simple RIP-relative reference can be used to take the address of the
exit point.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20230807162720.545787-10-ardb@kernel.org
2023-08-07 20:43:13 +02:00
Ard Biesheuvel
00c6b0978e x86/decompressor: Assign paging related global variables earlier
There is no need to defer the assignment of the paging related global
variables 'pgdir_shift' and 'ptrs_per_p4d' until after the trampoline is
cleaned up, so assign them as soon as it is clear that 5-level paging
will be enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-9-ardb@kernel.org
2023-08-07 20:42:11 +02:00
Ard Biesheuvel
8b63cba746 x86/decompressor: Store boot_params pointer in callee save register
Instead of pushing and popping %RSI several times to preserve the struct
boot_params pointer across the execution of the startup code, move it
into a callee save register before the first call into C, and copy it
back when needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-8-ardb@kernel.org
2023-08-07 20:41:04 +02:00
Ard Biesheuvel
d7156b986d x86/efistub: Clear BSS in EFI handover protocol entrypoint
The so-called EFI handover protocol is value-add from the distros that
permits a loader to simply copy a PE kernel image into memory and call
an alternative entrypoint that is described by an embedded boot_params
structure.

Most implementations of this protocol do not bother to check the PE
header for minimum alignment, section placement, etc, and therefore also
don't clear the image's BSS, or even allocate enough memory for it.

Allocating more memory on the fly is rather difficult, but at least
clear the BSS region explicitly when entering in this manner, so that
the EFI stub code does not get confused by global variables that were
not zero-initialized correctly.

When booting in mixed mode, this BSS clearing must occur before any
global state is created, so clear it in the 32-bit asm entry point.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-7-ardb@kernel.org
2023-08-07 20:40:36 +02:00
Ard Biesheuvel
1279206458 x86/decompressor: Avoid magic offsets for EFI handover entrypoint
The native 32-bit or 64-bit EFI handover protocol entrypoint offset
relative to the respective startup_32/64 address is described in
boot_params as handover_offset, so that the special Linux/x86 aware EFI
loader can find it there.

When mixed mode is enabled, this single field has to describe this
offset for both the 32-bit and 64-bit entrypoints, so their respective
relative offsets have to be identical. Given that startup_32 and
startup_64 are 0x200 bytes apart, and the EFI handover entrypoint
resides at a fixed offset, the 32-bit and 64-bit versions of those
entrypoints must be exactly 0x200 bytes apart as well.

Currently, hard-coded fixed offsets are used to ensure this, but it is
sufficient to emit the 64-bit entrypoint 0x200 bytes after the 32-bit
one, wherever it happens to reside. This allows this code (which is now
EFI mixed mode specific) to be moved into efi_mixed.S and out of the
startup code in head_64.S.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-6-ardb@kernel.org
2023-08-07 20:39:00 +02:00
Ard Biesheuvel
df9215f152 x86/efistub: Simplify and clean up handover entry code
Now that the EFI entry code in assembler is only used by the optional
and deprecated EFI handover protocol, and given that the EFI stub C code
no longer returns to it, most of it can simply be dropped.

While at it, clarify the symbol naming, by merging efi_main() and
efi_stub_entry(), making the latter the shared entry point for all
different boot modes that enter via the EFI stub.

The efi32_stub_entry() and efi64_stub_entry() names are referenced
explicitly by the tooling that populates the setup header, so these must
be retained, but can be emitted as aliases of efi_stub_entry() where
appropriate.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-5-ardb@kernel.org
2023-08-07 20:37:09 +02:00
Ard Biesheuvel
264b82fdb4 x86/decompressor: Don't rely on upper 32 bits of GPRs being preserved
The 4-to-5 level mode switch trampoline disables long mode and paging in
order to be able to flick the LA57 bit. According to section 3.4.1.1 of
the x86 architecture manual [0], 64-bit GPRs might not retain the upper
32 bits of their contents across such a mode switch.

Given that RBP, RBX and RSI are live at this point, preserve them on the
stack, along with the return address that might be above 4G as well.

[0] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture

  "Because the upper 32 bits of 64-bit general-purpose registers are
   undefined in 32-bit modes, the upper 32 bits of any general-purpose
   register are not preserved when switching from 64-bit mode to a 32-bit
   mode (to protected mode or compatibility mode). Software must not
   depend on these bits to maintain a value after a 64-bit to 32-bit
   mode switch."

Fixes: 194a9749c73d650c ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-2-ardb@kernel.org
2023-08-07 19:02:06 +02:00
Borislav Petkov (AMD)
bee6cf1a80 x86/sev: Do not try to parse for the CC blob on non-AMD hardware
Tao Liu reported a boot hang on an Intel Atom machine due to an unmapped
EFI config table. The reason being that the CC blob which contains the
CPUID page for AMD SNP guests is parsed for before even checking
whether the machine runs on AMD hardware.

Usually that's not a problem on !AMD hw - it simply won't find the CC
blob's GUID and return. However, if any parts of the config table
pointers array is not mapped, the kernel will #PF very early in the
decompressor stage without any opportunity to recover.

Therefore, do a superficial CPUID check before poking for the CC blob.
This will fix the current issue on real hardware. It would also work as
a guest on a non-lying hypervisor.

For the lying hypervisor, the check is done again, *after* parsing the
CC blob as the real CPUID page will be present then.

Clear the #VC handler in case SEV-{ES,SNP} hasn't been detected, as
a precaution.

Fixes: c01fce9cef84 ("x86/compressed: Add SEV-SNP feature detection/setup")
Reported-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tao Liu <ltao@redhat.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230601072043.24439-1-ltao@redhat.com
2023-08-07 18:05:13 +02:00
Arnd Bergmann
6d33531bc0 x86/purgatory: Include header for warn() declaration
The purgatory code uses parts of the decompressor and provides its own
warn() function, but has to include the corresponding header file to
avoid a -Wmissing-prototypes warning.

It turns out that this function prototype actually differs from the
declaration, so change it to get a constant pointer in the declaration
and the other definition as well.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230803082619.1369127-5-arnd@kernel.org
2023-08-03 16:37:18 +02:00
Linus Torvalds
5dfe7a7e52 - Fix a race window where load_unaligned_zeropad() could cause
a fatal shutdown during TDX private<=>shared conversion
  - Annotate sites where VM "exit reasons" are reused as hypercall
    numbers.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmSZ50cACgkQaDWVMHDJ
 krAPEg//bAR0SzrjIir2eiQ7p3ktr4L7iae0odzFkW/XHam5ZJP+v9cCMLzY6zNO
 44x9Z85jZ9w34GcZ4D7D0OmTHbcDpcxckPXSFco/dK4IyYeLzUImYXEYo41YJEx9
 O4sSQMBqIyjMXej/oKBhgKHSWaV60XimvQvTvhpjXGD/45bt9sx4ZNVTi8+xVMbw
 jpktMsQcsjHctcIY2D2eUR61Ma/Vg9t6Qih51YMtbq6Nqcyhw2IKvDwIx3kQuW34
 qSW7wsyn+RfHQDpwjPgDG/6OE815Pbtzlxz+y6tB9pN88IWkA1H5Jh2CQRlMBud2
 2nVQRpqPgr9uOIeNnNI7FFd1LgTIc/v7lDPfpUH9KelOs7cGWvaRymkuhPSvWxRI
 tmjlMdFq8XcjrOPieA9WpxYKXinqj4wNXtnYGyaM+Ur/P3qWaj18PMCYMbeN6pJC
 eNYEJVk2Mt8GmiPL55aYG5+Z1F8sciLKbz8TFq5ya2z0EnSbyVvR+DReqd7zRzh6
 Bmbmx9isAzN6wWNszNt7f8XSgRPV2Ri1tvb1vixk3JLxyx2iVCUL6KJ0cZOUNy0x
 nQqy7/zMtBsFGZ/Ca8f2kpVaGgxkUFy7n1rI4psXTGBOVlnJyMz3WSi9N8F/uJg0
 Ca5W4493+txdyHSAmWBQAQuZp3RJOlhTkXe5dfjukv6Rnnw1MZU=
 =Hr3D
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 tdx updates from Dave Hansen:

 - Fix a race window where load_unaligned_zeropad() could cause a fatal
   shutdown during TDX private<=>shared conversion

   The race has never been observed in practice but might allow
   load_unaligned_zeropad() to catch a TDX page in the middle of its
   conversion process which would lead to a fatal and unrecoverable
   guest shutdown.

 - Annotate sites where VM "exit reasons" are reused as hypercall
   numbers.

* tag 'x86_tdx_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix enc_status_change_finish_noop()
  x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
  x86/mm: Allow guest.enc_status_change_prepare() to fail
  x86/tdx: Wrap exit reason with hcall_func()
2023-06-26 16:32:47 -07:00
Linus Torvalds
941d77c773 - Compute the purposeful misalignment of zen_untrain_ret automatically
and assert __x86_return_thunk's alignment so that future changes to
   the symbol macros do not accidentally break them.
 
 - Remove CONFIG_X86_FEATURE_NAMES Kconfig option as its existence is
   pointless
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZ1wgACgkQEsHwGGHe
 VUrXlRAAhIonFM1suIHo6w085jY5YA1XnsziJr/bT3e16FdHrF1i3RBEX4ml0m3O
 ADwa9dMsC9UJIa+/TKRNFfQvfRcLE/rsUKlS1Rluf/IRIxuSt/Oa4bFQHGXFRwnV
 eSlnWTNiaWrRs/vJEYAnMOe98oRyElHWa9kZ7K5FC+Ksfn/WO1U1RQ2NWg2A2wkN
 8MHJiS41w2piOrLU/nfUoI7+esHgHNlib222LoptDGHuaY8V2kBugFooxAEnTwS3
 PCzWUqCTgahs393vbx6JimoIqgJDa7bVdUMB0kOUHxtpbBiNdYYVy6e7UKnV1yjB
 qP3v9jQW4+xIyRmlFiErJXEZx7DjAIP5nulGRrUMzRfWEGF8mdRZ+ugGqFMHCeC8
 vXI+Ixp2vvsfhG3N/algsJUdkjlpt3hBpElRZCfR08M253KAbAmUNMOr4sx4RPi5
 ymC+pLIHd1K0G9jiZaFnOMaY71gAzWizwxwjFKLQMo44q+lpNJvsVO00cr+9RBYj
 LQL2APkONVEzHPMYR/LrXCslYaW//DrfLdRQjNbzUTonxFadkTO2Eu8J90B/5SFZ
 CqC1NYKMQPVFeg4XuGWCgZEH+jokCGhl8vvmXClAMcOEOZt0/s4H89EKFkmziyon
 L1ZrA/U72gWV8EwD7GLtuFJmnV4Ayl/hlek2j0qNKaj6UUgTFg8=
 =LcUq
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Compute the purposeful misalignment of zen_untrain_ret automatically
   and assert __x86_return_thunk's alignment so that future changes to
   the symbol macros do not accidentally break them.

 - Remove CONFIG_X86_FEATURE_NAMES Kconfig option as its existence is
   pointless

* tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retbleed: Add __x86_return_thunk alignment checks
  x86/cpu: Remove X86_FEATURE_NAMES
  x86/Kconfig: Make X86_FEATURE_NAMES non-configurable in prompt
2023-06-26 15:42:34 -07:00
Linus Torvalds
2c96136a3f - Add support for unaccepted memory as specified in the UEFI spec v2.9.
The gist of it all is that Intel TDX and AMD SEV-SNP confidential
   computing guests define the notion of accepting memory before using it
   and thus preventing a whole set of attacks against such guests like
   memory replay and the like.
 
   There are a couple of strategies of how memory should be accepted
   - the current implementation does an on-demand way of accepting.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZ0f4ACgkQEsHwGGHe
 VUpasw//RKoNW9HSU1csY+XnG9uuaT6QKgji+gIEZWWIGPO9iibvbBj6P5WxJE8T
 fe7yb6CGa6d6thoU0v+mQGVVvCd7OjCFwPD5wAo4mXToD7Ig+4mI6jMkaKifqa2f
 N1Uuy8u/zQnGyWrP5Y//WH5bJYfsmds4UGwXI2nLvKlhE7MG90/ePjt7iqnnwZsy
 waLp6a0Q1VeOvnfRszFLHZw/SoER5RSJ4qeVqttkFNmPPEKMK1Kirrl2poR56OQJ
 nMr6LqVtD7erlSJ36VRXOKzLI443A4iIEIg/wBjIOU6L5ZEWJGNqtCDnIqFJ6+TM
 XatsejfRYkkMZH0qXtX9+M0u+HJHbZPCH5rEcA21P3Nbd7od/ANq91qCGoMjtUZ4
 7pZohMG8M6IDvkLiOb8fQVkR5k/9Jbk8UvdN/8jdPx1ERxYMFO3BDvJpV2gzrW4B
 KYtFTPR7j2nY3eKfDpe3flanqYzKUBsKoTlLnlH7UHaiMZ2idwG8AQjlrhC/erCq
 /Lq1LXt4Mq46FyHABc+PSHytu0WWj1nBUftRt+lviY/Uv7TlkBldOTT7wm7itsfF
 HUCTfLWl0CJXKPq8rbbZhAG/exN6Ay6MO3E3OcNq8A72E5y4cXenuG3ic/0tUuOu
 FfjpiMk35qE2Qb4hnj1YtF3XINtd1MpKcuwzGSzEdv9s3J7hrS0=
 =FS95
 -----END PGP SIGNATURE-----

Merge tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 confidential computing update from Borislav Petkov:

 - Add support for unaccepted memory as specified in the UEFI spec v2.9.

   The gist of it all is that Intel TDX and AMD SEV-SNP confidential
   computing guests define the notion of accepting memory before using
   it and thus preventing a whole set of attacks against such guests
   like memory replay and the like.

   There are a couple of strategies of how memory should be accepted -
   the current implementation does an on-demand way of accepting.

* tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt: sevguest: Add CONFIG_CRYPTO dependency
  x86/efi: Safely enable unaccepted memory in UEFI
  x86/sev: Add SNP-specific unaccepted memory support
  x86/sev: Use large PSC requests if applicable
  x86/sev: Allow for use of the early boot GHCB for PSC requests
  x86/sev: Put PSC struct on the stack in prep for unaccepted memory support
  x86/sev: Fix calculation of end address based on number of pages
  x86/tdx: Add unaccepted memory support
  x86/tdx: Refactor try_accept_one()
  x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub
  efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory
  efi: Add unaccepted memory support
  x86/boot/compressed: Handle unaccepted memory
  efi/libstub: Implement support for unaccepted memory
  efi/x86: Get full memory map in allocate_e820()
  mm: Add support for unaccepted memory
2023-06-26 15:32:39 -07:00