.. SPDX-License-Identifier: GPL-2.0

==========================
The Linux Microcode Loader
==========================

:Authors: - Fenghua Yu <fenghua.yu@intel.com>
          - Borislav Petkov <bp@suse.de>
          - Ashok Raj <ashok.raj@intel.com>

The kernel has an x86 microcode loading facility which provides
microcode loading methods in the OS. Potential use cases are
updating the microcode on platforms beyond the OEM End-Of-Life support,
and updating the microcode on long-running systems without rebooting.

The loader supports three loading methods:

Early load microcode
====================

The kernel can update microcode very early during boot. Loading
microcode early can fix CPU issues before they are observed during
kernel boot time.

The microcode is stored in an initrd file. During boot, the microcode is
read from the initrd and loaded into the CPU cores.

The format of the combined initrd image is microcode in (uncompressed)
cpio format followed by the (possibly compressed) initrd image. The
loader parses the combined initrd image during boot.

The microcode files in cpio name space are:

on Intel:
  kernel/x86/microcode/GenuineIntel.bin
on AMD  :
  kernel/x86/microcode/AuthenticAMD.bin

During BSP (BootStrapping Processor) boot (pre-SMP), the kernel
scans the microcode file in the initrd. If microcode matching the
CPU is found, it will be applied in the BSP and later on in all APs
(Application Processors).

The loader also saves the matching microcode for the CPU in memory.
Thus, the cached microcode patch is applied when CPUs resume from a
sleep state.

Here's a crude example of how to prepare an initrd with microcode (this
is normally done automatically by the distribution when recreating the
initrd, so you don't really have to do it yourself. It is documented
here for future reference only).
::

  #!/bin/bash

  if [ -z "$1" ]; then
      echo "You need to supply an initrd file"
      exit 1
  fi

  # Resolve to an absolute path because the script changes directory below.
  INITRD="$(realpath "$1")"

  DSTDIR=kernel/x86/microcode
  TMPDIR=/tmp/initrd

  rm -rf "$TMPDIR"

  mkdir "$TMPDIR"
  cd "$TMPDIR" || exit 1
  mkdir -p "$DSTDIR"

  if [ -d /lib/firmware/amd-ucode ]; then
      cat /lib/firmware/amd-ucode/microcode_amd*.bin > "$DSTDIR/AuthenticAMD.bin"
  fi

  if [ -d /lib/firmware/intel-ucode ]; then
      cat /lib/firmware/intel-ucode/* > "$DSTDIR/GenuineIntel.bin"
  fi

  # The microcode cpio must come first and must stay uncompressed.
  find . | cpio -o -H newc > ../ucode.cpio
  cd ..
  mv "$INITRD" "$INITRD.orig"
  cat ucode.cpio "$INITRD.orig" > "$INITRD"

  rm -rf "$TMPDIR"

The system needs to have the microcode packages installed into
/lib/firmware, or you need to fix up the paths above if yours are
somewhere else and/or you've downloaded them directly from the processor
vendor's site.

Late loading
============

You simply install the microcode packages your distro supplies and
run::

  # echo 1 > /sys/devices/system/cpu/microcode/reload

as root.

The loading mechanism looks for microcode blobs in
/lib/firmware/{intel-ucode,amd-ucode}. The default distro installation
packages already put them there.

Since kernel 5.19, late loading is not enabled by default.

The /dev/cpu/microcode method has been removed in 5.19.

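To confirm that a late load actually changed anything, the microcode
revision reported by the kernel can be compared before and after the
reload. The following is a minimal sketch and not part of the loader:
the helper name ``ucode_revisions`` is made up for illustration, and it
assumes the ``microcode`` field which x86 kernels print in
/proc/cpuinfo::

  #!/bin/bash

  # Hypothetical helper: print the distinct microcode revisions found in
  # a cpuinfo-style file (default: /proc/cpuinfo), one per line.
  ucode_revisions() {
      awk -F': ' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
  }

Run ``ucode_revisions`` before and after writing to the reload file.
More than one line of output means that not all CPUs report the same
revision.
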
Why is late loading dangerous?
==============================

Synchronizing all CPUs
----------------------

The microcode engine which receives the microcode update is shared
between the two logical threads in an SMT system. Therefore, when
the update is executed on one SMT thread of the core, the sibling
"automatically" gets the update.

Since the microcode can "simulate" MSRs too, while the microcode update
is in progress, those simulated MSRs transiently cease to exist. This
can lead to unpredictable results if the SMT sibling thread happens to
be in the middle of an access to such an MSR. The usual observation is
that such MSR accesses cause #GPs to be raised to signal that the former
are not present.

The disappearing MSRs are just one common issue which is being observed.
Any other instruction that's being patched and gets concurrently
executed by the other SMT sibling can also result in similar,
unpredictable behavior.

To eliminate this case, a stop_machine()-based CPU synchronization was
introduced as a way to guarantee that all logical CPUs will not execute
any code but just wait in a spin loop, polling an atomic variable.

While this took care of device or external interrupts, IPIs including
LVT ones, such as CMCI etc., it cannot address other special interrupts
that can't be shut off. Those are Machine Check (#MC), System Management
(#SMI) and Non-Maskable interrupts (#NMI).

Machine Checks
--------------

Machine Checks (#MC) are non-maskable. There are two kinds of MCEs:
fatal unrecoverable MCEs and recoverable MCEs. While unrecoverable
errors are fatal, recoverable errors which happen in kernel context
are also treated as fatal by the kernel.

On certain Intel machines, MCEs are also broadcast to all threads in a
system. If one thread is in the middle of executing WRMSR, an MCE will be
taken at the end of the flow. Either way, the other threads will wait for
the thread performing the wrmsr(0x79) to rendezvous in the MCE handler,
and the system will eventually shut down if any of the threads in the
system fail to check in to the MCE rendezvous.

To be paranoid and get predictable behavior, the OS can choose to set
MCG_STATUS.MCIP. Since at most one MCE can be in progress in a system,
if an MCE is signaled while MCIP is set, the above condition is promoted
to a system reset automatically. The OS can turn off MCIP at the end of
the update for that core.

System Management Interrupt
---------------------------

SMIs are also broadcast to all CPUs in the platform. Microcode update
requests exclusive access to the core before writing to MSR 0x79. So if
it happens that one thread is in the WRMSR flow and the second thread
gets an SMI, that second thread will be stopped at the first instruction
of the SMI handler.

Since the secondary thread is stopped at the first instruction of the
SMI handler, there is very little chance that it would be in the middle
of executing an instruction being patched. Besides, the OS has no way to
stop SMIs from happening.

Non-Maskable Interrupts
-----------------------

When thread0 of a core is doing the microcode update, if thread1 is
pulled into an NMI, that can cause unpredictable behavior due to the
reasons above.

The OS can choose a variety of methods to avoid running into this
situation.


Is the microcode suitable for late loading?
-------------------------------------------

Late loading is done when the system is fully operational and running
real workloads. Late loading behavior depends on what the base patch on
the CPU is before upgrading to the new patch.

This is true for Intel CPUs.

Consider, for example, a CPU which has patch level 1 and the update is
to patch level 3.

Between patch1 and patch3, patch2 might have deprecated a software-visible
feature.

This is unacceptable if software is even potentially using that feature.
For instance, if MSR_X is no longer available after an update,
accessing that MSR will cause a #GP fault.

Basically there is no way to declare a new microcode update suitable
for late-loading. This is another one of the problems that caused late
loading not to be enabled by default.

Builtin microcode
=================

The loader also supports loading of builtin microcode supplied through
the regular builtin firmware method CONFIG_EXTRA_FIRMWARE. Only 64-bit is
currently supported.

Here's an example::

  CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3a-09 amd-ucode/microcode_amd_fam15h.bin"
  CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"

This basically means that you have the following tree structure locally::

  /lib/firmware/
  |-- amd-ucode
  ...
  |   |-- microcode_amd_fam15h.bin
  ...
  |-- intel-ucode
  ...
  |   |-- 06-3a-09
  ...

so that the build system can find those files and integrate them into
the final kernel image. The early loader finds them and applies them.

Needless to say, this method is not the most flexible one because it
requires rebuilding the kernel each time updated microcode from the CPU
vendor is available.

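The Intel file name in the example encodes the CPU signature as
family-model-stepping in two-digit hexadecimal, so 06-3a-09 stands for
family 6, model 58 (0x3a), stepping 9. Here is a minimal sketch for
deriving the name to put into CONFIG_EXTRA_FIRMWARE. Both helper names
are hypothetical, and the sketch assumes the decimal ``cpu family``,
``model`` and ``stepping`` fields of /proc/cpuinfo::

  #!/bin/bash

  # Hypothetical helper: build the intel-ucode file name from decimal
  # family, model and stepping values.
  intel_ucode_name() {
      printf '%02x-%02x-%02x\n' "$1" "$2" "$3"
  }

  # Hypothetical helper: take the three values of the first CPU listed
  # in a cpuinfo-style file (default: /proc/cpuinfo). The tab in the
  # "model" pattern keeps the "model name" line from matching.
  intel_ucode_name_from_cpuinfo() {
      awk -F': ' '
          /^cpu family/ && fam == "" { fam = $2 }
          /^model\t/    && mod == "" { mod = $2 }
          /^stepping/   && stp == "" { stp = $2 }
          END { printf "%02x-%02x-%02x\n", fam, mod, stp }
      ' "${1:-/proc/cpuinfo}"
  }

On the build machine, the result can be checked against the files
present in /lib/firmware/intel-ucode before it is added to the kernel
configuration.
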