Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - Changes to FPU handling came in via the main s390 pull request

   - Only deliver to the guest the SCLP events that userspace has
     requested

   - More virtual vs physical address fixes (only a cleanup since
     virtual and physical address spaces are currently the same)

   - Fix selftests undefined behavior

  x86:

   - Fix a restriction where the guest couldn't program a PMU event
     whose encoding matches an architectural event that isn't included
     in the guest CPUID. The enumeration of an architectural event only
     says that if a CPU supports an architectural event, then the event
     can be programmed *using the architectural encoding*. The
     enumeration does NOT say anything about the encoding when the CPU
     doesn't report support for the event *in general*: the CPU might
     still support the event, and it might support it using the very
     encoding that made it into the architectural PMU spec (a sketch of
     the dropped check follows at the end of this x86 list)

   - Fix a variety of bugs in KVM's emulation of RDPMC (more details on
     individual commits) and add a selftest to verify KVM correctly
     emulates RDPMC, counter availability, and a variety of other
     PMC-related behaviors that depend on guest CPUID and therefore are
     easier to validate with selftests than with custom guests (aka
     kvm-unit-tests)

   - Zero out PMU state on AMD if the virtual PMU is disabled; the stale
     state does not cause any bug, but it wastes time in various cases
     where KVM would check whether a PMC event needs to be synthesized

   - Optimize triggering of emulated events, with a nice ~10%
     performance improvement in VM-Exit microbenchmarks when a vPMU is
     exposed to the guest

   - Tighten the check for "PMI in guest" to reduce false positives if
     an NMI arrives in the host while KVM is handling an IRQ VM-Exit

   - Fix a bug where KVM would report stale/bogus exit qualification
     information when exiting to userspace with an internal error exit
     code

   - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support

   - Rework TDP MMU root unload, free, and alloc to run with mmu_lock
     held for read, e.g. to avoid serializing vCPUs when userspace
     deletes a memslot

   - Tear down TDP MMU page tables at 4KiB granularity (used to be
     1GiB). KVM doesn't support yielding in the middle of processing a
     zap, and 1GiB granularity resulted in multi-millisecond lags that
     are quite impolite for CONFIG_PREEMPT kernels

   - Allocate write-tracking metadata on-demand to avoid the memory
     overhead when a kernel is built with i915 virtualization support
     but the workloads use neither shadow paging nor i915 virtualization

   - Explicitly initialize a variety of on-stack variables in the
     emulator that triggered KMSAN false positives

   - Fix the debugregs ABI for 32-bit KVM

   - Rework the "force immediate exit" code so that vendor code
     ultimately decides how and when to force the exit, which allowed
     some optimization for both Intel and AMD

   - Fix a long-standing bug where kvm_has_noapic_vcpu could be left
     elevated if vCPU creation ultimately failed, causing extra
     unnecessary work

   - Clean up the logic for checking if the currently loaded vCPU is
     in-kernel

   - Harden against underflowing the active mmu_notifier invalidation
     count, so that "bad" invalidations (usually due to bugs elsewhere
     in the kernel) are detected earlier and are less likely to hang the
     kernel
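
   As a sketch of the first PMU fix above (illustrative only, not the
   actual KVM code, and arch_event_index() is a hypothetical helper),
   the dropped over-strict filtering looked roughly like:

        /* Sketch: reject encodings that match an architectural event
         * the guest CPUID does not enumerate. This is the wrong-headed
         * restriction that was removed: enumeration only promises the
         * encoding works when the event IS advertised, and says nothing
         * about unadvertised events, which may still work. */
        static bool pmu_event_allowed(u64 eventsel, u64 guest_arch_events)
        {
                int idx = arch_event_index(eventsel);   /* hypothetical */

                if (idx >= 0 && !(guest_arch_events & BIT_ULL(idx)))
                        return false;

                return true;
        }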

  x86 Xen emulation:

   - Overlay pages can now be cached based on host virtual address
     instead of guest physical address. This removes the need to
     reconfigure and invalidate the cache if the guest changes the gpa
     but the underlying host virtual address remains the same

   - When possible, use a single host TSC value when computing the
     deadline for Xen timers in order to improve the accuracy of the
     timer emulation

   - Inject pending upcall events when the vCPU software-enables its
     APIC to fix a bug where an upcall can be lost (and to follow Xen's
     behavior)

   - Fall back to the slow path instead of warning if "fast" IRQ
     delivery of Xen events fails, e.g. if the guest has aliased xAPIC
     IDs

  RISC-V:

   - Support exception and interrupt handling in selftests

   - New selftest for the RISC-V architectural timer (Sstc extension)

   - New extension support (Ztso, Zacas)

   - Support userspace emulation of random number seed CSRs

  ARM:

   - Infrastructure for building KVM's trap configuration based on the
     architectural features (or lack thereof) advertised in the VM's ID
     registers

   - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
     x86's WC) at stage-2, improving the performance of interacting with
     assigned devices that can tolerate it

   - Conversion of KVM's representation of LPIs to an xarray, utilized
     to address some of the serialization on the LPI injection path

   - Support for _architectural_ VHE-only systems, advertised through
     the absence of FEAT_E2H0 in the CPU's ID register

   - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
     selftests

  LoongArch:

   - Set reserved bits to zero in CPUCFG

   - Start the SW timer only when the vCPU is blocking

   - Do not restart the SW timer once it has expired

   - Remove unnecessary CSR register saving during guest entry

   - Misc cleanups and fixes as usual

  Generic:

   - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
     always true on all architectures except MIPS (where Kconfig
     determines KVM's availability depending on CPU capabilities). It is
     replaced by an architecture-dependent symbol on MIPS, and by
     IS_ENABLED(CONFIG_KVM) everywhere else (a sketch follows at the end
     of this list)

   - Factor common "select" statements into common code instead of
     requiring each architecture to specify them

   - Remove thoroughly obsolete APIs from the uapi headers

   - Move architecture-dependent stuff to uapi/asm/kvm.h

   - Always flush the async page fault workqueue when a work item is
     being removed, especially during vCPU destruction, to ensure that
     there are no workers running in KVM code when all references to
     KVM-the-module are gone, i.e. to prevent a very unlikely
     use-after-free if kvm.ko is unloaded

   - Grab a reference to the VM's mm_struct in the async #PF worker
     itself instead of gifting the worker a reference, so that there's
     no need to remember to *conditionally* clean up after the worker
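
   As a concrete illustration of the CONFIG_HAVE_KVM removal (a sketch
   reconstructed against the arch/x86/kernel/idt.c listing further
   below, not quoted from the actual diff):

        /* Before: gated on the catch-all helper symbol. */
        #ifdef CONFIG_HAVE_KVM
        INTG(POSTED_INTR_VECTOR, asm_sysvec_kvm_posted_intr_ipi),
        #endif

        /* After: test the KVM tristate directly, as idt.c now does. */
        #if IS_ENABLED(CONFIG_KVM)
        INTG(POSTED_INTR_VECTOR, asm_sysvec_kvm_posted_intr_ipi),
        #endif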

  Selftests:

   - Reduce boilerplate, especially when utilizing the selftest TAP
     infrastructure

   - Add basic smoke tests for SEV and SEV-ES, along with a pile of
     library support for handling private/encrypted/protected memory

   - Fix benign bugs where tests neglect to close() guest_memfd files"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  selftests: kvm: remove meaningless assignments in Makefiles
  KVM: riscv: selftests: Add Zacas extension to get-reg-list test
  RISC-V: KVM: Allow Zacas extension for Guest/VM
  KVM: riscv: selftests: Add Ztso extension to get-reg-list test
  RISC-V: KVM: Allow Ztso extension for Guest/VM
  RISC-V: KVM: Forward SEED CSR access to user space
  KVM: riscv: selftests: Add sstc timer test
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Add exception handling support
  LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
  LoongArch: KVM: Do not restart SW timer when it is expired
  LoongArch: KVM: Start SW timer only when vcpu is blocking
  LoongArch: KVM: Set reserved bits as zero in CPUCFG
  KVM: selftests: Explicitly close guest_memfd files in some gmem tests
  KVM: x86/xen: fix recursive deadlock in timer injection
  KVM: pfncache: simplify locking and make more self-contained
  KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
  KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
  KVM: x86/xen: improve accuracy of Xen timers
  ...

arch/x86/kernel/idt.c:

// SPDX-License-Identifier: GPL-2.0-only
/*
 * Interrupt descriptor table related code
 */
#include <linux/interrupt.h>

#include <asm/cpu_entry_area.h>
#include <asm/set_memory.h>
#include <asm/traps.h>
#include <asm/proto.h>
#include <asm/desc.h>
#include <asm/hw_irq.h>
#include <asm/ia32.h>
#include <asm/idtentry.h>

#define DPL0            0x0
#define DPL3            0x3

#define DEFAULT_STACK   0

#define G(_vector, _addr, _ist, _type, _dpl, _segment) \
        {                                              \
                .vector         = _vector,             \
                .bits.ist       = _ist,                \
                .bits.type      = _type,               \
                .bits.dpl       = _dpl,                \
                .bits.p         = 1,                   \
                .addr           = _addr,               \
                .segment        = _segment,            \
        }

/* Interrupt gate */
#define INTG(_vector, _addr) \
        G(_vector, _addr, DEFAULT_STACK, GATE_INTERRUPT, DPL0, __KERNEL_CS)

/* System interrupt gate */
#define SYSG(_vector, _addr) \
        G(_vector, _addr, DEFAULT_STACK, GATE_INTERRUPT, DPL3, __KERNEL_CS)

#ifdef CONFIG_X86_64
/*
 * Interrupt gate with interrupt stack. The _ist index is the index in
 * the tss.ist[] array, but for the descriptor it needs to start at 1.
 */
#define ISTG(_vector, _addr, _ist) \
        G(_vector, _addr, _ist + 1, GATE_INTERRUPT, DPL0, __KERNEL_CS)
#else
#define ISTG(_vector, _addr, _ist) INTG(_vector, _addr)
#endif

/* Task gate */
#define TSKG(_vector, _gdt) \
        G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)

#define IDT_TABLE_SIZE  (IDT_ENTRIES * sizeof(gate_desc))

static bool idt_setup_done __initdata;

/*
 * Early traps running on the DEFAULT_STACK because the other interrupt
 * stacks work only after cpu_init().
 */
static const __initconst struct idt_data early_idts[] = {
        INTG(X86_TRAP_DB, asm_exc_debug),
        SYSG(X86_TRAP_BP, asm_exc_int3),

#ifdef CONFIG_X86_32
        /*
         * Not possible on 64-bit. See idt_setup_early_pf() for details.
         */
        INTG(X86_TRAP_PF, asm_exc_page_fault),
#endif

#ifdef CONFIG_INTEL_TDX_GUEST
        INTG(X86_TRAP_VE, asm_exc_virtualization_exception),
#endif
};

/*
 * The default IDT entries which are set up in trap_init() before
 * cpu_init() is invoked. Interrupt stacks cannot be used at that point and
 * the traps which use them are reinitialized with IST after cpu_init() has
 * set up TSS.
 */
static const __initconst struct idt_data def_idts[] = {
        INTG(X86_TRAP_DE, asm_exc_divide_error),
        ISTG(X86_TRAP_NMI, asm_exc_nmi, IST_INDEX_NMI),
        INTG(X86_TRAP_BR, asm_exc_bounds),
        INTG(X86_TRAP_UD, asm_exc_invalid_op),
        INTG(X86_TRAP_NM, asm_exc_device_not_available),
        INTG(X86_TRAP_OLD_MF, asm_exc_coproc_segment_overrun),
        INTG(X86_TRAP_TS, asm_exc_invalid_tss),
        INTG(X86_TRAP_NP, asm_exc_segment_not_present),
        INTG(X86_TRAP_SS, asm_exc_stack_segment),
        INTG(X86_TRAP_GP, asm_exc_general_protection),
        INTG(X86_TRAP_SPURIOUS, asm_exc_spurious_interrupt_bug),
        INTG(X86_TRAP_MF, asm_exc_coprocessor_error),
        INTG(X86_TRAP_AC, asm_exc_alignment_check),
        INTG(X86_TRAP_XF, asm_exc_simd_coprocessor_error),

#ifdef CONFIG_X86_32
        TSKG(X86_TRAP_DF, GDT_ENTRY_DOUBLEFAULT_TSS),
#else
        ISTG(X86_TRAP_DF, asm_exc_double_fault, IST_INDEX_DF),
#endif
        ISTG(X86_TRAP_DB, asm_exc_debug, IST_INDEX_DB),

#ifdef CONFIG_X86_MCE
        ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE),
#endif

#ifdef CONFIG_X86_CET
        INTG(X86_TRAP_CP, asm_exc_control_protection),
#endif

#ifdef CONFIG_AMD_MEM_ENCRYPT
        ISTG(X86_TRAP_VC, asm_exc_vmm_communication, IST_INDEX_VC),
#endif
        SYSG(X86_TRAP_OF, asm_exc_overflow),
};

static const struct idt_data ia32_idt[] __initconst = {
#if defined(CONFIG_IA32_EMULATION)
        SYSG(IA32_SYSCALL_VECTOR, asm_int80_emulation),
#elif defined(CONFIG_X86_32)
        SYSG(IA32_SYSCALL_VECTOR, entry_INT80_32),
#endif
};

/*
 * The APIC and SMP idt entries
 */
static const __initconst struct idt_data apic_idts[] = {
#ifdef CONFIG_SMP
        INTG(RESCHEDULE_VECTOR, asm_sysvec_reschedule_ipi),
        INTG(CALL_FUNCTION_VECTOR, asm_sysvec_call_function),
        INTG(CALL_FUNCTION_SINGLE_VECTOR, asm_sysvec_call_function_single),
        INTG(REBOOT_VECTOR, asm_sysvec_reboot),
#endif

#ifdef CONFIG_X86_THERMAL_VECTOR
        INTG(THERMAL_APIC_VECTOR, asm_sysvec_thermal),
#endif

#ifdef CONFIG_X86_MCE_THRESHOLD
        INTG(THRESHOLD_APIC_VECTOR, asm_sysvec_threshold),
#endif

#ifdef CONFIG_X86_MCE_AMD
        INTG(DEFERRED_ERROR_VECTOR, asm_sysvec_deferred_error),
#endif

#ifdef CONFIG_X86_LOCAL_APIC
        INTG(LOCAL_TIMER_VECTOR, asm_sysvec_apic_timer_interrupt),
        INTG(X86_PLATFORM_IPI_VECTOR, asm_sysvec_x86_platform_ipi),
# if IS_ENABLED(CONFIG_KVM)
        INTG(POSTED_INTR_VECTOR, asm_sysvec_kvm_posted_intr_ipi),
        INTG(POSTED_INTR_WAKEUP_VECTOR, asm_sysvec_kvm_posted_intr_wakeup_ipi),
        INTG(POSTED_INTR_NESTED_VECTOR, asm_sysvec_kvm_posted_intr_nested_ipi),
# endif
# ifdef CONFIG_IRQ_WORK
        INTG(IRQ_WORK_VECTOR, asm_sysvec_irq_work),
# endif
        INTG(SPURIOUS_APIC_VECTOR, asm_sysvec_spurious_apic_interrupt),
        INTG(ERROR_APIC_VECTOR, asm_sysvec_error_interrupt),
#endif
};

/* Must be page-aligned because the real IDT is used in the cpu entry area */
static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
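/* The IDTR limit is the offset of the last valid byte, hence the -1 below. */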
static struct desc_ptr idt_descr __ro_after_init = {
        .size    = IDT_TABLE_SIZE - 1,
        .address = (unsigned long) idt_table,
};

void load_current_idt(void)
{
        lockdep_assert_irqs_disabled();
        load_idt(&idt_descr);
}

#ifdef CONFIG_X86_F00F_BUG
bool idt_is_f00f_address(unsigned long address)
{
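        /*
         * Pentium F00F erratum workaround: the IDT is mapped read-only,
         * which turns the CPU's descriptor fetch for the #UD gate into a
         * page fault instead of a hard hang. Gate descriptors are 8 bytes,
         * so (address - base) >> 3 == 6 identifies the #UD (vector 6) entry.
         */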
        return ((address - idt_descr.address) >> 3) == 6;
}
#endif

static __init void
idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sys)
{
        gate_desc desc;

        for (; size > 0; t++, size--) {
                idt_init_desc(&desc, t);
                write_idt_entry(idt, t->vector, &desc);
                if (sys)
                        set_bit(t->vector, system_vectors);
        }
}

static __init void set_intr_gate(unsigned int n, const void *addr)
{
        struct idt_data data;

        init_idt_data(&data, n, addr);

        idt_setup_from_table(idt_table, &data, 1, false);
}

/**
 * idt_setup_early_traps - Initialize the idt table with early traps
 *
 * On x86_64 these traps do not use interrupt stacks as they can't work
 * before cpu_init() is invoked and sets up TSS. The IST variants are
 * installed after that.
 */
void __init idt_setup_early_traps(void)
{
        idt_setup_from_table(idt_table, early_idts, ARRAY_SIZE(early_idts),
                             true);
        load_idt(&idt_descr);
}

/**
 * idt_setup_traps - Initialize the idt table with default traps
 */
void __init idt_setup_traps(void)
{
        idt_setup_from_table(idt_table, def_idts, ARRAY_SIZE(def_idts), true);

        if (ia32_enabled())
                idt_setup_from_table(idt_table, ia32_idt, ARRAY_SIZE(ia32_idt), true);
}

#ifdef CONFIG_X86_64
/*
 * Early traps running on the DEFAULT_STACK because the other interrupt
 * stacks work only after cpu_init().
 */
static const __initconst struct idt_data early_pf_idts[] = {
        INTG(X86_TRAP_PF, asm_exc_page_fault),
};

/**
 * idt_setup_early_pf - Initialize the idt table with early pagefault handler
 *
 * On x86_64 this does not use interrupt stacks as they can't work before
 * cpu_init() is invoked and sets up TSS. The IST variant is installed
 * after that.
 *
 * Note that x86_64 cannot install the real #PF handler in
 * idt_setup_early_traps() because the memory initialization needs the #PF
 * handler from the early_idt_handler_array to initialize the early page
 * tables.
 */
void __init idt_setup_early_pf(void)
{
        idt_setup_from_table(idt_table, early_pf_idts,
                             ARRAY_SIZE(early_pf_idts), true);
}
#endif

static void __init idt_map_in_cea(void)
{
        /*
         * Set the IDT descriptor to a fixed read-only location in the cpu
         * entry area, so that the "sidt" instruction will not leak the
         * location of the kernel, and to defend the IDT against arbitrary
         * memory write vulnerabilities.
         */
        cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table),
                    PAGE_KERNEL_RO);
        idt_descr.address = CPU_ENTRY_AREA_RO_IDT;
}

/**
 * idt_setup_apic_and_irq_gates - Setup APIC/SMP and normal interrupt gates
 */
void __init idt_setup_apic_and_irq_gates(void)
{
        int i = FIRST_EXTERNAL_VECTOR;
        void *entry;

        idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);

        for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
                entry = irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR);
                set_intr_gate(i, entry);
        }

#ifdef CONFIG_X86_LOCAL_APIC
        for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
                /*
                 * Don't set the non assigned system vectors in the
                 * system_vectors bitmap. Otherwise they show up in
                 * /proc/interrupts.
                 */
                entry = spurious_entries_start + IDT_ALIGN * (i - FIRST_SYSTEM_VECTOR);
                set_intr_gate(i, entry);
        }
#endif
        /* Map IDT into CPU entry area and reload it. */
        idt_map_in_cea();
        load_idt(&idt_descr);

        /* Make the IDT table read only */
        set_memory_ro((unsigned long)&idt_table, 1);

        idt_setup_done = true;
}

/**
 * idt_setup_early_handler - Initializes the idt table with early handlers
 */
void __init idt_setup_early_handler(void)
{
        int i;

        for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
                set_intr_gate(i, early_idt_handler_array[i]);
#ifdef CONFIG_X86_32
        for ( ; i < NR_VECTORS; i++)
                set_intr_gate(i, early_ignore_irq);
#endif
        load_idt(&idt_descr);
}

/**
 * idt_invalidate - Invalidate interrupt descriptor table
 */
void idt_invalidate(void)
{
        static const struct desc_ptr idt = { .address = 0, .size = 0 };
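
        /*
         * Loading a zero-limit IDT makes any subsequent exception escalate
         * to a triple fault (#GP -> #DF -> reset), which is the intent on
         * the reboot and kexec paths that call this.
         */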
        load_idt(&idt);
}

void __init idt_install_sysvec(unsigned int n, const void *function)
{
        if (WARN_ON(n < FIRST_SYSTEM_VECTOR))
                return;

        if (WARN_ON(idt_setup_done))
                return;

        if (!WARN_ON(test_and_set_bit(n, system_vectors)))
                set_intr_gate(n, function);
}
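
A hedged usage sketch to close out the listing: DEFINE_IDTENTRY_SYSVEC()
from <asm/idtentry.h> emits both the C handler and its asm_sysvec_* entry
stub, and the sysvec_install() wrapper pastes the asm_ prefix back on when
calling idt_install_sysvec() above. The vector and handler names below are
hypothetical placeholders; installation must happen before
idt_setup_apic_and_irq_gates() sets idt_setup_done:

#include <asm/idtentry.h>

/* Sketch only: MY_CALLBACK_VECTOR and sysvec_my_callback are placeholders. */
DEFINE_IDTENTRY_SYSVEC(sysvec_my_callback)
{
        apic_eoi();             /* ack the local APIC */
        /* ... handle the event ... */
}

static void __init my_platform_init(void)
{
        /* Expands to idt_install_sysvec(MY_CALLBACK_VECTOR, asm_sysvec_my_callback). */
        sysvec_install(MY_CALLBACK_VECTOR, sysvec_my_callback);
}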