linux/Documentation
Isaku Yamahata 6fef518594 KVM: x86: Add a capability to configure bus frequency for APIC timer
Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAP(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the APIC bus
cycle length in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the matching frequency.
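
Because the capability is expressed in nanoseconds per bus cycle rather than
in Hz, a VMM has to convert the frequency it advertises in CPUID leaf 0x15.
A minimal sketch of that conversion, assuming a hypothetical helper name
(apic_bus_cycle_ns is not a KVM API) and frequencies that divide 1 GHz evenly:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: length of one APIC bus cycle in nanoseconds for a
 * bus frequency given in Hz. Integer division truncates, so frequencies
 * that do not divide 1 GHz evenly lose precision.
 */
static uint64_t apic_bus_cycle_ns(uint64_t bus_hz)
{
    assert(bus_hz != 0 && bus_hz <= NSEC_PER_SEC);
    return NSEC_PER_SEC / bus_hz;
}
```

For the 25MHz crystal clock discussed below this gives 40 ns per cycle, while
the 1GHz default corresponds to 1 ns per cycle.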

Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.

The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.

In addition, per Intel SDM:
    "The APIC timer frequency will be the processor’s bus clock or core
     crystal clock frequency (when TSC/core crystal clock ratio is
     enumerated in CPUID leaf 0x15) divided by the value specified in
     the divide configuration register."

The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.
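
The size of the mismatch can be worked out with a simplified model of the
timer: the time to expiry is the initial count times the divide value times
the bus cycle length. The function below is an illustrative sketch under that
assumption, not KVM's actual code, and its name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: nanoseconds until an APIC timer programmed with
 * initial_count expires, given the divide configuration value and the
 * length of one bus cycle in nanoseconds.
 */
static uint64_t apic_timer_expiry_ns(uint32_t initial_count, uint32_t divide,
                                     uint64_t bus_cycle_ns)
{
    assert(divide != 0);
    return (uint64_t)initial_count * divide * bus_cycle_ns;
}
```

A guest that assumes 25MHz and wants a 10 ms tick with a divide value of 1
programs an initial count of 250000. At 40 ns per cycle that is 10,000,000 ns;
at 1 ns per cycle it is 250,000 ns, so the timer fires 40 times too often.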

KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported by Vishal, the TDX guest
kernel expects a 25MHz timer frequency but gets timer interrupts more
frequently due to the 1GHz frequency used by KVM.
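
The guest-side decision described above can be sketched as follows. Per the
Intel SDM, CPUID leaf 0x15 returns the TSC/core crystal clock ratio in EBX/EAX
and the nominal crystal clock frequency in Hz in ECX. The struct and function
names are hypothetical, and the logic is a simplification of what a guest
kernel does, not a copy of it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the registers returned by CPUID leaf 0x15. */
struct cpuid_0x15 {
    uint32_t eax; /* denominator of the TSC/core crystal clock ratio */
    uint32_t ebx; /* numerator of the TSC/core crystal clock ratio */
    uint32_t ecx; /* nominal core crystal clock frequency in Hz */
};

/*
 * Returns the APIC bus frequency in Hz taken from leaf 0x15, or 0 when the
 * leaf does not enumerate it and the guest has to calibrate against another
 * timer such as the ACPI timer or the PIT.
 */
static uint64_t apic_bus_hz_from_cpuid(const struct cpuid_0x15 *leaf)
{
    if (leaf->eax == 0 || leaf->ebx == 0 || leaf->ecx == 0)
        return 0;
    return leaf->ecx;
}

/* TSC frequency in Hz implied by the same leaf, or 0 if not enumerated. */
static uint64_t tsc_hz_from_cpuid(const struct cpuid_0x15 *leaf)
{
    if (leaf->eax == 0 || leaf->ebx == 0 || leaf->ecx == 0)
        return 0;
    return (uint64_t)leaf->ecx * leaf->ebx / leaf->eax;
}
```

With ECX set to 25000000, as TDX mandates, the guest assumes a 25MHz APIC bus
no matter which rate KVM actually uses to emulate the timer.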

To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1GHz.

Reported-by: Vishal Annapurve <vannapurve@google.com>
Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/6748a4c12269e756f0c48680da8ccc5367c31ce7.1714081726.git.reinette.chatre@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-05 06:18:27 -07:00
ABI Char/Misc and other driver subsystem changes for 6.10-rc1 2024-05-22 12:26:46 -07:00
accel
accounting
admin-guide platform/x86: touchscreen_dmi: Add support for setting touchscreen properties from cmdline 2024-05-27 11:42:57 +02:00
arch Documentation: RISC-V: uabi: Only scalar misaligned loads are supported 2024-05-30 09:42:53 -07:00
block
bpf bpf, docs: Fix the description of 'src' in ALU instructions 2024-05-15 09:34:54 -07:00
cdrom
core-api Documentation/core-api: correct reference to SWIOTLB_DYNAMIC 2024-05-27 16:52:09 +02:00
cpu-freq
crypto
dev-tools Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
devicetree Including fixes from bpf and netfilter. 2024-05-30 08:33:04 -07:00
doc-guide
driver-api Char/Misc and other driver subsystem changes for 6.10-rc1 2024-05-22 12:26:46 -07:00
fault-injection
fb
features
filesystems 16 hotfixes, 11 of which are cc:stable. 2024-05-25 15:10:33 -07:00
firmware_class
firmware-guide Documentation: firmware-guide: ACPI: Fix namespace typo 2024-04-26 18:58:13 +02:00
fpga
gpu
hid Merge branch 'for-6.10/intel-ish' into for-linus 2024-05-14 13:53:15 +02:00
hwmon hwmon: (emc1403) Add support for EMC1428 and EMC1438. 2024-05-12 09:02:00 -07:00
i2c
iio docs: iio: ad7944: add documentation for chain mode 2024-04-29 20:53:25 +01:00
images
infiniband
input
isdn
kbuild kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
kernel-hacking
leds
litmus-tests Documentation/litmus-tests: Make cmpxchg() tests safe for klitmus 2024-05-06 14:29:21 -07:00
livepatch
locking
maintainer
mhi
misc-devices
mm The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
netlabel
netlink netdev: add qstat for csum complete 2024-05-30 12:15:56 +02:00
networking net: revert partially applied PHY topology series 2024-05-13 18:35:02 -07:00
nvdimm
nvme
PCI Merge branch 'pci/enumeration' 2024-05-16 18:14:10 -05:00
pcmcia
peci
power
process docs: netdev: Fix typo in Signed-off-by tag 2024-05-27 17:15:22 -07:00
RCU
rust RISC-V Patches for the 6.10 Merge Window, Part 1 2024-05-22 09:56:00 -07:00
scheduler
scsi
security Another not-too-busy cycle for documentation, including: 2024-05-13 10:51:53 -07:00
sound Documentation: sound: Fix trailing whitespaces 2024-05-16 16:00:30 +02:00
sphinx docs: kernel_include.py: Cope with docutils 0.21 2024-05-02 09:50:59 -06:00
sphinx-static
spi spi: pxa2xx: Drop the stale entry in documentation TOC 2024-05-07 23:53:21 +09:00
staging
target
tee
timers sched/isolation: Prevent boot crash when the boot CPU is nohz_full 2024-04-28 10:07:12 +02:00
tools rtla: Documentation: Fix -t, --trace 2024-05-16 16:52:16 +02:00
trace Char/Misc and other driver subsystem changes for 6.10-rc1 2024-05-22 12:26:46 -07:00
translations pci-v6.10-changes 2024-05-21 10:09:28 -07:00
usb
userspace-api mseal: add documentation 2024-05-23 19:40:26 -07:00
virt KVM: x86: Add a capability to configure bus frequency for APIC timer 2024-06-05 06:18:27 -07:00
w1
watchdog
wmi platform/x86: wmi: Add MSI WMI Platform driver 2024-04-29 12:06:21 +02:00
.gitignore
atomic_bitops.txt
atomic_t.txt Documentation/atomic_t: Emphasize that failed atomic operations give no ordering 2024-05-06 14:29:04 -07:00
Changes
CodingStyle
conf.py
docutils.conf
dontdiff
index.rst
Kconfig
Makefile Kbuild updates for v6.10 2024-05-18 12:39:20 -07:00
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst