KVM: x86: Add a capability to configure bus frequency for APIC timer

Add the KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency used for APIC timer emulation.
Allow KVM_ENABLE_CAP(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the APIC bus
cycle period in nanoseconds (e.g. 40 for a 25MHz bus). When using this
capability, the userspace VMM should configure CPUID leaf 0x15 to
advertise the matching frequency.
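A minimal userspace sketch of that flow, assuming a Linux x86 host with /dev/kvm and the kernel uapi headers installed. In the uapi header the enable ioctl is KVM_ENABLE_CAP, taking a struct kvm_enable_cap. The helper name enable_apic_bus_cycles_ns() is hypothetical, and the fallback #define uses the capability number 236 that this commit assigns. The ordering (in-kernel irqchip first, no vCPUs yet) follows the error cases listed in the documentation hunk.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Fallback for older headers: the number this commit adds to the uapi header. */
#ifndef KVM_CAP_X86_APIC_BUS_CYCLES_NS
#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
#endif

/*
 * Hypothetical helper: create a VM, give it an in-kernel irqchip and set
 * its APIC bus cycle period. Returns 0 on success or a negative errno.
 */
static int enable_apic_bus_cycles_ns(uint64_t bus_cycle_ns)
{
	struct kvm_enable_cap cap;
	int kvm_fd, vm_fd, def, ret = 0;

	kvm_fd = open("/dev/kvm", O_RDWR);
	if (kvm_fd < 0)
		return -errno;

	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
	if (vm_fd < 0) {
		ret = -errno;
		goto out_kvm;
	}

	/* A positive result is KVM's default bus cycle in ns; 0 means unsupported. */
	def = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_APIC_BUS_CYCLES_NS);
	if (def <= 0) {
		ret = -EOPNOTSUPP;
		goto out_vm;
	}

	/* The capability needs an in-kernel local APIC, otherwise -ENXIO. */
	if (ioctl(vm_fd, KVM_CREATE_IRQCHIP, 0) < 0) {
		ret = -errno;
		goto out_vm;
	}

	/* Must run before any KVM_CREATE_VCPU, otherwise -EINVAL. */
	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X86_APIC_BUS_CYCLES_NS;
	cap.args[0] = bus_cycle_ns;
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		ret = -errno;

out_vm:
	close(vm_fd);
out_kvm:
	close(kvm_fd);
	return ret;
}
```

A VMM targeting a 25MHz bus would pass 40. Error handling is deliberately simple; a real VMM would keep vm_fd open and continue with KVM_CREATE_VCPU and KVM_SET_CPUID2.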

Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.

The TDX architecture hard-codes the core crystal clock frequency to
25MHz, mandates exposing it via CPUID leaf 0x15, and does not allow the
VMM to override the value.

In addition, per Intel SDM:
    "The APIC timer frequency will be the processor’s bus clock or core
     crystal clock frequency (when TSC/core crystal clock ratio is
     enumerated in CPUID leaf 0x15) divided by the value specified in
     the divide configuration register."
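The divide value in that quote comes from the APIC timer Divide Configuration Register (TDCR). The sketch below, with hypothetical helper names, decodes it and applies the SDM formula. The bit layout (bits 0, 1 and 3 forming a 3-bit code, with 111b meaning divide by 1) follows the SDM's description of the register, and it is why 128 is the largest divide value the overflow guard in the KVM_CAP_X86_APIC_BUS_CYCLES_NS enable handler has to consider.

```c
#include <stdint.h>

/*
 * Decode TDCR: bits 0, 1 and 3 form a 3-bit code where
 * 000b -> 2, 001b -> 4, ..., 110b -> 128 and 111b -> 1.
 */
static uint32_t apic_tdcr_divide(uint32_t tdcr)
{
	uint32_t code = (tdcr & 0x3) | ((tdcr & 0x8) >> 1);

	return 1u << ((code + 1) & 0x7);
}

/* SDM formula: timer frequency = bus (or core crystal) clock / divide value. */
static uint64_t apic_timer_hz(uint64_t clock_hz, uint32_t tdcr)
{
	return clock_hz / apic_tdcr_divide(tdcr);
}
```

With the TDX-mandated 25MHz crystal and a divide-by-2 setting (TDCR = 0), the timer counts at 12.5MHz.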

The resulting 25MHz APIC bus frequency conflicts with KVM's hard-coded
APIC bus frequency of 1GHz.

KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the userspace
VMM sets it using KVM_SET_CPUID. If CPUID leaf 0x15 is enumerated, the
guest kernel uses it as the APIC bus frequency. If not, the guest kernel
measures the frequency against other known timers, such as the ACPI
timer or the legacy PIT. As Vishal reported, the TDX guest kernel
expects a 25MHz timer frequency but receives timer interrupts more
frequently because KVM emulates a 1GHz bus.
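To see how far off the rate is, consider a simplified model: a timer programmed with initial count TMICT fires after TMICT * divide * bus-cycle nanoseconds, the same product that the overflow guard in the enable handler protects. The sketch below (hypothetical helper names, integer nanoseconds) compares the 40ns cycle that a 25MHz guest assumes with the 1ns cycle of KVM's 1GHz default.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Length of one bus cycle in ns for a bus/crystal frequency in Hz. */
static uint64_t bus_cycle_ns_from_hz(uint64_t hz)
{
	return NSEC_PER_SEC / hz;
}

/* Time until a one-shot APIC timer with initial count tmict expires. */
static uint64_t apic_timer_period_ns(uint32_t tmict, uint32_t divide,
				     uint64_t bus_cycle_ns)
{
	return (uint64_t)tmict * divide * bus_cycle_ns;
}
```

A guest that wants a 10ms tick at 25MHz with divide-by-1 programs TMICT = 250000. Under KVM's 1ns default that count expires after 250 microseconds, so interrupts arrive 40 times too often.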

To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1GHz.

Reported-by: Vishal Annapurve <vannapurve@google.com>
Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/6748a4c12269e756f0c48680da8ccc5367c31ce7.1714081726.git.reinette.chatre@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Isaku Yamahata 2024-04-25 15:07:01 -07:00 committed by Sean Christopherson
parent b460256b16
commit 6fef518594
3 changed files with 45 additions and 0 deletions

@@ -8070,6 +8070,23 @@ error/annotated fault.
See KVM_EXIT_MEMORY_FAULT for more information.

7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
-----------------------------------

:Architectures: x86
:Target: VM
:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
          frequency or if any vCPUs have been created, -ENXIO if a virtual
          local APIC has not been created using KVM_CREATE_IRQCHIP.

This capability sets the VM's APIC bus clock frequency, used by KVM's in-kernel
virtual APIC when emulating APIC timers. KVM's default value can be retrieved
by KVM_CHECK_EXTENSION.

Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.

8. Other capabilities.
======================
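The note above leaves CPUID 0x15 to userspace. The sketch below fills that leaf consistently with the chosen bus cycle. It assumes the usual leaf layout (EAX = denominator and EBX = numerator of the TSC/core crystal clock ratio, ECX = crystal frequency in Hz). The struct cpuid15 type, the helper names and the TSC frequency input are hypothetical stand-ins, not KVM's struct kvm_cpuid_entry2.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the EAX..EDX outputs of CPUID leaf 0x15. */
struct cpuid15 {
	uint32_t eax;	/* denominator of the TSC/core crystal clock ratio */
	uint32_t ebx;	/* numerator of the TSC/core crystal clock ratio */
	uint32_t ecx;	/* core crystal clock frequency in Hz */
	uint32_t edx;	/* reserved */
};

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Derive the crystal frequency from the bus cycle given to
 * KVM_CAP_X86_APIC_BUS_CYCLES_NS (exact only when bus_cycle_ns divides
 * 10^9) and express tsc_hz / crystal_hz as a reduced fraction.
 */
static struct cpuid15 make_cpuid15(uint64_t bus_cycle_ns, uint64_t tsc_hz)
{
	uint64_t crystal_hz = NSEC_PER_SEC / bus_cycle_ns;
	uint64_t g = gcd_u64(tsc_hz, crystal_hz);
	struct cpuid15 leaf = {
		.eax = (uint32_t)(crystal_hz / g),
		.ebx = (uint32_t)(tsc_hz / g),
		.ecx = (uint32_t)crystal_hz,
		.edx = 0,
	};

	return leaf;
}
```

For a 40ns cycle and a 2GHz TSC this yields ECX = 25000000 with an 80/1 ratio. The sketch does not check that the reduced ratio fits in 32 bits, which a real VMM should.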

@@ -4706,6 +4706,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
	case KVM_CAP_MEMORY_FAULT_INFO:
		r = 1;
		break;
	case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
		r = APIC_BUS_CYCLE_NS_DEFAULT;
		break;
	case KVM_CAP_EXIT_HYPERCALL:
		r = KVM_EXIT_HYPERCALL_VALID_MASK;
		break;
@@ -6746,6 +6749,30 @@ split_irqchip_unlock:
		}
		mutex_unlock(&kvm->lock);
		break;
	case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
		u64 bus_cycle_ns = cap->args[0];
		u64 unused;

		/*
		 * Guard against overflow in tmict_to_ns(). 128 is the highest
		 * divide value that can be programmed in APIC_TDCR.
		 */
		r = -EINVAL;
		if (!bus_cycle_ns ||
		    check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
			break;

		r = 0;
		mutex_lock(&kvm->lock);
		if (!irqchip_in_kernel(kvm))
			r = -ENXIO;
		else if (kvm->created_vcpus)
			r = -EINVAL;
		else
			kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
		mutex_unlock(&kvm->lock);
		break;
	}
	default:
		r = -EINVAL;
		break;
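Userspace can reproduce the enable handler's argument check before calling into KVM. The sketch below uses the GCC/Clang __builtin_mul_overflow() builtin, which the kernel's check_mul_overflow() helper wraps; the function name is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Mirror of the handler's check: reject 0, and reject any value for which
 * UINT32_MAX (largest initial count) * 128 (largest TDCR divide) *
 * bus_cycle_ns would overflow 64 bits.
 */
static int check_apic_bus_cycle_ns(uint64_t bus_cycle_ns)
{
	uint64_t unused;

	if (!bus_cycle_ns ||
	    __builtin_mul_overflow((uint64_t)UINT32_MAX * 128, bus_cycle_ns,
				   &unused))
		return -EINVAL;
	return 0;
}
```

Working the bound out: (2^32 - 1) * 128 = 2^39 - 128, and floor((2^64 - 1) / (2^39 - 128)) = 2^25, so 33554432ns (about 33.5ms per cycle, roughly a 30Hz bus) is the largest value the check accepts.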

@@ -917,6 +917,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236

struct kvm_irq_routing_irqchip {
	__u32 irqchip;