8fa590bf34
* Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. * Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit382b5b87a9
: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). * Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. * Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. * Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. * Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: * Second batch of the lazy destroy patches * First batch of KVM changes for kernel virtual != physical address support * Removal of a unused function x86: * Allow compiling out SMM support * Cleanup and documentation of SMM state save area format * Preserve interrupt shadow in SMM state save area * Respond to generic signals during slow page faults * Fixes and optimizations for the non-executable huge page errata fix. * Reprogram all performance counters on PMU filter change * Cleanups to Hyper-V emulation and tests * Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) * Advertise several new Intel features * x86 Xen-for-KVM: ** Allow the Xen runstate information to cross a page boundary ** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured ** Add support for 32-bit guests in SCHEDOP_poll * Notable x86 fixes and cleanups: ** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). ** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. ** Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. ** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. ** Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. ** Advertise (on AMD) that the SMM_CTL MSR is not supported ** Remove unnecessary exports Generic: * Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: * Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. * Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. * Introduce actual atomics for clear/set_bit() in selftests * Add support for pinning vCPUs in dirty_log_perf_test. * Rename the so called "perf_util" framework to "memstress". * Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. * Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. * Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). * A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. * x86-specific selftest changes: ** Clean up x86's page table management. ** Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. ** Clean up the nEPT support checks. ** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. ** Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: * Remove deleted ioctls from documentation * Clean up the docs for the x86 MSR filter. * Various fixes -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA== =QAYX -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM64: - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit382b5b87a9
: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: - Second batch of the lazy destroy patches - First batch of KVM changes for kernel virtual != physical address support - Removal of a unused function x86: - Allow compiling out SMM support - Cleanup and documentation of SMM state save area format - Preserve interrupt shadow in SMM state save area - Respond to generic signals during slow page faults - Fixes and optimizations for the non-executable huge page errata fix. - Reprogram all performance counters on PMU filter change - Cleanups to Hyper-V emulation and tests - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) - Advertise several new Intel features - x86 Xen-for-KVM: - Allow the Xen runstate information to cross a page boundary - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured - Add support for 32-bit guests in SCHEDOP_poll - Notable x86 fixes and cleanups: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. - Advertise (on AMD) that the SMM_CTL MSR is not supported - Remove unnecessary exports Generic: - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. - Introduce actual atomics for clear/set_bit() in selftests - Add support for pinning vCPUs in dirty_log_perf_test. - Rename the so called "perf_util" framework to "memstress". - Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. - x86-specific selftest changes: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: - Remove deleted ioctls from documentation - Clean up the docs for the x86 MSR filter. - Various fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits) KVM: x86: Add proper ReST tables for userspace MSR exits/flags KVM: selftests: Allocate ucall pool from MEM_REGION_DATA KVM: arm64: selftests: Align VA space allocator with TTBR0 KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: x86: Advertise that the SMM_CTL MSR is not supported KVM: x86: remove unnecessary exports KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic" tools: KVM: selftests: Convert clear/set_bit() to actual atomics tools: Drop "atomic_" prefix from atomic test_and_set_bit() tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests perf tools: Use dedicated non-atomic clear/set bit helpers tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers KVM: arm64: selftests: Enable single-step without a "full" ucall() KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself KVM: Remove stale comment about KVM_REQ_UNHALT KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR KVM: Reference to kvm_userspace_memory_region in doc and comments KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl ...
799 lines
20 KiB
C
799 lines
20 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
|
|
|
|
/*
|
|
* This file contains definitions from Hyper-V Hypervisor Top-Level Functional
|
|
* Specification (TLFS):
|
|
* https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
|
|
*/
|
|
|
|
#ifndef _ASM_GENERIC_HYPERV_TLFS_H
|
|
#define _ASM_GENERIC_HYPERV_TLFS_H
|
|
|
|
#include <linux/types.h>
|
|
#include <linux/bits.h>
|
|
#include <linux/time64.h>
|
|
|
|
/*
|
|
* While not explicitly listed in the TLFS, Hyper-V always runs with a page size
|
|
* of 4096. These definitions are used when communicating with Hyper-V using
|
|
* guest physical pages and guest physical page addresses, since the guest page
|
|
* size may not be 4096 on all architectures.
|
|
*/
|
|
#define HV_HYP_PAGE_SHIFT 12
|
|
#define HV_HYP_PAGE_SIZE BIT(HV_HYP_PAGE_SHIFT)
|
|
#define HV_HYP_PAGE_MASK (~(HV_HYP_PAGE_SIZE - 1))
|
|
|
|
/*
|
|
* Hyper-V provides two categories of flags relevant to guest VMs. The
|
|
* "Features" category indicates specific functionality that is available
|
|
* to guests on this particular instance of Hyper-V. The "Features"
|
|
* are presented in four groups, each of which is 32 bits. The group A
|
|
* and B definitions are common across architectures and are listed here.
|
|
* However, not all flags are relevant on all architectures.
|
|
*
|
|
* Groups C and D vary across architectures and are listed in the
|
|
* architecture specific portion of hyperv-tlfs.h. Some of these flags exist
|
|
* on multiple architectures, but the bit positions are different so they
|
|
* cannot appear in the generic portion of hyperv-tlfs.h.
|
|
*
|
|
* The "Enlightenments" category provides recommendations on whether to use
|
|
* specific enlightenments that are available. The Enlighenments are a single
|
|
* group of 32 bits, but they vary across architectures and are listed in
|
|
* the architecture specific portion of hyperv-tlfs.h.
|
|
*/
|
|
|
|
/*
|
|
* Group A Features.
|
|
*/
|
|
|
|
/* VP Runtime register available */
|
|
#define HV_MSR_VP_RUNTIME_AVAILABLE BIT(0)
|
|
/* Partition Reference Counter available*/
|
|
#define HV_MSR_TIME_REF_COUNT_AVAILABLE BIT(1)
|
|
/* Basic SynIC register available */
|
|
#define HV_MSR_SYNIC_AVAILABLE BIT(2)
|
|
/* Synthetic Timer registers available */
|
|
#define HV_MSR_SYNTIMER_AVAILABLE BIT(3)
|
|
/* Virtual APIC assist and VP assist page registers available */
|
|
#define HV_MSR_APIC_ACCESS_AVAILABLE BIT(4)
|
|
/* Hypercall and Guest OS ID registers available*/
|
|
#define HV_MSR_HYPERCALL_AVAILABLE BIT(5)
|
|
/* Access virtual processor index register available*/
|
|
#define HV_MSR_VP_INDEX_AVAILABLE BIT(6)
|
|
/* Virtual system reset register available*/
|
|
#define HV_MSR_RESET_AVAILABLE BIT(7)
|
|
/* Access statistics page registers available */
|
|
#define HV_MSR_STAT_PAGES_AVAILABLE BIT(8)
|
|
/* Partition reference TSC register is available */
|
|
#define HV_MSR_REFERENCE_TSC_AVAILABLE BIT(9)
|
|
/* Partition Guest IDLE register is available */
|
|
#define HV_MSR_GUEST_IDLE_AVAILABLE BIT(10)
|
|
/* Partition local APIC and TSC frequency registers available */
|
|
#define HV_ACCESS_FREQUENCY_MSRS BIT(11)
|
|
/* AccessReenlightenmentControls privilege */
|
|
#define HV_ACCESS_REENLIGHTENMENT BIT(13)
|
|
/* AccessTscInvariantControls privilege */
|
|
#define HV_ACCESS_TSC_INVARIANT BIT(15)
|
|
|
|
/*
|
|
* Group B features.
|
|
*/
|
|
#define HV_CREATE_PARTITIONS BIT(0)
|
|
#define HV_ACCESS_PARTITION_ID BIT(1)
|
|
#define HV_ACCESS_MEMORY_POOL BIT(2)
|
|
#define HV_ADJUST_MESSAGE_BUFFERS BIT(3)
|
|
#define HV_POST_MESSAGES BIT(4)
|
|
#define HV_SIGNAL_EVENTS BIT(5)
|
|
#define HV_CREATE_PORT BIT(6)
|
|
#define HV_CONNECT_PORT BIT(7)
|
|
#define HV_ACCESS_STATS BIT(8)
|
|
#define HV_DEBUGGING BIT(11)
|
|
#define HV_CPU_MANAGEMENT BIT(12)
|
|
#define HV_ENABLE_EXTENDED_HYPERCALLS BIT(20)
|
|
#define HV_ISOLATION BIT(22)
|
|
|
|
/*
|
|
* TSC page layout.
|
|
*/
|
|
struct ms_hyperv_tsc_page {
|
|
volatile u32 tsc_sequence;
|
|
u32 reserved1;
|
|
volatile u64 tsc_scale;
|
|
volatile s64 tsc_offset;
|
|
} __packed;
|
|
|
|
union hv_reference_tsc_msr {
|
|
u64 as_uint64;
|
|
struct {
|
|
u64 enable:1;
|
|
u64 reserved:11;
|
|
u64 pfn:52;
|
|
} __packed;
|
|
};
|
|
|
|
/*
|
|
* The guest OS needs to register the guest ID with the hypervisor.
|
|
* The guest ID is a 64 bit entity and the structure of this ID is
|
|
* specified in the Hyper-V specification:
|
|
*
|
|
* msdn.microsoft.com/en-us/library/windows/hardware/ff542653%28v=vs.85%29.aspx
|
|
*
|
|
* While the current guideline does not specify how Linux guest ID(s)
|
|
* need to be generated, our plan is to publish the guidelines for
|
|
* Linux and other guest operating systems that currently are hosted
|
|
* on Hyper-V. The implementation here conforms to this yet
|
|
* unpublished guidelines.
|
|
*
|
|
*
|
|
* Bit(s)
|
|
* 63 - Indicates if the OS is Open Source or not; 1 is Open Source
|
|
* 62:56 - Os Type; Linux is 0x100
|
|
* 55:48 - Distro specific identification
|
|
* 47:16 - Linux kernel version number
|
|
* 15:0 - Distro specific identification
|
|
*
|
|
*
|
|
*/
|
|
|
|
#define HV_LINUX_VENDOR_ID 0x8100
|
|
|
|
/*
|
|
* Crash notification flags.
|
|
*/
|
|
#define HV_CRASH_CTL_CRASH_NOTIFY_MSG BIT_ULL(62)
|
|
#define HV_CRASH_CTL_CRASH_NOTIFY BIT_ULL(63)
|
|
|
|
/* Declare the various hypercall operations. */
|
|
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
|
|
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
|
|
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
|
|
#define HVCALL_SEND_IPI 0x000b
|
|
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
|
|
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
|
|
#define HVCALL_SEND_IPI_EX 0x0015
|
|
#define HVCALL_GET_PARTITION_ID 0x0046
|
|
#define HVCALL_DEPOSIT_MEMORY 0x0048
|
|
#define HVCALL_CREATE_VP 0x004e
|
|
#define HVCALL_GET_VP_REGISTERS 0x0050
|
|
#define HVCALL_SET_VP_REGISTERS 0x0051
|
|
#define HVCALL_POST_MESSAGE 0x005c
|
|
#define HVCALL_SIGNAL_EVENT 0x005d
|
|
#define HVCALL_POST_DEBUG_DATA 0x0069
|
|
#define HVCALL_RETRIEVE_DEBUG_DATA 0x006a
|
|
#define HVCALL_RESET_DEBUG_SESSION 0x006b
|
|
#define HVCALL_ADD_LOGICAL_PROCESSOR 0x0076
|
|
#define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
|
|
#define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
|
|
#define HVCALL_RETARGET_INTERRUPT 0x007e
|
|
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
|
|
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
|
|
#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
|
|
|
|
/* Extended hypercalls */
|
|
#define HV_EXT_CALL_QUERY_CAPABILITIES 0x8001
|
|
#define HV_EXT_CALL_MEMORY_HEAT_HINT 0x8003
|
|
|
|
#define HV_FLUSH_ALL_PROCESSORS BIT(0)
|
|
#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES BIT(1)
|
|
#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY BIT(2)
|
|
#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT BIT(3)
|
|
|
|
/* Extended capability bits */
|
|
#define HV_EXT_CAPABILITY_MEMORY_COLD_DISCARD_HINT BIT(8)
|
|
|
|
enum HV_GENERIC_SET_FORMAT {
|
|
HV_GENERIC_SET_SPARSE_4K,
|
|
HV_GENERIC_SET_ALL,
|
|
};
|
|
|
|
#define HV_PARTITION_ID_SELF ((u64)-1)
|
|
#define HV_VP_INDEX_SELF ((u32)-2)
|
|
|
|
#define HV_HYPERCALL_RESULT_MASK GENMASK_ULL(15, 0)
|
|
#define HV_HYPERCALL_FAST_BIT BIT(16)
|
|
#define HV_HYPERCALL_VARHEAD_OFFSET 17
|
|
#define HV_HYPERCALL_VARHEAD_MASK GENMASK_ULL(26, 17)
|
|
#define HV_HYPERCALL_RSVD0_MASK GENMASK_ULL(31, 27)
|
|
#define HV_HYPERCALL_REP_COMP_OFFSET 32
|
|
#define HV_HYPERCALL_REP_COMP_1 BIT_ULL(32)
|
|
#define HV_HYPERCALL_REP_COMP_MASK GENMASK_ULL(43, 32)
|
|
#define HV_HYPERCALL_RSVD1_MASK GENMASK_ULL(47, 44)
|
|
#define HV_HYPERCALL_REP_START_OFFSET 48
|
|
#define HV_HYPERCALL_REP_START_MASK GENMASK_ULL(59, 48)
|
|
#define HV_HYPERCALL_RSVD2_MASK GENMASK_ULL(63, 60)
|
|
#define HV_HYPERCALL_RSVD_MASK (HV_HYPERCALL_RSVD0_MASK | \
|
|
HV_HYPERCALL_RSVD1_MASK | \
|
|
HV_HYPERCALL_RSVD2_MASK)
|
|
|
|
/* hypercall status code */
|
|
#define HV_STATUS_SUCCESS 0
|
|
#define HV_STATUS_INVALID_HYPERCALL_CODE 2
|
|
#define HV_STATUS_INVALID_HYPERCALL_INPUT 3
|
|
#define HV_STATUS_INVALID_ALIGNMENT 4
|
|
#define HV_STATUS_INVALID_PARAMETER 5
|
|
#define HV_STATUS_ACCESS_DENIED 6
|
|
#define HV_STATUS_OPERATION_DENIED 8
|
|
#define HV_STATUS_INSUFFICIENT_MEMORY 11
|
|
#define HV_STATUS_INVALID_PORT_ID 17
|
|
#define HV_STATUS_INVALID_CONNECTION_ID 18
|
|
#define HV_STATUS_INSUFFICIENT_BUFFERS 19
|
|
|
|
/*
|
|
* The Hyper-V TimeRefCount register and the TSC
|
|
* page provide a guest VM clock with 100ns tick rate
|
|
*/
|
|
#define HV_CLOCK_HZ (NSEC_PER_SEC/100)
|
|
|
|
/* Define the number of synthetic interrupt sources. */
|
|
#define HV_SYNIC_SINT_COUNT (16)
|
|
/* Define the expected SynIC version. */
|
|
#define HV_SYNIC_VERSION_1 (0x1)
|
|
/* Valid SynIC vectors are 16-255. */
|
|
#define HV_SYNIC_FIRST_VALID_VECTOR (16)
|
|
|
|
#define HV_SYNIC_CONTROL_ENABLE (1ULL << 0)
|
|
#define HV_SYNIC_SIMP_ENABLE (1ULL << 0)
|
|
#define HV_SYNIC_SIEFP_ENABLE (1ULL << 0)
|
|
#define HV_SYNIC_SINT_MASKED (1ULL << 16)
|
|
#define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
|
|
#define HV_SYNIC_SINT_VECTOR_MASK (0xFF)
|
|
|
|
#define HV_SYNIC_STIMER_COUNT (4)
|
|
|
|
/* Define synthetic interrupt controller message constants. */
|
|
#define HV_MESSAGE_SIZE (256)
|
|
#define HV_MESSAGE_PAYLOAD_BYTE_COUNT (240)
|
|
#define HV_MESSAGE_PAYLOAD_QWORD_COUNT (30)
|
|
|
|
/*
|
|
* Define hypervisor message types. Some of the message types
|
|
* are x86/x64 specific, but there's no good way to separate
|
|
* them out into the arch-specific version of hyperv-tlfs.h
|
|
* because C doesn't provide a way to extend enum types.
|
|
* Keeping them all in the arch neutral hyperv-tlfs.h seems
|
|
* the least messy compromise.
|
|
*/
|
|
enum hv_message_type {
|
|
HVMSG_NONE = 0x00000000,
|
|
|
|
/* Memory access messages. */
|
|
HVMSG_UNMAPPED_GPA = 0x80000000,
|
|
HVMSG_GPA_INTERCEPT = 0x80000001,
|
|
|
|
/* Timer notification messages. */
|
|
HVMSG_TIMER_EXPIRED = 0x80000010,
|
|
|
|
/* Error messages. */
|
|
HVMSG_INVALID_VP_REGISTER_VALUE = 0x80000020,
|
|
HVMSG_UNRECOVERABLE_EXCEPTION = 0x80000021,
|
|
HVMSG_UNSUPPORTED_FEATURE = 0x80000022,
|
|
|
|
/* Trace buffer complete messages. */
|
|
HVMSG_EVENTLOG_BUFFERCOMPLETE = 0x80000040,
|
|
|
|
/* Platform-specific processor intercept messages. */
|
|
HVMSG_X64_IOPORT_INTERCEPT = 0x80010000,
|
|
HVMSG_X64_MSR_INTERCEPT = 0x80010001,
|
|
HVMSG_X64_CPUID_INTERCEPT = 0x80010002,
|
|
HVMSG_X64_EXCEPTION_INTERCEPT = 0x80010003,
|
|
HVMSG_X64_APIC_EOI = 0x80010004,
|
|
HVMSG_X64_LEGACY_FP_ERROR = 0x80010005
|
|
};
|
|
|
|
/* Define synthetic interrupt controller message flags. */
|
|
union hv_message_flags {
|
|
__u8 asu8;
|
|
struct {
|
|
__u8 msg_pending:1;
|
|
__u8 reserved:7;
|
|
} __packed;
|
|
};
|
|
|
|
/* Define port identifier type. */
|
|
union hv_port_id {
|
|
__u32 asu32;
|
|
struct {
|
|
__u32 id:24;
|
|
__u32 reserved:8;
|
|
} __packed u;
|
|
};
|
|
|
|
/* Define synthetic interrupt controller message header. */
|
|
struct hv_message_header {
|
|
__u32 message_type;
|
|
__u8 payload_size;
|
|
union hv_message_flags message_flags;
|
|
__u8 reserved[2];
|
|
union {
|
|
__u64 sender;
|
|
union hv_port_id port;
|
|
};
|
|
} __packed;
|
|
|
|
/* Define synthetic interrupt controller message format. */
|
|
struct hv_message {
|
|
struct hv_message_header header;
|
|
union {
|
|
__u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
|
|
} u;
|
|
} __packed;
|
|
|
|
/* Define the synthetic interrupt message page layout. */
|
|
struct hv_message_page {
|
|
struct hv_message sint_message[HV_SYNIC_SINT_COUNT];
|
|
} __packed;
|
|
|
|
/* Define timer message payload structure. */
|
|
struct hv_timer_message_payload {
|
|
__u32 timer_index;
|
|
__u32 reserved;
|
|
__u64 expiration_time; /* When the timer expired */
|
|
__u64 delivery_time; /* When the message was delivered */
|
|
} __packed;
|
|
|
|
|
|
/* Define synthetic interrupt controller flag constants. */
|
|
#define HV_EVENT_FLAGS_COUNT (256 * 8)
|
|
#define HV_EVENT_FLAGS_LONG_COUNT (256 / sizeof(unsigned long))
|
|
|
|
/*
|
|
* Synthetic timer configuration.
|
|
*/
|
|
union hv_stimer_config {
|
|
u64 as_uint64;
|
|
struct {
|
|
u64 enable:1;
|
|
u64 periodic:1;
|
|
u64 lazy:1;
|
|
u64 auto_enable:1;
|
|
u64 apic_vector:8;
|
|
u64 direct_mode:1;
|
|
u64 reserved_z0:3;
|
|
u64 sintx:4;
|
|
u64 reserved_z1:44;
|
|
} __packed;
|
|
};
|
|
|
|
|
|
/* Define the synthetic interrupt controller event flags format. */
|
|
union hv_synic_event_flags {
|
|
unsigned long flags[HV_EVENT_FLAGS_LONG_COUNT];
|
|
};
|
|
|
|
/* Define SynIC control register. */
|
|
union hv_synic_scontrol {
|
|
u64 as_uint64;
|
|
struct {
|
|
u64 enable:1;
|
|
u64 reserved:63;
|
|
} __packed;
|
|
};
|
|
|
|
/* Define synthetic interrupt source. */
|
|
union hv_synic_sint {
|
|
u64 as_uint64;
|
|
struct {
|
|
u64 vector:8;
|
|
u64 reserved1:8;
|
|
u64 masked:1;
|
|
u64 auto_eoi:1;
|
|
u64 polling:1;
|
|
u64 reserved2:45;
|
|
} __packed;
|
|
};
|
|
|
|
/* Define the format of the SIMP register */
|
|
union hv_synic_simp {
|
|
u64 as_uint64;
|
|
struct {
|
|
u64 simp_enabled:1;
|
|
u64 preserved:11;
|
|
u64 base_simp_gpa:52;
|
|
} __packed;
|
|
};
|
|
|
|
/* Define the format of the SIEFP register */
|
|
union hv_synic_siefp {
|
|
u64 as_uint64;
|
|
struct {
|
|
u64 siefp_enabled:1;
|
|
u64 preserved:11;
|
|
u64 base_siefp_gpa:52;
|
|
} __packed;
|
|
};
|
|
|
|
struct hv_vpset {
|
|
u64 format;
|
|
u64 valid_bank_mask;
|
|
u64 bank_contents[];
|
|
} __packed;
|
|
|
|
/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
|
|
#define HV_MAX_SPARSE_VCPU_BANKS (64)
|
|
/* The number of vCPUs in one sparse bank */
|
|
#define HV_VCPUS_PER_SPARSE_BANK (64)
|
|
|
|
/* HvCallSendSyntheticClusterIpi hypercall */
|
|
struct hv_send_ipi {
|
|
u32 vector;
|
|
u32 reserved;
|
|
u64 cpu_mask;
|
|
} __packed;
|
|
|
|
/* HvCallSendSyntheticClusterIpiEx hypercall */
|
|
struct hv_send_ipi_ex {
|
|
u32 vector;
|
|
u32 reserved;
|
|
struct hv_vpset vp_set;
|
|
} __packed;
|
|
|
|
/* HvFlushGuestPhysicalAddressSpace hypercalls */
|
|
struct hv_guest_mapping_flush {
|
|
u64 address_space;
|
|
u64 flags;
|
|
} __packed;
|
|
|
|
/*
|
|
* HV_MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited
|
|
* by the bitwidth of "additional_pages" in union hv_gpa_page_range.
|
|
*/
|
|
#define HV_MAX_FLUSH_PAGES (2048)
|
|
#define HV_GPA_PAGE_RANGE_PAGE_SIZE_2MB 0
|
|
#define HV_GPA_PAGE_RANGE_PAGE_SIZE_1GB 1
|
|
|
|
/* HvFlushGuestPhysicalAddressList, HvExtCallMemoryHeatHint hypercall */
|
|
union hv_gpa_page_range {
|
|
u64 address_space;
|
|
struct {
|
|
u64 additional_pages:11;
|
|
u64 largepage:1;
|
|
u64 basepfn:52;
|
|
} page;
|
|
struct {
|
|
u64 reserved:12;
|
|
u64 page_size:1;
|
|
u64 reserved1:8;
|
|
u64 base_large_pfn:43;
|
|
};
|
|
};
|
|
|
|
/*
|
|
* All input flush parameters should be in single page. The max flush
|
|
* count is equal with how many entries of union hv_gpa_page_range can
|
|
* be populated into the input parameter page.
|
|
*/
|
|
#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) / \
|
|
sizeof(union hv_gpa_page_range))
|
|
|
|
struct hv_guest_mapping_flush_list {
|
|
u64 address_space;
|
|
u64 flags;
|
|
union hv_gpa_page_range gpa_list[HV_MAX_FLUSH_REP_COUNT];
|
|
};
|
|
|
|
/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
|
|
struct hv_tlb_flush {
|
|
u64 address_space;
|
|
u64 flags;
|
|
u64 processor_mask;
|
|
u64 gva_list[];
|
|
} __packed;
|
|
|
|
/* HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressListEx hypercalls */
|
|
struct hv_tlb_flush_ex {
|
|
u64 address_space;
|
|
u64 flags;
|
|
struct hv_vpset hv_vp_set;
|
|
u64 gva_list[];
|
|
} __packed;
|
|
|
|
/* HvGetPartitionId hypercall (output only) */
|
|
struct hv_get_partition_id {
|
|
u64 partition_id;
|
|
} __packed;
|
|
|
|
/* HvDepositMemory hypercall */
|
|
struct hv_deposit_memory {
|
|
u64 partition_id;
|
|
u64 gpa_page_list[];
|
|
} __packed;
|
|
|
|
struct hv_proximity_domain_flags {
|
|
u32 proximity_preferred : 1;
|
|
u32 reserved : 30;
|
|
u32 proximity_info_valid : 1;
|
|
} __packed;
|
|
|
|
/* Not a union in windows but useful for zeroing */
|
|
union hv_proximity_domain_info {
|
|
struct {
|
|
u32 domain_id;
|
|
struct hv_proximity_domain_flags flags;
|
|
};
|
|
u64 as_uint64;
|
|
} __packed;
|
|
|
|
struct hv_lp_startup_status {
|
|
u64 hv_status;
|
|
u64 substatus1;
|
|
u64 substatus2;
|
|
u64 substatus3;
|
|
u64 substatus4;
|
|
u64 substatus5;
|
|
u64 substatus6;
|
|
} __packed;
|
|
|
|
/* HvAddLogicalProcessor hypercall */
|
|
struct hv_add_logical_processor_in {
|
|
u32 lp_index;
|
|
u32 apic_id;
|
|
union hv_proximity_domain_info proximity_domain_info;
|
|
u64 flags;
|
|
} __packed;
|
|
|
|
struct hv_add_logical_processor_out {
|
|
struct hv_lp_startup_status startup_status;
|
|
} __packed;
|
|
|
|
enum HV_SUBNODE_TYPE
|
|
{
|
|
HvSubnodeAny = 0,
|
|
HvSubnodeSocket = 1,
|
|
HvSubnodeAmdNode = 2,
|
|
HvSubnodeL3 = 3,
|
|
HvSubnodeCount = 4,
|
|
HvSubnodeInvalid = -1
|
|
};
|
|
|
|
/* HvCreateVp hypercall */
|
|
struct hv_create_vp {
|
|
u64 partition_id;
|
|
u32 vp_index;
|
|
u8 padding[3];
|
|
u8 subnode_type;
|
|
u64 subnode_id;
|
|
union hv_proximity_domain_info proximity_domain_info;
|
|
u64 flags;
|
|
} __packed;
|
|
|
|
enum hv_interrupt_source {
|
|
HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */
|
|
HV_INTERRUPT_SOURCE_IOAPIC,
|
|
};
|
|
|
|
union hv_ioapic_rte {
|
|
u64 as_uint64;
|
|
|
|
struct {
|
|
u32 vector:8;
|
|
u32 delivery_mode:3;
|
|
u32 destination_mode:1;
|
|
u32 delivery_status:1;
|
|
u32 interrupt_polarity:1;
|
|
u32 remote_irr:1;
|
|
u32 trigger_mode:1;
|
|
u32 interrupt_mask:1;
|
|
u32 reserved1:15;
|
|
|
|
u32 reserved2:24;
|
|
u32 destination_id:8;
|
|
};
|
|
|
|
struct {
|
|
u32 low_uint32;
|
|
u32 high_uint32;
|
|
};
|
|
} __packed;
|
|
|
|
struct hv_interrupt_entry {
|
|
u32 source;
|
|
u32 reserved1;
|
|
union {
|
|
union hv_msi_entry msi_entry;
|
|
union hv_ioapic_rte ioapic_rte;
|
|
};
|
|
} __packed;
|
|
|
|
/*
|
|
* flags for hv_device_interrupt_target.flags
|
|
*/
|
|
#define HV_DEVICE_INTERRUPT_TARGET_MULTICAST 1
|
|
#define HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET 2
|
|
|
|
struct hv_device_interrupt_target {
|
|
u32 vector;
|
|
u32 flags;
|
|
union {
|
|
u64 vp_mask;
|
|
struct hv_vpset vp_set;
|
|
};
|
|
} __packed;
|
|
|
|
struct hv_retarget_device_interrupt {
|
|
u64 partition_id; /* use "self" */
|
|
u64 device_id;
|
|
struct hv_interrupt_entry int_entry;
|
|
u64 reserved2;
|
|
struct hv_device_interrupt_target int_target;
|
|
} __packed __aligned(8);
|
|
|
|
|
|
/* HvGetVpRegisters hypercall input with variable size reg name list*/
|
|
struct hv_get_vp_registers_input {
|
|
struct {
|
|
u64 partitionid;
|
|
u32 vpindex;
|
|
u8 inputvtl;
|
|
u8 padding[3];
|
|
} header;
|
|
struct input {
|
|
u32 name0;
|
|
u32 name1;
|
|
} element[];
|
|
} __packed;
|
|
|
|
|
|
/* HvGetVpRegisters returns an array of these output elements */
|
|
struct hv_get_vp_registers_output {
|
|
union {
|
|
struct {
|
|
u32 a;
|
|
u32 b;
|
|
u32 c;
|
|
u32 d;
|
|
} as32 __packed;
|
|
struct {
|
|
u64 low;
|
|
u64 high;
|
|
} as64 __packed;
|
|
};
|
|
};
|
|
|
|
/* HvSetVpRegisters hypercall with variable size reg name/value list*/
|
|
struct hv_set_vp_registers_input {
|
|
struct {
|
|
u64 partitionid;
|
|
u32 vpindex;
|
|
u8 inputvtl;
|
|
u8 padding[3];
|
|
} header;
|
|
struct {
|
|
u32 name;
|
|
u32 padding1;
|
|
u64 padding2;
|
|
u64 valuelow;
|
|
u64 valuehigh;
|
|
} element[];
|
|
} __packed;
|
|
|
|
enum hv_device_type {
|
|
HV_DEVICE_TYPE_LOGICAL = 0,
|
|
HV_DEVICE_TYPE_PCI = 1,
|
|
HV_DEVICE_TYPE_IOAPIC = 2,
|
|
HV_DEVICE_TYPE_ACPI = 3,
|
|
};
|
|
|
|
typedef u16 hv_pci_rid;
|
|
typedef u16 hv_pci_segment;
|
|
typedef u64 hv_logical_device_id;
|
|
union hv_pci_bdf {
|
|
u16 as_uint16;
|
|
|
|
struct {
|
|
u8 function:3;
|
|
u8 device:5;
|
|
u8 bus;
|
|
};
|
|
} __packed;
|
|
|
|
union hv_pci_bus_range {
|
|
u16 as_uint16;
|
|
|
|
struct {
|
|
u8 subordinate_bus;
|
|
u8 secondary_bus;
|
|
};
|
|
} __packed;
|
|
|
|
union hv_device_id {
|
|
u64 as_uint64;
|
|
|
|
struct {
|
|
u64 reserved0:62;
|
|
u64 device_type:2;
|
|
};
|
|
|
|
/* HV_DEVICE_TYPE_LOGICAL */
|
|
struct {
|
|
u64 id:62;
|
|
u64 device_type:2;
|
|
} logical;
|
|
|
|
/* HV_DEVICE_TYPE_PCI */
|
|
struct {
|
|
union {
|
|
hv_pci_rid rid;
|
|
union hv_pci_bdf bdf;
|
|
};
|
|
|
|
hv_pci_segment segment;
|
|
union hv_pci_bus_range shadow_bus_range;
|
|
|
|
u16 phantom_function_bits:2;
|
|
u16 source_shadow:1;
|
|
|
|
u16 rsvdz0:11;
|
|
u16 device_type:2;
|
|
} pci;
|
|
|
|
/* HV_DEVICE_TYPE_IOAPIC */
|
|
struct {
|
|
u8 ioapic_id;
|
|
u8 rsvdz0;
|
|
u16 rsvdz1;
|
|
u16 rsvdz2;
|
|
|
|
u16 rsvdz3:14;
|
|
u16 device_type:2;
|
|
} ioapic;
|
|
|
|
/* HV_DEVICE_TYPE_ACPI */
|
|
struct {
|
|
u32 input_mapping_base;
|
|
u32 input_mapping_count:30;
|
|
u32 device_type:2;
|
|
} acpi;
|
|
} __packed;
|
|
|
|
enum hv_interrupt_trigger_mode {
|
|
HV_INTERRUPT_TRIGGER_MODE_EDGE = 0,
|
|
HV_INTERRUPT_TRIGGER_MODE_LEVEL = 1,
|
|
};
|
|
|
|
struct hv_device_interrupt_descriptor {
|
|
u32 interrupt_type;
|
|
u32 trigger_mode;
|
|
u32 vector_count;
|
|
u32 reserved;
|
|
struct hv_device_interrupt_target target;
|
|
} __packed;
|
|
|
|
struct hv_input_map_device_interrupt {
|
|
u64 partition_id;
|
|
u64 device_id;
|
|
u64 flags;
|
|
struct hv_interrupt_entry logical_interrupt_entry;
|
|
struct hv_device_interrupt_descriptor interrupt_descriptor;
|
|
} __packed;
|
|
|
|
struct hv_output_map_device_interrupt {
|
|
struct hv_interrupt_entry interrupt_entry;
|
|
} __packed;
|
|
|
|
struct hv_input_unmap_device_interrupt {
|
|
u64 partition_id;
|
|
u64 device_id;
|
|
struct hv_interrupt_entry interrupt_entry;
|
|
} __packed;
|
|
|
|
#define HV_SOURCE_SHADOW_NONE 0x0
|
|
#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1
|
|
|
|
/*
|
|
* The whole argument should fit in a page to be able to pass to the hypervisor
|
|
* in one hypercall.
|
|
*/
|
|
#define HV_MEMORY_HINT_MAX_GPA_PAGE_RANGES \
|
|
((HV_HYP_PAGE_SIZE - sizeof(struct hv_memory_hint)) / \
|
|
sizeof(union hv_gpa_page_range))
|
|
|
|
/* HvExtCallMemoryHeatHint hypercall */
|
|
#define HV_EXT_MEMORY_HEAT_HINT_TYPE_COLD_DISCARD 2
|
|
struct hv_memory_hint {
|
|
u64 type:2;
|
|
u64 reserved:62;
|
|
union hv_gpa_page_range ranges[];
|
|
} __packed;
|
|
|
|
#endif
|