228 Commits

Author SHA1 Message Date
Paolo Bonzini
157ed9cb04 KVM x86 PMU changes for 6.3:
- Add support for creating masked events for the PMU filter to allow
    userspace to heavily restrict what events the guest can use without
    needing to create an absurd number of events
 
  - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
    support is disabled
 
  - Add PEBS support for Intel SPR
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmPsFZ4SHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5eKEP/0qeZsOQot53wkf+wNiGh1X6qDacBPFP
 A8GAPC70fEisxAt776DeKEBwikHpARPglCt1Il9dFvkG+0jgYpvPu8UGF1LpouKX
 cD/7itr2k8GZlXZBg2Rgu3TRyFBJEGHT6tAu7PBhZyL6yWQDUxao8FPFrRGfmJ7O
 Z6eFMo1cERNHICQm+W/2TBd1xguiF+m4CXKlA70R4wzM37aPF9o5HvmIwAvPzyhU
 w4WzcIQbjVPs1VpBTzwPqRmyZ8omSlDYo7VqmsDiRtJbucqgbhFI2wR+nyImFCa9
 D2pI5TV3CFTt0fvd8SZpH19nR3S6cMLCXONOsijmvR2BmS3PhJMP4dMm5m4R06nF
 RBtnTj9fkbeL1ghFEkMxHBZVTG3bBlO4ySOxIqNHCvPjqQ37mJ+xP4C8kcIC9p5F
 +xL3AvZ7zenPv3A29SY9YH+QvZLBwyDJzAsveLeYkLFoJxoDT4glOY/Wpi1rkZ17
 /zHDZWoF49l1Eu3Bql0hFetkCreUNFGpa4moUmEC0evYOvV2WCb+39TDXZ8CPCGD
 +cDiRnD8MFQpBw47F03EnFheFHxiJoL0Clv5vvM3C+xOq2J9WVG9mqQWCk+4ta2B
 Um4D++0a9lwvJhOImaR7uyiV3K7oVm+rU8+46x+nTNGaIP2bnE+vronY+b6KGeUx
 7+xzTKlYygGe
 =ev5v
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-pmu-6.3' of https://github.com/kvm-x86/linux into HEAD

KVM x86 PMU changes for 6.3:

 - Add support for creating masked events for the PMU filter to allow
   userspace to heavily restrict what events the guest can use without
   needing to create an absurd number of events

 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled

 - Add PEBS support for Intel SPR
2023-02-15 08:23:24 -05:00
Vipin Sharma
db9cf24cea KVM: x86: hyper-v: Add extended hypercall support in Hyper-v
Add support for extended hypercalls in Hyper-V. Hyper-V TLFS 6.0b
describes hypercalls above call code 0x8000 as extended hypercalls.

A guest VM running on a Hyper-V hypervisor discovers the availability of
extended hypercalls via CPUID.0x40000003.EBX BIT(20). If the bit is set, the
guest can call extended hypercalls.

All extended hypercalls will exit to userspace by default. This allows
for easy support of future hypercalls without being dependent on KVM
releases.

If a hypercall later needs to be processed in KVM instead of userspace,
KVM can create a capability which userspace can query to learn which
hypercalls KVM can handle and to enable in-kernel handling of those
hypercalls.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20221212183720.4062037-10-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-01 13:46:00 -08:00
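
A rough standalone sketch of the routing decision described above (the constant names are illustrative; the CPUID bit and the 0x8000 boundary come from the commit message, and the real handler's exit plumbing and error paths are omitted):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* CPUID.0x40000003.EBX bit 20 advertises extended hypercalls; call
   * codes >= 0x8000 are "extended" (both per the commit message above). */
  #define HV_ENABLE_EXTENDED_HYPERCALLS  (1u << 20)
  #define HV_EXT_CALL_BASE               0x8000u

  /* Returns true when the hypercall should be completed in userspace. */
  static bool ext_hcall_exits_to_userspace(uint32_t cpuid_ebx, uint16_t call_code)
  {
      if (call_code < HV_EXT_CALL_BASE)
          return false;                /* regular hypercall, handled in KVM */
      /* Extended hypercall: exit to userspace if the feature is advertised;
       * the error path for un-advertised calls is elided. */
      return cpuid_ebx & HV_ENABLE_EXTENDED_HYPERCALLS;
  }

  int main(void)
  {
      printf("%d\n", ext_hcall_exits_to_userspace(1u << 20, 0x8001)); /* 1 */
      printf("%d\n", ext_hcall_exits_to_userspace(1u << 20, 0x0002)); /* 0 */
      return 0;
  }
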
Vipin Sharma
1a9df3262a KVM: x86: hyper-v: Use common code for hypercall userspace exit
Remove duplicate code to exit to userspace for hyper-v hypercalls and
use a common place to exit.

No functional change intended.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20221212183720.4062037-9-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-01 13:41:44 -08:00
Sean Christopherson
e76ae52747 KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrs
Add helpers to print unimplemented MSR accesses and condition all such
prints on report_ignored_msrs, i.e. honor userspace's request to not
print unimplemented MSRs.  Even though vcpu_unimpl() is ratelimited,
printing can still be problematic, e.g. if a print gets stalled when host
userspace is writing MSRs during live migration, an effective stall can
result in very noticeable disruption in the guest.

E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU
counters while the PMU was disabled in KVM.

  -   99.75%     0.00%  [.] __ioctl
   - __ioctl
      - 99.74% entry_SYSCALL_64_after_hwframe
           do_syscall_64
           sys_ioctl
         - do_vfs_ioctl
            - 92.48% kvm_vcpu_ioctl
               - kvm_arch_vcpu_ioctl
                  - 85.12% kvm_set_msr_ignored_check
                       svm_set_msr
                       kvm_set_msr_common
                       printk
                       vprintk_func
                       vprintk_default
                       vprintk_emit
                       console_unlock
                       call_console_drivers
                       univ8250_console_write
                       serial8250_console_write
                       uart_console_write

Reported-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26 18:03:42 -08:00
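
A minimal userspace sketch of the gating pattern described above (the helper name and message format are illustrative; the real KVM helpers are additionally rate-limited):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Stand-in for the kvm.report_ignored_msrs module parameter. */
  static bool report_ignored_msrs;

  /* Only print unimplemented MSR accesses when userspace asked for them,
   * so that a slow console cannot stall hot paths such as KVM_SET_MSRS. */
  static void pr_unimpl_wrmsr(uint32_t msr, uint64_t data)
  {
      if (!report_ignored_msrs)
          return;
      fprintf(stderr, "unhandled wrmsr: 0x%x data 0x%llx\n",
              msr, (unsigned long long)data);
  }

  int main(void)
  {
      pr_unimpl_wrmsr(0xc1, 0);     /* silent: reporting disabled */
      report_ignored_msrs = true;
      pr_unimpl_wrmsr(0xc1, 0);     /* now printed */
      return 0;
  }
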
Sean Christopherson
8d20bd6381 KVM: x86: Unify pr_fmt to use module name for all KVM modules
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code.  In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.

Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.

Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.

Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl().  But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:47:35 -05:00
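
The change applies the standard kernel pr_fmt idiom; a self-contained approximation is below (in-tree, KBUILD_MODNAME is provided by Kbuild, e.g. "kvm_intel"; here it is defined by hand so the snippet compiles on its own):

  #include <stdio.h>

  #define KBUILD_MODNAME "kvm_intel"              /* provided by Kbuild in-tree */
  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt     /* the per-module prefix */
  #define pr_info(fmt, ...) printf(pr_fmt(fmt), ##__VA_ARGS__)

  int main(void)
  {
      pr_info("VMX enabled\n");   /* prints: kvm_intel: VMX enabled */
      return 0;
  }
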
Vitaly Kuznetsov
2be1bd3a70 KVM: x86: Hyper-V invariant TSC control
Normally, genuine Hyper-V doesn't expose architectural invariant TSC
(CPUID.80000007H:EDX[8]) to its guests by default. A special PV MSR
(HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x40000118) and corresponding CPUID
feature bit (CPUID.0x40000003.EAX[15]) were introduced. When bit 0 of the
PV MSR is set, invariant TSC bit starts to show up in CPUID. When the
feature is exposed to Hyper-V guests, reenlightenment becomes unneeded.

Add the feature to KVM. Keep CPUID output intact when the feature
wasn't exposed to L1 and implement the required logic for hiding
invariant TSC when the feature was exposed and invariant TSC control
MSR wasn't written to. Copy genuine Hyper-V behavior and forbid disabling
the feature once it has been enabled.

For reference, for Linux guests, support for the feature was added
in commit dce7cd62754b ("x86/hyperv: Allow guests to enable InvariantTSC").

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013095849.705943-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:29 -05:00
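
A hedged sketch of the CPUID filtering described above (names are placeholders for KVM state; the behavior follows the commit message: hide CPUID.0x80000007.EDX bit 8 until bit 0 of HV_X64_MSR_TSC_INVARIANT_CONTROL is set):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define CPUID_80000007_EDX_INVTSC   (1u << 8)   /* architectural invariant TSC */
  #define HV_INVTSC_CONTROL_ENABLE    0x1ULL      /* bit 0 of the PV control MSR */

  static uint32_t filter_invtsc_cpuid(uint32_t edx, bool hv_invtsc_exposed,
                                      uint64_t tsc_invariant_control_msr)
  {
      /* If the PV feature is exposed but the guest has not opted in via the
       * control MSR, keep the architectural CPUID bit hidden. */
      if (hv_invtsc_exposed &&
          !(tsc_invariant_control_msr & HV_INVTSC_CONTROL_ENABLE))
          edx &= ~CPUID_80000007_EDX_INVTSC;
      return edx;
  }

  int main(void)
  {
      printf("0x%x\n", filter_invtsc_cpuid(1u << 8, true, 0));  /* 0x0: hidden */
      printf("0x%x\n", filter_invtsc_cpuid(1u << 8, true, 1));  /* 0x100: visible */
      return 0;
  }
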
Vitaly Kuznetsov
8b9e13d2de KVM: x86: hyper-v: Fix 'using uninitialized value' Coverity warning
In kvm_hv_flush_tlb(), the 'data_offset' and 'consumed_xmm_halves' variables
are used in a mutually exclusive way: for 'hc->fast' hypercalls KVM counts
consumed 'XMM halves', otherwise it advances 'data_offset'. Coverity discovered
that in one case both variables are incremented unconditionally. This doesn't
seem to cause any issues, as the only user of the
'data_offset'/'consumed_xmm_halves' data is kvm_hv_get_tlb_flush_entries() ->
kvm_hv_get_hc_data(), which also takes 'hc->fast' into account, but it is
still worth fixing.

To make things explicit, put 'data_offset' and 'consumed_xmm_halves' into
'struct kvm_hv_hcall' as a union and use them at the call sites. This allows
removing the explicit 'data_offset'/'consumed_xmm_halves' parameters from the
kvm_hv_get_hc_data()/kvm_get_sparse_vp_set()/kvm_hv_get_tlb_flush_entries()
helpers.

Note: 'struct kvm_hv_hcall' is allocated on stack in kvm_hv_hypercall() and
is not zeroed, consumers are supposed to initialize the appropriate field
if needed.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1527764 ("Uninitialized variables")
Fixes: 260970862c88 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221208102700.959630-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-23 12:08:16 -05:00
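
A rough shape of the union described above (the surrounding struct is heavily abbreviated and the other fields are placeholders):

  #include <stdbool.h>
  #include <stdint.h>

  struct kvm_hv_hcall_sketch {
      uint64_t param;
      uint64_t ingpa;
      uint16_t var_cnt;
      bool fast;
      /* Mutually exclusive counters: 'fast' hypercalls consume XMM register
       * halves, slow ones advance an offset into guest memory.  As noted in
       * the commit, the struct is stack-allocated and not zeroed, so callers
       * initialize whichever member they actually use. */
      union {
          int data_offset;
          int consumed_xmm_halves;
      };
  };
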
Vitaly Kuznetsov
f4de6a1fa3 KVM: x86: Expose Hyper-V L2 TLB flush feature
With both nSVM and nVMX implementations in place, KVM can now expose
Hyper-V L2 TLB flush feature to userspace.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-30-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:19 -05:00
Vitaly Kuznetsov
b415d8d417 KVM: x86: Make kvm_hv_get_assist_page() return 0/-errno
Convert kvm_hv_get_assist_page() to return 'int' and propagate possible
errors from kvm_read_guest_cached().

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-28-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:18 -05:00
Vitaly Kuznetsov
046f5756c4 KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv'
In preparation to enabling L2 TLB flush, cache the VP assist page in
'struct kvm_vcpu_hv'. While at it, rename nested_enlightened_vmentry()
to nested_get_evmptr() and make it return the eVMCS GPA directly.

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-26-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:16 -05:00
Vitaly Kuznetsov
c58a318f60 KVM: x86: hyper-v: L2 TLB flush
Handle L2 TLB flush requests by going through all vCPUs and checking
whether there are vCPUs running the same VM_ID with a VP_ID specified
in the requests. Perform synthetic exit to L2 upon finish.

Note, while checking the VM_ID/VP_ID of running vCPUs seems to be a bit
racy, we count on the fact that KVM flushes the whole L2 VPID upon
transition. Also, KVM_REQ_HV_TLB_FLUSH request needs to be done upon
transition between L1 and L2 to make sure all pending requests are
always processed.

For reference, the Hyper-V TLFS refers to the feature as "Direct
Virtual Flush".

Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-24-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:14 -05:00
Vitaly Kuznetsov
7d5e88d301 KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'
To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUS, as L1
may use vCPU overcommit for L2. To avoid growing the on-stack allocation, make
'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
dynamically.

Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2
requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings,
i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so
can remain an on-stack allocation.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-19-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:10 -05:00
Vitaly Kuznetsov
53ca765a04 KVM: x86: hyper-v: Create a separate fifo for L2 TLB flush
To handle L2 TLB flush requests, KVM needs to use a separate fifo from
regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush
something in L2 is made, the target vCPU can transition from L2 to L1,
receive a request to flush a GVA for L1 and then try to re-enter L2.
The first request needs to be processed at this point. Similarly,
requests to flush GVAs in L1 must wait until L2 exits to L1.

No functional change as KVM doesn't handle L2 TLB flush requests from
L2 yet.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-18-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:10 -05:00
Vitaly Kuznetsov
b6c2c22fa7 KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()
Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
for a smaller number of vCPUs in the request. When Hyper-V TLB flush
is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
send IPIs to a large number of vCPUs (and are rarely used in general).

Introduce hv_is_vp_in_sparse_set() to directly check if the specified
VP_ID is present in sparse vCPU set.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-17-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:09 -05:00
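
The membership check can be sketched in standalone C (names are illustrative; the layout follows the commit message: valid_bank_mask selects 64-vCPU banks whose contents are packed consecutively in sparse_banks[]):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  static bool vp_in_sparse_set(uint32_t vp_id, uint64_t valid_bank_mask,
                               const uint64_t sparse_banks[])
  {
      unsigned int bank = vp_id / 64;
      unsigned int sbank;

      if (!(valid_bank_mask & (1ULL << bank)))
          return false;

      /* A valid bank's data sits at the index equal to the number of
       * valid banks below it. */
      sbank = __builtin_popcountll(valid_bank_mask & ((1ULL << bank) - 1));
      return sparse_banks[sbank] & (1ULL << (vp_id % 64));
  }

  int main(void)
  {
      uint64_t banks[] = { 1ULL << 5 };     /* bank 1, bit 5 -> VP_ID 69 */

      printf("%d\n", vp_in_sparse_set(69, 1ULL << 1, banks));   /* 1 */
      printf("%d\n", vp_in_sparse_set(70, 1ULL << 1, banks));   /* 0 */
      return 0;
  }
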
Vitaly Kuznetsov
ca7372aca7 KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64'
It may not be clear where the '64' limit for the maximum sparse bank
number comes from; use the HV_MAX_SPARSE_VCPU_BANKS define instead.
Use HV_VCPUS_PER_SPARSE_BANK in KVM_HV_MAX_SPARSE_VCPU_SET_BITS's
definition. Opportunistically adjust the comment around BUILD_BUG_ON().

No functional change.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-16-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:08 -05:00
Vitaly Kuznetsov
aee738236d KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
To handle L2 TLB flush requests, KVM needs to translate the specified
L2 GPA to L1 GPA to read hypercall arguments from there.

No functional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-14-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:07 -05:00
Vitaly Kuznetsov
f84fcb6656 KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls
The extended GVA ranges support bit seems to indicate whether the lower 12
bits of a GVA can be used to specify up to 4095 additional consecutive
GVAs to flush. This is somewhat described in the TLFS.

Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
requests by flushing the whole VPID so, technically, extended GVA
ranges were already supported. As such requests are now handled more
gently, advertising support for extended ranges starts making
sense to reduce the size of TLB flush requests.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-13-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:06 -05:00
Vitaly Kuznetsov
260970862c KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled
the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by
flushing the whole VPID and this is sub-optimal. Switch to handling
these requests with 'flush_tlb_gva()' hooks instead. Use the newly
introduced TLB flush fifo to queue the requests.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-12-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:05 -05:00
Sean Christopherson
56b5354fd8 KVM: x86: hyper-v: Add helper to read hypercall data for array
Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
reading a guest-provided array can be reused in the future, e.g. for
getting a list of virtual addresses whose TLB entries need to be flushed.

Opportunistically swap the order of the data and XMM adjustment so that
the XMM/gpa offsets are bundled together.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-11-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:04 -05:00
Vitaly Kuznetsov
0823570f01 KVM: x86: hyper-v: Introduce TLB flush fifo
To allow flushing individual GVAs instead of always flushing the whole
VPID, a per-vCPU structure to pass the requests is needed. Use the standard
'kfifo' to queue two types of entries: individual GVA (GFN + up to 4095
following GFNs in the lower 12 bits) and 'flush all'.

The size of the fifo is arbitrarily set to '16'.

Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now and
kvm_hv_vcpu_flush_tlb() doesn't actually read the fifo, it just resets the
queue before returning -EOPNOTSUPP (which triggers a full TLB flush), so
the functional change is very small but the infrastructure is prepared
to handle individual GVA flush requests.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-10-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:04 -05:00
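
A standalone sketch of the entry format described above (a page number plus up to 4095 additional consecutive pages in the lower 12 bits, and an all-ones sentinel for 'flush all'; the constant names are illustrative):

  #include <stdint.h>
  #include <stdio.h>

  #define FLUSH_ALL_ENTRY  (~0ULL)    /* sentinel: flush the whole VPID */

  static void process_flush_entry(uint64_t entry)
  {
      if (entry == FLUSH_ALL_ENTRY) {
          printf("flush entire VPID\n");
          return;
      }
      /* bits 63:12 carry the starting page, bits 11:0 the number of
       * additional consecutive pages to flush. */
      uint64_t page  = entry >> 12;
      uint64_t extra = entry & 0xfffULL;
      printf("flush pages 0x%llx..0x%llx (%llu total)\n",
             (unsigned long long)page,
             (unsigned long long)(page + extra),
             (unsigned long long)(extra + 1));
  }

  int main(void)
  {
      process_flush_entry((0x1234ULL << 12) | 3);   /* 4 pages from 0x1234 */
      process_flush_entry(FLUSH_ALL_ENTRY);
      return 0;
  }
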
Vitaly Kuznetsov
adc43caa0a KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag
In preparation to implementing fine-grained Hyper-V TLB flush and
L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH
request in kvm_vcpu_flush_tlb_guest().

The flush itself is temporarily handled by kvm_vcpu_flush_tlb_guest().

No functional change intended.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18 12:59:03 -05:00
Vitaly Kuznetsov
4da77090b0 KVM: nVMX: Support PERF_GLOBAL_CTRL with enlightened VMCS
Enlightened VMCS v1 got updated and now includes the fields required for
the 'load PERF_GLOBAL_CTRL' VM-entry/VM-exit features. For KVM on
Hyper-V enablement, KVM can just observe VMX control MSRs and use the
features (with or without eVMCS) when possible.

Hyper-V on KVM is messier as Windows 11 guests fail to boot if the
controls are advertised and a new PV feature flag, CPUID.0x4000000A.EBX
BIT(0), is not set.  Honor the Hyper-V CPUID feature flag to play nice
with Windows guests.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-16-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:47 -04:00
Vitaly Kuznetsov
dea6e140d9 KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf
KVM has to check the guest-visible HYPERV_CPUID_NESTED_FEATURES.EBX CPUID
leaf to know which Enlightened VMCS definition to use (original or the 2022
update). Cache the leaf along with other Hyper-V CPUID feature leaves
to make the check quick.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-12-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:44 -04:00
Sean Christopherson
3be29eb7b5 KVM: x86: Report error when setting CPUID if Hyper-V allocation fails
Return -ENOMEM back to userspace if allocating the Hyper-V vCPU struct
fails when enabling Hyper-V in guest CPUID.  Silently ignoring failure
means that KVM will not have an up-to-date CPUID cache if allocating the
struct succeeds later on, e.g. when activating SynIC.

Rejecting the CPUID operation also guarantees that vcpu->arch.hyperv is
non-NULL if hyperv_enabled is true, which will allow for additional
cleanup, e.g. in the eVMCS code.

Note, the initialization needs to be done before CPUID is set, and more
subtly before kvm_check_cpuid(), which potentially enables dynamic
XFEATURES.  Sadly, there's no easy way to avoid exposing Hyper-V details
to CPUID or vice versa.  Expose kvm_hv_vcpu_init() and the Hyper-V CPUID
signature to CPUID instead of exposing cpuid_entry2_find() outside of
CPUID code.  It's hard to envision kvm_hv_vcpu_init() being misused,
whereas cpuid_entry2_find() absolutely shouldn't be used outside of core
CPUID code.

Fixes: 10d7bf1e46dc ("KVM: x86: hyper-v: Cache guest CPUID leaves determining features availability")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-6-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:39 -04:00
Sean Christopherson
1cac8d9f6b KVM: x86: Check for existing Hyper-V vCPU in kvm_hv_vcpu_init()
When potentially allocating/initializing the Hyper-V vCPU struct, check
for an existing instance in kvm_hv_vcpu_init() instead of requiring
callers to perform the check.  Relying on callers to do the check is
risky as it's all too easy for KVM to overwrite vcpu->arch.hyperv and
leak memory, and it adds additional burden on callers without much
benefit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20220830133737.1539624-5-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:38 -04:00
Vitaly Kuznetsov
ce2196b831 KVM: x86: Zero out entire Hyper-V CPUID cache before processing entries
Wipe the whole 'hv_vcpu->cpuid_cache' with memset() instead of having to
zero each particular member when the corresponding CPUID entry was not
found.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[sean: split to separate patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20220830133737.1539624-4-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:38 -04:00
Sean Christopherson
277ad7d586 KVM: x86: Add dedicated helper to get CPUID entry with significant index
Add a second CPUID helper, kvm_find_cpuid_entry_index(), to handle KVM
queries for CPUID leaves whose index _may_ be significant, and drop the
index param from the existing kvm_find_cpuid_entry().  Add a WARN in the
inner helper, cpuid_entry2_find(), to detect attempts to retrieve a CPUID
entry whose index is significant without explicitly providing an index.

Using an explicit magic number and letting callers omit the index avoids
confusion by eliminating the myriad cases where KVM specifies '0' as a
dummy value.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 11:38:32 -04:00
Linus Torvalds
d0e60d46bc Bitmap patches for 5.19-rc1
This series includes the following patchsets:
  - bitmap: optimize bitmap_weight() usage (w/o bitmap_weight_cmp), from me;
  - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
    Carvalho Chehab;
  - include/linux/find: Fix documentation, from Anna-Maria Behnsen;
  - bitmap: fix conversion from/to fix-sized arrays, from me;
  - bitmap: Fix return values to be unsigned, from Kees Cook.
 
 It has been in linux-next for at least a week with no problems.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmKaEzYACgkQsUSA/Tof
 vsiGKwv8Dgr3G0mLbSfmHZqdFMIsmSmwhxlEH6eBNtX6vjQbGafe/Buhj/1oSY8N
 NYC4+5Br6s7MmMRth3Kp6UECdl94TS3Ka06T+lVBKkG+C+B1w1/svqUMM2ZCQF3e
 Z5R/HhR6av9X9Qb2mWSasWLkWp629NjdtRsJSDWiVt1emVVwh+iwxQnMH9VuE+ao
 z3mvaQfSRhe4h+xCZOiohzFP+0jZb1EnPrQAIVzNUjigo7mglpNvVyO7p/8LU7gD
 dIjfGmSbtsHU72J+/0lotRqjhjORl1F/EILf8pIzx5Ga7ExUGhOzGWAj7/3uZxfA
 Cp1Z/QV271MGwv/sNdSPwCCJHf51eOmsbyOyUScjb3gFRwIStEa1jB4hKwLhS5wF
 3kh4kqu3WGuIQAdxkUpDBsy3CQjAPDkvtRJorwyWGbjwa9xUETESAgH7XCCTsgWc
 0sIuldWWaxC581+fAP1Dzmo8uuWBURTaOrVmRMILQHMTw54zoFyLY+VI9gEAT9aV
 gnPr3M4F
 =U7DN
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - bitmap: optimize bitmap_weight() usage, from me

 - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
   Carvalho Chehab

 - include/linux/find: Fix documentation, from Anna-Maria Behnsen

 - bitmap: fix conversion from/to fix-sized arrays, from me

 - bitmap: Fix return values to be unsigned, from Kees Cook

It has been in linux-next for at least a week with no problems.

* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
  nodemask: Fix return values to be unsigned
  bitmap: Fix return values to be unsigned
  KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
  KVM: x86: hyper-v: fix type of valid_bank_mask
  ia64: cleanup remove_siblinginfo()
  drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
  KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
  lib/bitmap: add test for bitmap_{from,to}_arr64
  lib: add bitmap_{from,to}_arr64
  lib/bitmap: extend comment for bitmap_(from,to)_arr32()
  include/linux/find: Fix documentation
  lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
  genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
  irq: mips: replace cpumask_weight with cpumask_empty where appropriate
  drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  ...
2022-06-04 14:04:27 -07:00
Yury Norov
d603fd8dd3 KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
kvm_hv_flush_tlb() applies the bitmap API to the u64 variable
valid_bank_mask. Since valid_bank_mask has a fixed size, we can use
hweight64() and avoid excessive bloating.

CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Jim Mattson <jmattson@google.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Sean Christopherson <seanjc@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Wanpeng Li <wanpengli@tencent.com>
CC: kvm@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: x86@kernel.org
Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-03 06:52:58 -07:00
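
For a fixed-width u64 the bit count can be taken directly; a tiny standalone illustration, with hweight64() modeled by the compiler builtin:

  #include <stdint.h>
  #include <stdio.h>

  /* Userspace stand-in for the kernel's hweight64(). */
  static unsigned int hweight64_demo(uint64_t w)
  {
      return (unsigned int)__builtin_popcountll(w);
  }

  int main(void)
  {
      uint64_t valid_bank_mask = (1ULL << 0) | (1ULL << 5) | (1ULL << 63);
      unsigned int var_cnt = 3;

      /* One variable-header QWORD is expected per valid sparse bank. */
      printf("%s\n", var_cnt == hweight64_demo(valid_bank_mask) ? "ok" : "mismatch");
      return 0;
  }
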
Yury Norov
a7ef9b455c KVM: x86: hyper-v: fix type of valid_bank_mask
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long,
but is used as u64, which is wrong for i386, and has been spotted by
LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with
hweight64()"

https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/

But it's wrong even without that patch because now bitmap_weight()
dereferences a word after valid_bank_mask on i386.

>> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type
+[-Wshift-count-overflow]
      21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
         |                                                                            ^~
   include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8'
      10 |          ((!!((w) & (1ULL << 0))) +     \
         |                ^
   include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16'
      20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16))
         |                               ^~~~~~~~~~~~~~~~~
   include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32'
      21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
         |                                                      ^~~~~~~~~~~~~~~~~
   include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64'
      29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w))
         |                                                 ^~~~~~~~~~~~~~~~~
   arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64'
    1983 |                 if (hc->var_cnt != hweight64(valid_bank_mask))
         |                                    ^~~~~~~~~

CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Jim Mattson <jmattson@google.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Sean Christopherson <seanjc@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Wanpeng Li <wanpengli@tencent.com>
CC: kvm@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: x86@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-03 06:52:58 -07:00
Yury Norov
ea8c66fe8d KVM: x86: hyper-v: fix type of valid_bank_mask
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long,
but is used as u64, which is wrong for i386, and has been spotted by
LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with
hweight64()"

https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/

But it's wrong even without that patch because now bitmap_weight()
dereferences a word after valid_bank_mask on i386.

>> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type
+[-Wshift-count-overflow]
      21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
         |                                                                            ^~
   include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8'
      10 |          ((!!((w) & (1ULL << 0))) +     \
         |                ^
   include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16'
      20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16))
         |                               ^~~~~~~~~~~~~~~~~
   include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32'
      21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
         |                                                      ^~~~~~~~~~~~~~~~~
   include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64'
      29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w))
         |                                                 ^~~~~~~~~~~~~~~~~
   arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64'
    1983 |                 if (hc->var_cnt != hweight64(valid_bank_mask))
         |                                    ^~~~~~~~~

CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Jim Mattson <jmattson@google.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Sean Christopherson <seanjc@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Wanpeng Li <wanpengli@tencent.com>
CC: kvm@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: x86@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20 13:03:19 -04:00
Yury Norov
fe06a0c09b KVM: x86: replace bitmap_weight with bitmap_empty where appropriate
In some places kvm/hyperv.c code calls bitmap_weight() to check if any bit
of a given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds the
first set bit, while bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2022-05-02 06:30:39 -07:00
Vitaly Kuznetsov
42dcbe7d8b KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU
The following WARN is triggered from kvm_vm_ioctl_set_clock():
 WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
 ...
 CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G        W  O      5.16.0.stable #20
 Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021
 RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm]
 ...
 Call Trace:
  <TASK>
  ? kvm_write_guest+0x114/0x120 [kvm]
  kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
  kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]
  ? schedule+0x4e/0xc0
  ? __cond_resched+0x1a/0x50
  ? futex_wait+0x166/0x250
  ? __send_signal+0x1f1/0x3d0
  kvm_vm_ioctl+0x747/0xda0 [kvm]
  ...

The WARN was introduced by commit 03c0304a86bc ("KVM: Warn if
mark_page_dirty() is called without an active vCPU") but the change seems
to be correct (unlike the Hyper-V TSC page update mechanism). In fact, there's
no real need to actually write to guest memory to invalidate the TSC page; this
can be done by the first vCPU which goes through kvm_guest_time_update().

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220407201013.963226-1-vkuznets@redhat.com>
2022-04-11 13:29:51 -04:00
Sean Christopherson
320af55a93 KVM: x86: Add wrappers for setting/clearing APICv inhibits
Add set/clear wrappers for toggling APICv inhibits to make the call sites
more readable, and opportunistically rename the inner helpers to align
with the new wrappers and to make them more readable as well.  Invert the
flag from "activate" to "set"; activate is painfully ambiguous as it's
not obvious if the inhibit is being activated, or if APICv is being
activated, in which case the inhibit is being deactivated.

For the functions that take @set, swap the order of the inhibit reason
and @set so that the call sites are visually similar to those that bounce
through the wrapper.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:34:44 -04:00
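
A simplified standalone model of the wrapper pattern described above (the real helpers also take a struct kvm pointer and use KVM's own inhibit-reason enum):

  #include <stdbool.h>

  enum inhibit_reason_demo { INHIBIT_HYPERV, INHIBIT_NESTED };

  static unsigned long apicv_inhibit_reasons;   /* stand-in for per-VM state */

  static void set_or_clear_apicv_inhibit(enum inhibit_reason_demo reason, bool set)
  {
      if (set)
          apicv_inhibit_reasons |= 1UL << reason;
      else
          apicv_inhibit_reasons &= ~(1UL << reason);
  }

  /* The wrappers keep call sites unambiguous ("set the inhibit" vs. "clear
   * it") instead of passing a bare boolean whose meaning is easy to misread. */
  static void kvm_set_apicv_inhibit_demo(enum inhibit_reason_demo reason)
  {
      set_or_clear_apicv_inhibit(reason, true);
  }

  static void kvm_clear_apicv_inhibit_demo(enum inhibit_reason_demo reason)
  {
      set_or_clear_apicv_inhibit(reason, false);
  }

  int main(void)
  {
      kvm_set_apicv_inhibit_demo(INHIBIT_HYPERV);
      kvm_clear_apicv_inhibit_demo(INHIBIT_HYPERV);
      return 0;
  }
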
Vitaly Kuznetsov
b1e34d3253 KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
Setting non-zero values to SYNIC/STIMER MSRs activates certain features;
this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated.

Note, it would've been better to forbid writing anything to SYNIC/STIMER
MSRs, including zeroes; however, at least QEMU tries clearing
HV_X64_MSR_STIMER0_CONFIG without SynIC. The HV_X64_MSR_EOM MSR is somewhat
'special' as writing zero there triggers an action; this also should not
happen when SynIC wasn't activated.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-4-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29 13:22:07 -04:00
Vitaly Kuznetsov
7ec37d1cbe KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq
When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for
irqchip_in_kernel(), so normally SynIC irqs should never be set. It is,
however, possible for a misbehaving VMM to write to SYNIC/STIMER MSRs,
causing erroneous behavior.

The immediate issue being fixed is that kvm_irq_delivery_to_apic()
(kvm_irq_delivery_to_apic_fast()) crashes when called with
'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29 13:21:21 -04:00
Paolo Bonzini
c0f1eaeb9e Merge branch 'kvm-hv-xmm-hypercall-fixes' into HEAD
The fixes for 5.17 conflict with cleanups made in the same area
earlier in the 5.18 development cycle.
2022-02-25 08:20:08 -05:00
Vitaly Kuznetsov
47d3e5cdfe KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall
It has been proven in practice that at least Windows Server 2019 tries
using HVCALL_SEND_IPI_EX in 'XMM fast' mode when it has more than 64 vCPUs
and it needs to send an IPI to a vCPU > 63. Similarly to other XMM Fast
hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}{,_EX}), this
information is missing from the TLFS as of 6.0b. Currently, KVM returns an
error (HV_STATUS_INVALID_HYPERCALL_INPUT) and Windows crashes.

Note, HVCALL_SEND_IPI is a 'standard' fast hypercall (not 'XMM fast') as
all its parameters fit into RDX:R8 and this is handled by KVM correctly.

Cc: stable@vger.kernel.org # 5.14.x: 3244867af8c0: KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
Cc: stable@vger.kernel.org # 5.14.x
Fixes: d8f5537a8816 ("KVM: hyper-v: Advertise support for fast XMM hypercalls")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:52:22 -05:00
Vitaly Kuznetsov
7321f47ead KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls
When TLB flush hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX) are
issued in 'XMM fast' mode, the maximum number of allowed sparse_banks is
not 'HV_HYPERCALL_MAX_XMM_REGISTERS - 1' (5) but twice as many (10), as each
XMM register is 128 bits long and can hold two 64-bit banks.

Cc: stable@vger.kernel.org # 5.14.x
Fixes: 5974565bc26d ("KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:52:12 -05:00
Vitaly Kuznetsov
82c1ead0d6 KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb()
'struct kvm_hv_hcall' has all the required information already,
there's no need to pass 'ex' additionally.

No functional change intended.

Cc: stable@vger.kernel.org # 5.14.x
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:52:04 -05:00
Vitaly Kuznetsov
50e523dd79 KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi()
'struct kvm_hv_hcall' has all the required information already,
there's no need to pass 'ex' additionally.

No functional change intended.

Cc: stable@vger.kernel.org # 5.14.x
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:51:53 -05:00
Vitaly Kuznetsov
66c03a926f KVM: nSVM: Implement Enlightened MSR-Bitmap feature
Similar to nVMX commit 502d2bf5f2fd ("KVM: nVMX: Implement Enlightened MSR
Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM).

Notable differences from nVMX implementation:
- As the feature uses SW reserved fields in VMCB control, KVM needs to
make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()).

- 'msrpm_base_pa' always needs to be overwritten in
nested_svm_vmrun_msrpm(), even when the update is skipped. As an
optimization, nested_vmcb02_prepare_control() copies it from VMCB01,
so when the MSR-Bitmap feature for L2 is disabled nothing needs to be done.

- 'struct vmcb_ctrl_area_cached' needs to be extended with clean
fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to
copy it so nested_svm_vmrun_msrpm() can use it later.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:45 -05:00
Vitaly Kuznetsov
ce3859172c KVM: x86: Make kvm_hv_hypercall_enabled() static inline
In preparation for using kvm_hv_hypercall_enabled() from SVM code, make
it static inline to avoid the need to export it. The function is a
simple check with only two call sites currently.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:44 -05:00
Sean Christopherson
413af6601f KVM: x86: Add checks for reserved-to-zero Hyper-V hypercall fields
Add checks for the three fields in Hyper-V's hypercall params that must
be zero.  Per the TLFS, HV_STATUS_INVALID_HYPERCALL_INPUT is returned if
"A reserved bit in the specified hypercall input value is non-zero."

Note, some versions of the TLFS have an off-by-one bug for the last
reserved field, and define it as being bits 64:60.  See
https://github.com/MicrosoftDocs/Virtualization-Documentation/pull/1682.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:36 -05:00
Sean Christopherson
40421f38f6 KVM: x86: Reject fixeds-size Hyper-V hypercalls with non-zero "var_cnt"
Reject Hyper-V hypercalls if the guest specifies a non-zero variable size
header (var_cnt in KVM) for a hypercall that has a fixed header size.
Per the TLFS:

  It is illegal to specify a non-zero variable header size for a
  hypercall that is not explicitly documented as accepting variable sized
  input headers. In such a case the hypercall will result in a return
  code of HV_STATUS_INVALID_HYPERCALL_INPUT.

Note, at least some of the various DEBUG commands likely aren't allowed
to use variable size headers, but the TLFS documentation doesn't clearly
state what is/isn't allowed.  Omit them for now to avoid unnecessary
breakage.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:35 -05:00
Sean Christopherson
9c52f6b3d8 KVM: x86: Shove vp_bitmap handling down into sparse_set_to_vcpu_mask()
Move the vp_bitmap "allocation" that's needed to handle mismatched vp_index
values down into sparse_set_to_vcpu_mask() and drop __always_inline from
said helper.  The need for an intermediate vp_bitmap is a detail that's
specific to the sparse translation with mismatched VP<=>vCPU indexes and
does not need to be exposed to the caller.

Regarding the __always_inline, prior to commit f21dd494506a ("KVM: x86:
hyperv: optimize sparse VP set processing") the helper, then named
hv_vcpu_in_sparse_set(), was a tiny bit of code that effectively boiled
down to a handful of bit ops.  The __always_inline was understandable, if
not justifiable.  Since the aforementioned change, sparse_set_to_vcpu_mask()
is a chunky 350-450+ bytes of code without KASAN=y, and balloons to 1100+
with KASAN=y.  In other words, it has no business being forcefully inlined.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:35 -05:00
Sean Christopherson
79661c3766 KVM: x86: Don't bother reading sparse banks that end up being ignored
When handling "sparse" VP_SET requests, don't read sparse banks that
can't possibly contain a legal VP index instead of ignoring such banks
later on in sparse_set_to_vcpu_mask().  This allows KVM to cap the size
of its sparse_banks arrays for VP_SET at KVM_HV_MAX_SPARSE_VCPU_SET_BITS.
Add a compile time assert that KVM_HV_MAX_SPARSE_VCPU_SET_BITS <= 64, i.e.
that KVM_MAX_VCPUS <= 4096, as the TLFS allows for at most 64 sparse banks,
and KVM will need to do _something_ to play nice with Hyper-V.

Reducing the size of sparse_banks fudges around a compilation warning
(that becomes error with KVM_WERROR=y) when CONFIG_KASAN_STACK=y, which
is selected (and can't be unselected) by CONFIG_KASAN=y when using gcc
(clang/LLVM is a stack hog in some cases so it's opt-in for clang).
KASAN_STACK adds a redzone around every stack variable, which pushes the
Hyper-V functions over the default limit of 1024.

Ideally, KVM would flat out reject such impossibilities, but the TLFS
explicitly allows providing empty banks, even if a bank can't possibly
contain a valid VP index due to its position exceeding KVM's max.

  Furthermore, for a bit 1 in ValidBankMask, it is valid state for the
  corresponding element in BanksContents can be all 0s, meaning no
  processors are specified in this bank.

Arguably KVM should reject and not ignore the "extra" banks, but that can
be done independently and without bloating sparse_banks, e.g. by reading
each "extra" 8-byte chunk individually.

Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:34 -05:00
Sean Christopherson
a0dd008fe9 KVM: x86: Add a helper to get the sparse VP_SET for IPIs and TLB flushes
Add a helper, kvm_get_sparse_vp_set(), to handle sanity checks related to
the VARHEAD field and reading the sparse banks of a VP_SET.  A future
commit to reduce the memory footprint of sparse_banks will introduce more
common code to the sparse bank retrieval.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:34 -05:00
Sean Christopherson
25af908118 KVM: x86: Refactor kvm_hv_flush_tlb() to reduce indentation
Refactor the "extended" path of kvm_hv_flush_tlb() to reduce the nesting
depth for the non-fast sparse path, and to make the code more similar to
the extended path in kvm_hv_send_ipi().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:34 -05:00
Sean Christopherson
bd1ba5732b KVM: x86: Get the number of Hyper-V sparse banks from the VARHEAD field
Get the number of sparse banks from the VARHEAD field, which the guest is
required to provide as "The size of a variable header, in QWORDS.", where
the variable header is:

  Variable Header Bytes = {Total Header Bytes - sizeof(Fixed Header)}
                          rounded up to nearest multiple of 8
  Variable HeaderSize = Variable Header Bytes / 8

In other words, the VARHEAD should match the number of sparse banks.
Keep the manual count as a sanity check, but otherwise rely on the field
so as to more closely align with the logic defined in the TLFS and to
allow for future cleanups.

Tweak the tracepoint output to use "rep_cnt" instead of simply "cnt" now
that there is also "var_cnt".

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:33 -05:00
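
An illustrative decode of the hypercall input value using the VARHEAD field discussed above (the bit positions follow the TLFS layout and should be treated as assumptions, not a quote of KVM's code):

  #include <stdint.h>
  #include <stdio.h>

  static void decode_hcall_input(uint64_t param)
  {
      unsigned int code    = param & 0xffff;          /* bits 15:0  call code          */
      unsigned int fast    = (param >> 16) & 1;       /* bit  16    'fast' (registers) */
      unsigned int var_cnt = (param >> 17) & 0x3ff;   /* bits 26:17 VARHEAD, in QWORDS */
      unsigned int rep_cnt = (param >> 32) & 0xfff;   /* bits 43:32 rep count          */

      printf("code=0x%x fast=%u var_cnt=%u rep_cnt=%u\n", code, fast, var_cnt, rep_cnt);
  }

  int main(void)
  {
      /* Hypothetical input: call code 0x13, 1-QWORD variable header, 2 reps. */
      decode_hcall_input(0x13ULL | (1ULL << 17) | (2ULL << 32));
      return 0;
  }
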