1121529 Commits

Author SHA1 Message Date
Vitaly Kuznetsov
a0fa4b7abf KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields
The updated Enlightened VMCS definition has 'encls_exiting_bitmap'
field which needs mapping to VMCS, add the missing encoding.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-13-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:45 -04:00
Vitaly Kuznetsov
dea6e140d9 KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf
KVM has to check guest visible HYPERV_CPUID_NESTED_FEATURES.EBX CPUID
leaf to know which Enlightened VMCS definition to use (original or 2022
update). Cache the leaf along with other Hyper-V CPUID feature leaves
to make the check quick.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-12-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:44 -04:00
Vitaly Kuznetsov
c9d31986e8 KVM: nVMX: Support several new fields in eVMCSv1
Enlightened VMCS v1 definition was updated with new fields, add
support for them for Hyper-V on KVM.

Note: SSP, CET and Guest LBR features are not supported by KVM yet
and 'struct vmcs12' has no corresponding fields.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-11-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:43 -04:00
Vitaly Kuznetsov
b19e4ff5e5 KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields
Enlightened VMCS v1 definition was updated with new fields, support
them in KVM by defining VMCS-to-EVMCS conversion.

Note: SSP, CET and Guest LBR features are not supported by KVM yet and
the corresponding fields are not defined in 'enum vmcs_field', leave
them commented out for now.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-10-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:42 -04:00
Sean Christopherson
6cce93de28 KVM: nVMX: Use CC() macro to handle eVMCS unsupported controls checks
Locally #define and use the nested virtualization Consistency Check (CC)
macro to handle eVMCS unsupported controls checks.  Using the macro loses
the existing printing of the unsupported controls, but that's a feature
and not a bug.  The existing approach is flawed because the @err param to
trace_kvm_nested_vmenter_failed() is the error code, not the error value.

The eVMCS trickery mostly works as __print_symbolic() falls back to
printing the raw hex value, but that subtly relies on not having a match
between the unsupported value and VMX_VMENTER_INSTRUCTION_ERRORS.

If it's really truly necessary to snapshot the bad value, then the
tracepoint can be extended in the future.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-9-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:42 -04:00
Vitaly Kuznetsov
f4d361b4c2 KVM: nVMX: Refactor unsupported eVMCS controls logic to use 2-d array
Refactor the handling of unsupported eVMCS to use a 2-d array to store
the set of unsupported controls.  KVM's handling of eVMCS is completely
broken as there is no way for userspace to query which features are
unsupported, nor does KVM prevent userspace from attempting to enable
unsupported features.  A future commit will remedy that by filtering and
enforcing unsupported features when eVMCS, but that needs to be opt-in
from userspace to avoid breakage, i.e. KVM needs to maintain its legacy
behavior by snapshotting the exact set of controls that are currently
(un)supported by eVMCS.

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[sean: split to standalone patch, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-8-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:41 -04:00
Sean Christopherson
85ab071af8 KVM: nVMX: Treat eVMCS as enabled for guest iff Hyper-V is also enabled
When querying whether or not eVMCS is enabled on behalf of the guest,
treat eVMCS as enable if and only if Hyper-V is enabled/exposed to the
guest.

Note, flows that come from the host, e.g. KVM_SET_NESTED_STATE, must NOT
check for Hyper-V being enabled as KVM doesn't require guest CPUID to be
set before most ioctls().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-7-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:40 -04:00
Sean Christopherson
3be29eb7b5 KVM: x86: Report error when setting CPUID if Hyper-V allocation fails
Return -ENOMEM back to userspace if allocating the Hyper-V vCPU struct
fails when enabling Hyper-V in guest CPUID.  Silently ignoring failure
means that KVM will not have an up-to-date CPUID cache if allocating the
struct succeeds later on, e.g. when activating SynIC.

Rejecting the CPUID operation also guarantess that vcpu->arch.hyperv is
non-NULL if hyperv_enabled is true, which will allow for additional
cleanup, e.g. in the eVMCS code.

Note, the initialization needs to be done before CPUID is set, and more
subtly before kvm_check_cpuid(), which potentially enables dynamic
XFEATURES.  Sadly, there's no easy way to avoid exposing Hyper-V details
to CPUID or vice versa.  Expose kvm_hv_vcpu_init() and the Hyper-V CPUID
signature to CPUID instead of exposing cpuid_entry2_find() outside of
CPUID code.  It's hard to envision kvm_hv_vcpu_init() being misused,
whereas cpuid_entry2_find() absolutely shouldn't be used outside of core
CPUID code.

Fixes: 10d7bf1e46dc ("KVM: x86: hyper-v: Cache guest CPUID leaves determining features availability")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-6-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:39 -04:00
Sean Christopherson
1cac8d9f6b KVM: x86: Check for existing Hyper-V vCPU in kvm_hv_vcpu_init()
When potentially allocating/initializing the Hyper-V vCPU struct, check
for an existing instance in kvm_hv_vcpu_init() instead of requiring
callers to perform the check.  Relying on callers to do the check is
risky as it's all too easy for KVM to overwrite vcpu->arch.hyperv and
leak memory, and it adds additional burden on callers without much
benefit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20220830133737.1539624-5-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:38 -04:00
Vitaly Kuznetsov
ce2196b831 KVM: x86: Zero out entire Hyper-V CPUID cache before processing entries
Wipe the whole 'hv_vcpu->cpuid_cache' with memset() instead of having to
zero each particular member when the corresponding CPUID entry was not
found.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[sean: split to separate patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20220830133737.1539624-4-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:38 -04:00
Vitaly Kuznetsov
5ef384a60f x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
Updated Hyper-V Enlightened VMCS specification lists several new
fields for the following features:

- PerfGlobalCtrl
- EnclsExitingBitmap
- Tsc Scaling
- GuestLbrCtl
- CET
- SSP

Update the definition.

Note, the updated spec also provides an additional CPUID feature flag,
CPUIDD.0x4000000A.EBX BIT(0), for PerfGlobalCtrl to workaround a Windows
11 quirk.  Despite what the TLFS says:

  Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl
  fields in the enlightened VMCS.

guests can safely use the fields if they are enumerated in the
architectural VMX MSRs.  I.e. KVM-on-HyperV doesn't need to check the
CPUID bit, but KVM-as-HyperV must ensure the bit is set if PerfGlobalCtrl
fields are exposed to L1.

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[sean: tweak CPUID name to make it PerfGlobalCtrl only]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20220830133737.1539624-3-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:37 -04:00
Vitaly Kuznetsov
ea9da788a6 x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
Section 1.9 of TLFS v6.0b says:

"All structures are padded in such a way that fields are aligned
naturally (that is, an 8-byte field is aligned to an offset of 8 bytes
and so on)".

'struct enlightened_vmcs' has a glitch:

...
        struct {
                u32                nested_flush_hypercall:1; /*   836: 0  4 */
                u32                msr_bitmap:1;         /*   836: 1  4 */
                u32                reserved:30;          /*   836: 2  4 */
        } hv_enlightenments_control;                     /*   836     4 */
        u32                        hv_vp_id;             /*   840     4 */
        u64                        hv_vm_id;             /*   844     8 */
        u64                        partition_assist_page; /*   852     8 */
...

And the observed values in 'partition_assist_page' make no sense at
all. Fix the layout by padding the structure properly.

Fixes: 68d1eb72ee99 ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-2-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:36 -04:00
Oliver Upton
5f5651c673 KVM: selftests: Require DISABLE_NX_HUGE_PAGES cap for NX hugepage test
Require KVM_CAP_VM_DISABLE_NX_HUGE_PAGES for the entire NX hugepage test
instead of skipping the "disable" subtest if the capability isn't
supported by the host kernel.  While the "enable" subtest does provide
value when the capability isn't supported, silently providing only half
the promised coveraged is undesirable, i.e. it's better to skip the test
so that the user knows something.

Alternatively, the test could print something to alert the user instead
of silently skipping the subtest, but that would encourage other tests
to follow suit, and it's not clear that it's desirable to take selftests
in that direction.  And if selftests do head down the path of skipping
subtests, such behavior needs first-class support in the framework.

Opportunistically convert other test preconditions to TEST_REQUIRE().

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220812175301.3915004-1-oliver.upton@linux.dev
[sean: rewrote changelog to capture discussion about skipping the test]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:36 -04:00
Uros Bizjak
57abfa11ba KVM: VMX: Do not declare vmread_error() asmlinkage
There is no need to declare vmread_error() asmlinkage, its arguments
can be passed via registers for both 32-bit and 64-bit targets.
Function argument registers are considered call-clobbered registers,
they are saved in the trampoline just before the function call and
restored afterwards.

Dropping "asmlinkage" patch unifies trampoline function argument handling
between 32-bit and 64-bit targets and improves generated code for 32-bit
targets.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20220817144045.3206-1-ubizjak@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:35 -04:00
Liam Ni
e390f4d69d KVM:x86: Clean up ModR/M "reg" initialization in reg op decoding
Refactor decode_register_operand() to get the ModR/M register if and
only if the instruction uses a ModR/M encoding to make it more obvious
how the register operand is retrieved.

Signed-off-by: Liam Ni <zhiguangni01@gmail.com>
Link: https://lore.kernel.org/r/20220908141210.1375828-1-zhiguangni01@zhaoxin.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:34 -04:00
Mingwei Zhang
02dfc44f20 KVM: x86: Print guest pgd in kvm_nested_vmenter()
Print guest pgd in kvm_nested_vmenter() to enrich the information for
tracing. When tdp is enabled, print the value of tdp page table (EPT/NPT);
when tdp is disabled, print the value of non-nested CR3.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20220825225755.907001-4-mizhang@google.com
[sean: print nested_cr3 vs. nested_eptp vs. guest_cr3]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:33 -04:00
David Matlack
37ef0be269 KVM: nVMX: Add tracepoint for nested VM-Enter
Call trace_kvm_nested_vmenter() during nested VMLAUNCH/VMRESUME to bring
parity with nSVM's usage of the tracepoint during nested VMRUN.

Attempt to use analagous VMCS fields to the VMCB fields that are
reported in the SVM case:

"int_ctl": 32-bit field of the VMCB that the CPU uses to deliver virtual
interrupts. The analagous VMCS field is the 16-bit "guest interrupt
status".

"event_inj": 32-bit field of VMCB that is used to inject events
(exceptions and interrupts) into the guest. The analagous VMCS field
is the "VM-entry interruption-information field".

"npt_enabled": 1 when the VCPU has enabled nested paging. The analagous
VMCS field is the enable-EPT execution control.

"npt_addr": 64-bit field when the VCPU has enabled nested paging. The
analagous VMCS field is the ept_pointer.

Signed-off-by: David Matlack <dmatlack@google.com>
[move the code into the nested_vmx_enter_non_root_mode().]
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20220825225755.907001-3-mizhang@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:32 -04:00
Mingwei Zhang
89e54ec592 KVM: x86: Update trace function for nested VM entry to support VMX
Update trace function for nested VM entry to support VMX. Existing trace
function only supports nested VMX and the information printed out is AMD
specific.

So, rename trace_kvm_nested_vmrun() to trace_kvm_nested_vmenter(), since
'vmenter' is generic. Add a new field 'isa' to recognize Intel and AMD;
Update the output to print out VMX/SVM related naming respectively, eg.,
vmcb vs. vmcs; npt vs. ept.

Opportunistically update the call site of trace_kvm_nested_vmenter() to
make one line per parameter.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20220825225755.907001-2-mizhang@google.com
[sean: align indentation, s/update/rename in changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:32 -04:00
Sean Christopherson
bff0adc40c KVM: x86: Use u64 for address and error code in page fault tracepoint
Track the address and error code as 64-bit values in the page fault
tracepoint.  When TDP is enabled, the address is a GPA and thus can be a
64-bit value even on 32-bit hosts.  And SVM's #NPF genereates 64-bit
error codes.

Opportunistically clean up the formatting.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:31 -04:00
Wonhyuk Yang
faa03b3972 KVM: Add extra information in kvm_page_fault trace point
Currently, kvm_page_fault trace point provide fault_address and error
code. However it is not enough to find which cpu and instruction
cause kvm_page_faults. So add vcpu id and instruction pointer in
kvm_page_fault trace point.

Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: linuxgeek@linuxgeek.io
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Link: https://lore.kernel.org/r/20220510071001.87169-1-vvghjk1234@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:30 -04:00
Aaron Lewis
b5cb32b16c KVM: x86: Delete duplicate documentation for KVM_X86_SET_MSR_FILTER
Two copies of KVM_X86_SET_MSR_FILTER somehow managed to make it's way
into the documentation.  Remove one copy and merge the difference from
the removed copy into the copy that's being kept.

Fixes: fd49e8ee70b3 ("Merge branch 'kvm-sev-cgroup' into HEAD")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20220712001045.2364298-2-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:29 -04:00
Paolo Bonzini
db25eb87ad KVM: SVM: remove unnecessary check on INIT intercept
Since svm_check_nested_events() is now handling INIT signals, there is
no need to latch it until the VMEXIT is injected.  The only condition
under which INIT signals are latched is GIF=0.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220819165643.83692-1-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:28 -04:00
Uros Bizjak
afe30b59d3 KVM/VMX: Avoid stack engine synchronization uop in __vmx_vcpu_run
Avoid instructions with explicit uses of the stack pointer between
instructions that implicitly refer to it. The sequence of
POP %reg; ADD $x, %RSP; POP %reg forces emission of synchronization
uop to synchronize the value of the stack pointer in the stack engine
and the out-of-order core.

Using POP with the dummy register instead of ADD $x, %RSP results in a
smaller code size and faster code.

The patch also fixes the reference to the wrong register in the
nearby comment.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20220816211010.25693-1-ubizjak@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:27 -04:00
Miaohe Lin
5a2a961be2 KVM: fix memoryleak in kvm_init()
When alloc_cpumask_var_node() fails for a certain cpu, there might be some
allocated cpumasks for percpu cpu_kick_mask. We should free these cpumasks
or memoryleak will occur.

Fixes: baff59ccdc65 ("KVM: Pre-allocate cpumasks for kvm_make_all_cpus_request_except()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/20220823063414.59778-1-linmiaohe@huawei.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:27 -04:00
Yosry Ahmed
d38ba8ccd9 KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats
Count the pages used by KVM in arm64 for stage2 mmu in memory stats
under secondary pagetable stats (e.g. "SecPageTables" in /proc/meminfo)
to give better visibility into the memory consumption of KVM mmu in a
similar way to how normal user page tables are accounted.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220823004639.2387269-5-yosryahmed@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-30 07:44:25 -07:00
Yosry Ahmed
43a063cab3 KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
Count the pages used by KVM mmu on x86 in memory stats under secondary
pagetable stats (e.g. "SecPageTables" in /proc/meminfo) to give better
visibility into the memory consumption of KVM mmu in a similar way to
how normal user page tables are accounted.

Add the inner helper in common KVM, ARM will also use it to count stats
in a future commit.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: Marc Zyngier <maz@kernel.org> # generic KVM changes
Link: https://lore.kernel.org/r/20220823004639.2387269-3-yosryahmed@google.com
Link: https://lore.kernel.org/r/20220823004639.2387269-4-yosryahmed@google.com
[sean: squash x86 usage to workaround modpost issues]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-30 07:41:12 -07:00
Yosry Ahmed
ebc97a52b5 mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
We keep track of several kernel memory stats (total kernel memory, page
tables, stack, vmalloc, etc) on multiple levels (global, per-node,
per-memcg, etc). These stats give insights to users to how much memory
is used by the kernel and for what purposes.

Currently, memory used by KVM mmu is not accounted in any of those
kernel memory stats. This patch series accounts the memory pages
used by KVM for page tables in those stats in a new
NR_SECONDARY_PAGETABLE stat. This stat can be later extended to account
for other types of secondary pages tables (e.g. iommu page tables).

KVM has a decent number of large allocations that aren't for page
tables, but for most of them, the number/size of those allocations
scales linearly with either the number of vCPUs or the amount of memory
assigned to the VM. KVM's secondary page table allocations do not scale
linearly, especially when nested virtualization is in use.

From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's
per-VM pages_{4k,2m,1g} stats unless the guest is doing something
bizarre (e.g. accessing only 4kb chunks of 2mb pages so that KVM is
forced to allocate a large number of page tables even though the guest
isn't accessing that much memory). However, someone would need to either
understand how KVM works to make that connection, or know (or be told) to
go look at KVM's stats if they're running VMs to better decipher the stats.

Furthermore, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE
is informative. For example, when backing a VM with THP vs. HugeTLB,
NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order
of magnitude higher with THP. So having this stat will at the very least
prove to be useful for understanding tradeoffs between VM backing types,
and likely even steer folks towards potential optimizations.

The original discussion with more details about the rationale:
https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org

This stat will be used by subsequent patches to count KVM mmu
memory usage.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220823004639.2387269-2-yosryahmed@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-24 13:51:42 -07:00
Miaohe Lin
d7c9bfb9ca KVM: x86/mmu: fix memoryleak in kvm_mmu_vendor_module_init()
When register_shrinker() fails, KVM doesn't release the percpu counter
kvm_total_used_mmu_pages leading to memoryleak. Fix this issue by calling
percpu_counter_destroy() when register_shrinker() fails.

Fixes: ab271bd4dfd5 ("x86: kvm: propagate register_shrinker return code")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/20220823063237.47299-1-linmiaohe@huawei.com
[sean: tweak shortlog and changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-24 13:47:49 -07:00
Michal Luczaj
6aa5c47c35 KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
The emulator checks the wrong variable while setting the CPU
interruptibility state, the target segment is embedded in the instruction
opcode, not the ModR/M register.  Fix the condition.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Fixes: a5457e7bcf9a ("KVM: emulate: POP SS triggers a MOV SS shadow too")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20220821215900.1419215-1-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-24 13:45:40 -07:00
Junaid Shahid
b24ede2253 kvm: x86: Do proper cleanup if kvm_x86_ops->vm_init() fails
If vm_init() fails [which can happen, for instance, if a memory
allocation fails during avic_vm_init()], we need to cleanup some
state in order to avoid resource leaks.

Signed-off-by: Junaid Shahid <junaids@google.com>
Link: https://lore.kernel.org/r/20220729224329.323378-1-junaids@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-24 13:41:59 -07:00
David Matlack
372d070845 KVM: selftests: Fix ambiguous mov in KVM_ASM_SAFE()
Change the mov in KVM_ASM_SAFE() that zeroes @vector to a movb to
make it unambiguous.

This fixes a build failure with Clang since, unlike the GNU assembler,
the LLVM integrated assembler rejects ambiguous X86 instructions that
don't have suffixes:

  In file included from x86_64/hyperv_features.c:13:
  include/x86_64/processor.h:825:9: error: ambiguous instructions require an explicit suffix (could be 'movb', 'movw', 'movl', or 'movq')
          return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                 ^
  include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
          asm volatile(KVM_ASM_SAFE(insn)                 \
                       ^
  include/x86_64/processor.h:788:16: note: expanded from macro 'KVM_ASM_SAFE'
          "1: " insn "\n\t"                                       \
                        ^
  <inline asm>:5:2: note: instantiated into assembly here
          mov $0, 15(%rsp)
          ^

It seems like this change could introduce undesirable behavior in the
future, e.g. if someone used a type larger than a u8 for @vector, since
KVM_ASM_SAFE() will only zero the bottom byte. I tried changing the type
of @vector to an int to see what would happen. GCC failed to compile due
to a size mismatch between `movb` and `%eax`. Clang succeeded in
compiling, but the generated code looked correct, so perhaps it will not
be an issue. That being said it seems like there could be a better
solution to this issue that does not assume @vector is a u8.

Fixes: 3b23054cd3f5 ("KVM: selftests: Add x86-64 support for exception fixup")
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722234838.2160385-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 07:38:05 -04:00
David Matlack
67ef8664cc KVM: selftests: Fix KVM_EXCEPTION_MAGIC build with Clang
Change KVM_EXCEPTION_MAGIC to use the all-caps "ULL", rather than lower
case. This fixes a build failure with Clang:

  In file included from x86_64/hyperv_features.c:13:
  include/x86_64/processor.h:825:9: error: unexpected token in argument list
          return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                 ^
  include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
          asm volatile(KVM_ASM_SAFE(insn)                 \
                       ^
  include/x86_64/processor.h:785:2: note: expanded from macro 'KVM_ASM_SAFE'
          "mov $" __stringify(KVM_EXCEPTION_MAGIC) ", %%r9\n\t"   \
          ^
  <inline asm>:1:18: note: instantiated into assembly here
          mov $0xabacadabaull, %r9
                          ^

Fixes: 3b23054cd3f5 ("KVM: selftests: Add x86-64 support for exception fixup")
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722234838.2160385-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 07:38:05 -04:00
Jim Mattson
020dac4187 KVM: VMX: Heed the 'msr' argument in msr_write_intercepted()
Regardless of the 'msr' argument passed to the VMX version of
msr_write_intercepted(), the function always checks to see if a
specific MSR (IA32_SPEC_CTRL) is intercepted for write.  This behavior
seems unintentional and unexpected.

Modify the function so that it checks to see if the provided 'msr'
index is intercepted for write.

Fixes: 67f4b9969c30 ("KVM: nVMX: Handle dynamic MSR intercept toggling")
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220810213050.2655000-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 07:38:04 -04:00
Junaid Shahid
b64d740ea7 kvm: x86: mmu: Always flush TLBs when enabling dirty logging
When A/D bits are not available, KVM uses a software access tracking
mechanism, which involves making the SPTEs inaccessible. However,
the clear_young() MMU notifier does not flush TLBs. So it is possible
that there may still be stale, potentially writable, TLB entries.
This is usually fine, but can be problematic when enabling dirty
logging, because it currently only does a TLB flush if any SPTEs were
modified. But if all SPTEs are in access-tracked state, then there
won't be a TLB flush, which means that the guest could still possibly
write to memory and not have it reflected in the dirty bitmap.

So just unconditionally flush the TLBs when enabling dirty logging.
As an alternative, KVM could explicitly check the MMU-Writable bit when
write-protecting SPTEs to decide if a flush is needed (instead of
checking the Writable bit), but given that a flush almost always happens
anyway, so just making it unconditional seems simpler.

Signed-off-by: Junaid Shahid <junaids@google.com>
Message-Id: <20220810224939.2611160-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 07:38:03 -04:00
Junaid Shahid
1441ca1494 kvm: x86: mmu: Drop the need_remote_flush() function
This is only used by kvm_mmu_pte_write(), which no longer actually
creates the new SPTE and instead just clears the old SPTE. So we
just need to check if the old SPTE was shadow-present instead of
calling need_remote_flush(). Hence we can drop this function. It was
incomplete anyway as it didn't take access-tracking into account.

This patch should not result in any functional change.

Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220723024316.2725328-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 07:38:02 -04:00
Paolo Bonzini
959d6c4ae2 KVM/arm64 fixes for 6.0, take #1
- Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK
 
 - Tidy-up handling of AArch32 on asymmetric systems
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmL+RsgPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDHrUP/3IYZ0LnYUZBImSU/YTPL5yYzdSVAuMNcdRQ
 EgvLQKwP+JSrmd7B7wZ4MhY1LheKjpNmmuSqTRsZOHb/yBmnh3+ao5n2gqusYQeJ
 PCuLYjeF7ZU5fGIrPAW6BW0BFlmYMVbTrC6SEMhZsisBhna44jrrWgkBz9mOsXE/
 YcDWv8kP15lisuQzMvnYxmZobbVgSJ3KgQY4/Dp6vyKMR8ULujCxziFV5R4RD0xP
 Ay8wnxtMUymx9P6sZsd6Vwi5h1MUXOOoI4He7+8ejIfoMOManIMOIq4PDQhINwQv
 tGysDmQavftSbUkXJ1VB+8cJ/9KufzwKxFoc5WqGk1y14QulyBNyb/XR3UtORe1n
 bitINTTkqibHY6fdQJA7z1sD0jaEAh/xNwO1Gq0BS40o4XVQDv2BjdQir9TEdlZO
 tsZKVaFpN3UZe681ru12No8YzQDhpuLH65gDHDjLaftH99WKsrSwMZLoEjqZTlM/
 vH/9acd4UB+9zMGTpN2tJ//2cq6g3JoUC7jJIQB1oGStHX0/7AxKMlabR2xHmt9E
 4CmJND9RLK6+yEelagxOAYMQfnCdj6pW/3bhvAmsZWh0t3fNxCeBBXFr2I5os+E9
 hV0FYx4PG9GtorMSqudCDsP83SDIxCluNZ5iM8t1suSn3dFhk5bDFChS86XGuqQe
 XxHQ6JTF
 =r5iq
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.0, take #1

- Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK

- Tidy-up handling of AArch32 on asymmetric systems
2022-08-19 05:43:53 -04:00
Li kunyu
eceb6e1d53 KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device()
The variable is initialized but it is only used after its assignment.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Message-Id: <20220819021535.483702-1-kunyu@nfschina.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:43 -04:00
Li kunyu
2824913976 KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow()
The variable is initialized but it is only used after its assignment.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Message-Id: <20220819022804.483914-1-kunyu@nfschina.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:43 -04:00
Josh Poimboeuf
3d9606b0e0 x86/kvm: Fix "missing ENDBR" BUG for fastop functions
The following BUG was reported:

  traps: Missing ENDBR: andw_ax_dx+0x0/0x10 [kvm]
  ------------[ cut here ]------------
  kernel BUG at arch/x86/kernel/traps.c:253!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
   <TASK>
   asm_exc_control_protection+0x2b/0x30
  RIP: 0010:andw_ax_dx+0x0/0x10 [kvm]
  Code: c3 cc cc cc cc 0f 1f 44 00 00 66 0f 1f 00 48 19 d0 c3 cc cc cc
        cc 0f 1f 40 00 f3 0f 1e fa 20 d0 c3 cc cc cc cc 0f 1f 44 00 00
        <66> 0f 1f 00 66 21 d0 c3 cc cc cc cc 0f 1f 40 00 66 0f 1f 00 21
        d0

   ? andb_al_dl+0x10/0x10 [kvm]
   ? fastop+0x5d/0xa0 [kvm]
   x86_emulate_insn+0x822/0x1060 [kvm]
   x86_emulate_instruction+0x46f/0x750 [kvm]
   complete_emulated_mmio+0x216/0x2c0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x604/0x650 [kvm]
   kvm_vcpu_ioctl+0x2f4/0x6b0 [kvm]
   ? wake_up_q+0xa0/0xa0

The BUG occurred because the ENDBR in the andw_ax_dx() fastop function
had been incorrectly "sealed" (converted to a NOP) by apply_ibt_endbr().

Objtool marked it to be sealed because KVM has no compile-time
references to the function.  Instead KVM calculates its address at
runtime.

Prevent objtool from annotating fastop functions as sealable by creating
throwaway dummy compile-time references to the functions.

Fixes: 6649fa876da4 ("x86/ibt,kvm: Add ENDBR to fastops")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Debugged-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Message-Id: <0d4116f90e9d0c1b754bb90c585e6f0415a1c508.1660837839.git.jpoimboe@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:42 -04:00
Josh Poimboeuf
22472d1260 x86/kvm: Simplify FOP_SETCC()
SETCC_ALIGN and FOP_ALIGN are both 16.  Remove the special casing for
FOP_SETCC() and just make it a normal fastop.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:42 -04:00
Josh Poimboeuf
e27e5bea95 x86/ibt, objtool: Add IBT_NOSEAL()
Add a macro which prevents a function from getting sealed if there are
no compile-time references to it.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:42 -04:00
Chao Peng
20ec3ebd70 KVM: Rename mmu_notifier_* to mmu_invalidate_*
The motivation of this renaming is to make these variables and related
helper functions less mmu_notifier bound and can also be used for non
mmu_notifier based page invalidation. mmu_invalidate_* was chosen to
better describe the purpose of 'invalidating' a page that those
variables are used for.

  - mmu_notifier_seq/range_start/range_end are renamed to
    mmu_invalidate_seq/range_start/range_end.

  - mmu_notifier_retry{_hva} helper functions are renamed to
    mmu_invalidate_retry{_hva}.

  - mmu_notifier_count is renamed to mmu_invalidate_in_progress to
    avoid confusion with mn_active_invalidate_count.

  - While here, also update kvm_inc/dec_notifier_count() to
    kvm_mmu_invalidate_begin/end() to match the change for
    mmu_notifier_count.

No functional change intended.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:41 -04:00
Chao Peng
bdd1c37a31 KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS
KVM_INTERNAL_MEM_SLOTS better reflects the fact those slots are KVM
internally used (invisible to userspace) and avoids confusion to future
private slots that can have different meaning.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:40 -04:00
Paolo Bonzini
b075450868 KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
KVM_PRIVATE_MEM_SLOTS defaults to zero, so it is not necessary to
define it in MIPS's asm/kvm_host.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:26 -04:00
Sean Christopherson
c2b8239701 KVM: Move coalesced MMIO initialization (back) into kvm_create_vm()
Invoke kvm_coalesced_mmio_init() from kvm_create_vm() now that allocating
and initializing coalesced MMIO objects is separate from registering any
associated devices.  Moving coalesced MMIO cleans up the last oddity
where KVM does VM creation/initialization after kvm_create_vm(), and more
importantly after kvm_arch_post_init_vm() is called and the VM is added
to the global vm_list, i.e. after the VM is fully created as far as KVM
is concerned.

Originally, kvm_coalesced_mmio_init() was called by kvm_create_vm(), but
the original implementation was completely devoid of error handling.
Commit 6ce5a090a9a0 ("KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s
error handling" fixed the various bugs, and in doing so rightly moved the
call to after kvm_create_vm() because kvm_coalesced_mmio_init() also
registered the coalesced MMIO device.  Commit 2b3c246a682c ("KVM: Make
coalesced mmio use a device per zone") cleaned up that mess by having
each zone register a separate device, i.e. moved device registration to
its logical home in kvm_vm_ioctl_register_coalesced_mmio().  As a result,
kvm_coalesced_mmio_init() is now a "pure" initialization helper and can
be safely called from kvm_create_vm().

Opportunstically drop the #ifdef, KVM provides stubs for
kvm_coalesced_mmio_{init,free}() when CONFIG_KVM_MMIO=n (s390).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220816053937.2477106-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:02:31 -04:00
Sean Christopherson
405294f29f KVM: Unconditionally get a ref to /dev/kvm module when creating a VM
Unconditionally get a reference to the /dev/kvm module when creating a VM
instead of using try_get_module(), which will fail if the module is in
the process of being forcefully unloaded.  The error handling when
try_get_module() fails doesn't properly unwind all that has been done,
e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
from the global list.  Not removing VMs from the global list tends to be
fatal, e.g. leads to use-after-free explosions.

The obvious alternative would be to add proper unwinding, but the
justification for using try_get_module(), "rmmod --wait", is completely
bogus as support for "rmmod --wait", i.e. delete_module() without
O_NONBLOCK, was removed by commit 3f2b9c9cdf38 ("module: remove rmmod
--wait option.") nearly a decade ago.

It's still possible for try_get_module() to fail due to the module dying
(more like being killed), as the module will be tagged MODULE_STATE_GOING
by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice
with forced unloading is an exercise in futility and gives a falsea sense
of security.  Using try_get_module() only prevents acquiring _new_
references, it doesn't magically put the references held by other VMs,
and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but
guaranteed to cause spectacular fireworks; the window where KVM will fail
try_get_module() is tiny compared to the window where KVM is building and
running the VM with an elevated module refcount.

Addressing KVM's inability to play nice with "rmmod --force" is firmly
out-of-scope.  Forcefully unloading any module taints kernel (for obvious
reasons)  _and_ requires the kernel to be built with
CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the
amusing disclaimer that it's "mainly for kernel developers and desperate
users".  In other words, KVM is free to scoff at bug reports due to using
"rmmod --force" while VMs may be running.

Fixes: 5f6de5cbebee ("KVM: Prevent module exit until all VMs are freed")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220816053937.2477106-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:02:31 -04:00
Sean Christopherson
4ba4f41942 KVM: Properly unwind VM creation if creating debugfs fails
Properly unwind VM creation if kvm_create_vm_debugfs() fails.  A recent
change to invoke kvm_create_vm_debug() in kvm_create_vm() was led astray
by buggy try_get_module() handling adding by commit 5f6de5cbebee ("KVM:
Prevent module exit until all VMs are freed").  The debugfs error path
effectively inherits the bad error path of try_module_get(), e.g. KVM
leaves the to-be-free VM on vm_list even though KVM appears to do the
right thing by calling module_put() and falling through.

Opportunistically hoist kvm_create_vm_debugfs() above the call to
kvm_arch_post_init_vm() so that the "post-init" arch hook is actually
invoked after the VM is initialized (ignoring kvm_coalesced_mmio_init()
for the moment).  x86 is the only non-nop implementation of the post-init
hook, and it doesn't allocate/initialize any objects that are reachable
via debugfs code (spawns a kthread worker for the NX huge page mitigation).

Leave the buggy try_get_module() alone for now, it will be fixed in a
separate commit.

Fixes: b74ed7a68ec1 ("KVM: Actually create debugfs in kvm_create_vm()")
Reported-by: syzbot+744e173caec2e1627ee0@syzkaller.appspotmail.com
Cc: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20220816053937.2477106-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:02:30 -04:00
Oliver Upton
b10d86fb8e KVM: arm64: Reject 32bit user PSTATE on asymmetric systems
KVM does not support AArch32 EL0 on asymmetric systems. To that end,
prevent userspace from configuring a vCPU in such a state through
setting PSTATE.

It is already ABI that KVM rejects such a write on a system where
AArch32 EL0 is unsupported. Though the kernel's definition of a 32bit
system changed in commit 2122a833316f ("arm64: Allow mismatched
32-bit EL0 support"), KVM's did not.

Fixes: 2122a833316f ("arm64: Allow mismatched 32-bit EL0 support")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220816192554.1455559-3-oliver.upton@linux.dev
2022-08-17 10:29:07 +01:00
Oliver Upton
f3c6efc72f KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems
KVM does not support AArch32 on asymmetric systems. To that end, enforce
AArch64-only behavior on PMCR_EL1.LC when on an asymmetric system.

Fixes: 2122a833316f ("arm64: Allow mismatched 32-bit EL0 support")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220816192554.1455559-2-oliver.upton@linux.dev
2022-08-17 10:29:07 +01:00
Linus Torvalds
568035b01c Linux 6.0-rc1 v6.0-rc1 2022-08-14 15:50:18 -07:00