License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
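The point about header and source .c files needing different comment types can be sketched as follows. This is a minimal illustration only, not the script Thomas and Greg used: the helper name `spdx_line` and its signature are hypothetical, and it assumes the kernel convention that headers carry the tag in a block comment while .c files carry it in a line comment.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not the script from the commit): format the SPDX
 * tag line for a file, choosing the comment style from the file suffix.
 * Assumed convention: ".h" files use a block comment, ".c" files use a
 * line comment. Returns the formatted length, or -1 for any other file.
 */
static int spdx_line(const char *path, const char *license,
		     char *buf, size_t len)
{
	const char *dot = strrchr(path, '.');

	if (dot && strcmp(dot, ".h") == 0)
		return snprintf(buf, len,
				"/* SPDX-License-Identifier: %s */", license);
	if (dot && strcmp(dot, ".c") == 0)
		return snprintf(buf, len,
				"// SPDX-License-Identifier: %s", license);

	return -1;	/* other file types are out of scope for this sketch */
}
```

Called with "arch/x86/kvm/x86.h" and "GPL-2.0", the helper produces the same block-comment form that appears as the first line of this header.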
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
2008-07-03 14:59:22 +03:00
# ifndef ARCH_X86_KVM_X86_H
# define ARCH_X86_KVM_X86_H
# include <linux/kvm_host.h>
2023-04-04 17:45:16 -07:00
# include <asm/fpu/xstate.h>
2020-10-29 14:56:00 +01:00
# include <asm/mce.h>
2016-06-20 22:28:02 -03:00
# include <asm/pvclock.h>
2010-01-21 15:31:48 +02:00
# include "kvm_cache_regs.h"
2020-02-18 15:29:49 -08:00
# include "kvm_emulate.h"
2008-07-03 14:59:22 +03:00
2022-05-24 21:56:23 +08:00
struct kvm_caps {
/* control of guest tsc rate supported? */
bool has_tsc_control ;
/* maximum supported tsc_khz for guests */
u32 max_guest_tsc_khz ;
/* number of bits of the fractional part of the TSC scaling ratio */
u8 tsc_scaling_ratio_frac_bits ;
/* maximum allowed value of TSC scaling ratio */
u64 max_tsc_scaling_ratio ;
/* 1ull << kvm_caps.tsc_scaling_ratio_frac_bits */
u64 default_tsc_scaling_ratio ;
/* bus lock detection supported? */
bool has_bus_lock_exit ;
2022-05-24 21:56:24 +08:00
/* notify VM exit supported? */
bool has_notify_vmexit ;
2024-04-04 08:13:18 -04:00
/* bit mask of VM types */
u32 supported_vm_types ;
2022-05-24 21:56:23 +08:00
u64 supported_mce_cap ;
u64 supported_xcr0 ;
u64 supported_xss ;
2022-10-06 00:03:11 +00:00
u64 supported_perf_cap ;
2022-05-24 21:56:23 +08:00
} ;
2024-04-23 15:15:18 -07:00
struct kvm_host_values {
2024-04-23 15:15:21 -07:00
	/*
	 * The host's raw MAXPHYADDR, i.e. the number of non-reserved physical
	 * address bits irrespective of features that repurpose legal bits,
	 * e.g. MKTME.
	 */
	u8 maxphyaddr;
2024-04-23 15:15:18 -07:00
	u64 efer;
	u64 xcr0;
	u64 xss;
	u64 arch_capabilities;
};
2021-08-09 10:39:55 -07:00
void kvm_spurious_fault ( void ) ;
2021-02-03 16:01:16 -08:00
#define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check) \
({ \
	bool failed = (consistency_check); \
	if (failed) \
		trace_kvm_nested_vmenter_failed(#consistency_check, 0); \
	failed; \
})
2023-03-10 16:46:00 -08:00
/*
 * The first...last VMX feature MSRs that are emulated by KVM. This may or may
 * not cover all known VMX MSRs, as KVM doesn't emulate an MSR until there's an
 * associated feature that KVM supports for nested virtualization.
 */
# define KVM_FIRST_EMULATED_VMX_MSR MSR_IA32_VMX_BASIC
# define KVM_LAST_EMULATED_VMX_MSR MSR_IA32_VMX_VMFUNC
2018-03-16 16:37:24 -04:00
# define KVM_DEFAULT_PLE_GAP 128
# define KVM_VMX_DEFAULT_PLE_WINDOW 4096
# define KVM_DEFAULT_PLE_WINDOW_GROW 2
# define KVM_DEFAULT_PLE_WINDOW_SHRINK 0
# define KVM_VMX_DEFAULT_PLE_WINDOW_MAX UINT_MAX
2018-03-16 16:37:26 -04:00
# define KVM_SVM_DEFAULT_PLE_WINDOW_MAX USHRT_MAX
# define KVM_SVM_DEFAULT_PLE_WINDOW 3000
2018-03-16 16:37:24 -04:00
static inline unsigned int __grow_ple_window(unsigned int val,
		unsigned int base, unsigned int modifier, unsigned int max)
{
	u64 ret = val;

	if (modifier < 1)
		return base;

	if (modifier < base)
		ret *= modifier;
	else
		ret += modifier;

	return min(ret, (u64)max);
}

static inline unsigned int __shrink_ple_window(unsigned int val,
		unsigned int base, unsigned int modifier, unsigned int min)
{
	if (modifier < 1)
		return base;

	if (modifier < base)
		val /= modifier;
	else
		val -= modifier;

	return max(val, min);
}
2015-04-27 15:11:25 +02:00
# define MSR_IA32_CR_PAT_DEFAULT 0x0007040600070406ULL
2021-11-25 01:49:43 +00:00
void kvm_service_local_tlb_flush_requests ( struct kvm_vcpu * vcpu ) ;
2021-03-02 09:45:14 -08:00
int kvm_check_nested_events ( struct kvm_vcpu * vcpu ) ;
2023-03-10 16:45:59 -08:00
static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.last_vmentry_cpu != -1;
}
KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
Morph pending exceptions to pending VM-Exits (due to interception) when
the exception is queued instead of waiting until nested events are
checked at VM-Entry. This fixes a longstanding bug where KVM fails to
handle an exception that occurs during delivery of a previous exception,
KVM (L0) and L1 both want to intercept the exception (e.g. #PF for shadow
paging), and KVM determines that the exception is in the guest's domain,
i.e. queues the new exception for L2. Deferring the interception check
causes KVM to escalate various combinations of injected+pending exceptions
to double fault (#DF) without consulting L1's interception desires, and
ends up injecting a spurious #DF into L2.
KVM has fudged around the issue for #PF by special casing emulated #PF
injection for shadow paging, but the underlying issue is not unique to
shadow paging in L0, e.g. if KVM is intercepting #PF because the guest
has a smaller maxphyaddr and L1 (but not L0) is using shadow paging.
Other exceptions are affected as well, e.g. if KVM is intercepting #GP
for one of SVM's workarounds or for the VMware backdoor emulation stuff.
The other cases have gone unnoticed because the #DF is spurious if and
only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1
would have injected #DF anyways.
The hack-a-fix has also led to ugly code, e.g. bailing from the emulator
if #PF injection forced a nested VM-Exit and the emulator finds itself
back in L1. Allowing for direct-to-VM-Exit queueing also neatly solves
the async #PF in L2 mess; no need to set a magic flag and token, simply
queue a #PF nested VM-Exit.
Deal with event migration by flagging that a pending exception was queued
by userspace and check for interception at the next KVM_RUN, e.g. so that
KVM does the right thing regardless of the order in which userspace
restores nested state vs. event state.
When "getting" events from userspace, simply drop any pending exception
that is destined to be intercepted if there is also an injected exception
to be migrated. Ideally, KVM would migrate both events, but that would
require new ABI, and practically speaking losing the event is unlikely to
be noticed, let alone fatal. The injected exception is captured, RIP
still points at the original faulting instruction, etc... So either the
injection on the target will trigger the same intercepted exception, or
the source of the intercepted exception was transient and/or
non-deterministic, thus dropping it is ok-ish.
Fixes: a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
Fixes: feaf0c7dc473 ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2")
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-22-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
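The queue-time decision described above can be sketched with a toy model. Everything here is an assumption made for illustration: the `toy_*` types and functions are hypothetical and are not KVM's structures or API, and L1's interception preferences are reduced to a simple bitmap indexed by vector.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in KVM. */
struct toy_event {
	bool pending;
	uint8_t vector;
};

struct toy_vcpu {
	bool in_guest_mode;			/* true while L2 is running */
	uint32_t l1_intercept_bitmap;		/* bit N set: L1 intercepts vector N */
	struct toy_event exception;		/* destined for the current guest */
	struct toy_event exception_vmexit;	/* destined for L1 as a VM-Exit */
};

/* Would L1 intercept this vector right now? (vectors 0..31 only) */
static bool toy_l1_intercepts(const struct toy_vcpu *vcpu, uint8_t vector)
{
	return vcpu->in_guest_mode && vector < 32 &&
	       (vcpu->l1_intercept_bitmap & (UINT32_C(1) << vector));
}

/*
 * Check interception when the exception is queued, rather than deferring
 * the check to VM-Entry: an intercepted exception is recorded directly as
 * a pending VM-Exit and never sits in the ordinary pending slot.
 */
static void toy_queue_exception(struct toy_vcpu *vcpu, uint8_t vector)
{
	struct toy_event *ev = toy_l1_intercepts(vcpu, vector) ?
			       &vcpu->exception_vmexit : &vcpu->exception;

	ev->pending = true;
	ev->vector = vector;
}

/* Either slot being occupied counts as "an exception is pending". */
static bool toy_is_exception_pending(const struct toy_vcpu *vcpu)
{
	return vcpu->exception.pending || vcpu->exception_vmexit.pending;
}
```

Because the intercepted exception never occupies the ordinary pending slot in this model, a later check for "injected + pending" combinations has nothing to escalate to #DF behind L1's back.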
2022-08-30 23:16:08 +00:00
static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.exception.pending ||
2022-08-30 23:16:09 +00:00
	       vcpu->arch.exception_vmexit.pending ||
	       kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu);
2022-08-30 23:16:08 +00:00
}
2008-07-03 14:59:22 +03:00
static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
{
2017-11-19 18:25:43 +02:00
	vcpu->arch.exception.pending = false;
2017-08-24 03:35:09 -07:00
	vcpu->arch.exception.injected = false;
2022-08-30 23:16:08 +00:00
	vcpu->arch.exception_vmexit.pending = false;
2008-07-03 14:59:22 +03:00
}
2009-05-11 13:35:50 +03:00
static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector,
				       bool soft)
2008-07-03 15:17:01 +03:00
{
KVM: x86: Rename interrupt.pending to interrupt.injected
For exception and NMI events, KVM code uses the following
coding convention:
*) "pending" represents an event that should be injected into the guest at
some point but whose side effects have not yet occurred.
*) "injected" represents an event whose side effects have already
occurred.
However, interrupts don't conform to this coding convention.
All current code flows mark interrupt.pending when its side effects
have already taken place (for example, a bit moved from LAPIC IRR to
ISR). Therefore, it makes sense to just rename
interrupt.pending to interrupt.injected.
This change follows logic of previous commit 664f8e26b00c ("KVM: X86:
Fix loss of exception which has not yet been injected") which changed
exception to follow this coding convention as well.
It is important to note that in case !lapic_in_kernel(vcpu),
interrupt.pending usage was and still is incorrect.
In this case, interrupt.pending can only be set using one of the
following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and
KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that
QEMU uses them either to re-set an "interrupt.pending" state it has
received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or
via KVM_GET_SREGS interrupt_bitmap) or by dispatching a new interrupt
from QEMU's emulated LAPIC, which resets the bit in IRR and sets the bit
in ISR before sending the ioctl to KVM. So it seems that indeed
"interrupt.pending" in this case is also supposed to represent
"interrupt.injected".
However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr()
are misusing (now named) interrupt.injected in order to return whether
there is a pending interrupt.
This leads to nVMX/nSVM not being able to distinguish whether it should
exit from L2 to L1 on EXTERNAL_INTERRUPT on a pending interrupt or should
re-inject an injected interrupt.
Therefore, add a FIXME at these functions for handling this issue.
This patch introduces no semantic change.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
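The "pending" vs. "injected" convention can be illustrated with a toy local APIC. This is a sketch under stated assumptions, not KVM code: the `toy_*` names are hypothetical, IRR and ISR are reduced to 32-bit masks, and vectors are assumed to be below 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: a 32-vector APIC, none of these names exist in KVM. */
struct toy_lapic {
	uint32_t irr;	/* requested, side effects not yet occurred */
	uint32_t isr;	/* in service, side effects already occurred */
};

struct toy_interrupt {
	bool injected;
	uint8_t nr;
};

/* "pending": an interrupt is requested but nothing has been consumed yet. */
static bool toy_irq_pending(const struct toy_lapic *apic)
{
	return apic->irr != 0;
}

/*
 * Accepting the vector moves its bit from IRR to ISR. From that point the
 * event can no longer be rediscovered by looking at IRR, so it is tracked
 * as "injected" and would have to be re-injected if delivery is interrupted.
 * Assumes nr < 32.
 */
static void toy_accept_irq(struct toy_lapic *apic, struct toy_interrupt *intr,
			   uint8_t nr)
{
	apic->irr &= ~(UINT32_C(1) << nr);
	apic->isr |= UINT32_C(1) << nr;
	intr->injected = true;
	intr->nr = nr;
}
```

In this model a flag that is set only inside `toy_accept_irq()` describes an injected event, which is the reasoning behind naming the field interrupt.injected rather than interrupt.pending.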
2018-03-23 03:01:31 +03:00
	vcpu->arch.interrupt.injected = true;
2009-05-11 13:35:50 +03:00
	vcpu->arch.interrupt.soft = soft;
2008-07-03 15:17:01 +03:00
	vcpu->arch.interrupt.nr = vector;
}
static inline void kvm_clear_interrupt_queue(struct kvm_vcpu *vcpu)
{
2018-03-23 03:01:31 +03:00
	vcpu->arch.interrupt.injected = false;
2008-07-03 15:17:01 +03:00
}
2009-05-11 13:35:46 +03:00
static inline bool kvm_event_needs_reinjection(struct kvm_vcpu *vcpu)
{
2018-03-23 03:01:31 +03:00
	return vcpu->arch.exception.injected || vcpu->arch.interrupt.injected ||
2009-05-11 13:35:46 +03:00
	       vcpu->arch.nmi_injected;
}
2009-05-11 13:35:50 +03:00
static inline bool kvm_exception_is_soft(unsigned int nr)
{
	return (nr == BP_VECTOR) || (nr == OF_VECTOR);
}
2009-07-05 17:39:35 +03:00
2010-01-21 15:31:48 +02:00
static inline bool is_protmode(struct kvm_vcpu *vcpu)
{
2023-03-22 12:58:21 +08:00
	return kvm_is_cr0_bit_set(vcpu, X86_CR0_PE);
2010-01-21 15:31:48 +02:00
}
2023-03-22 12:58:24 +08:00
static inline bool is_long_mode(struct kvm_vcpu *vcpu)
2010-01-21 15:31:49 +02:00
{
#ifdef CONFIG_X86_64
2023-03-22 12:58:24 +08:00
	return !!(vcpu->arch.efer & EFER_LMA);
2010-01-21 15:31:49 +02:00
#else
2023-03-22 12:58:24 +08:00
	return false;
2010-01-21 15:31:49 +02:00
#endif
}
2014-06-18 17:19:23 +03:00
static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
{
	int cs_db, cs_l;
2021-05-24 12:48:57 -05:00
	WARN_ON_ONCE(vcpu->arch.guest_state_protected);
2014-06-18 17:19:23 +03:00
	if (!is_long_mode(vcpu))
		return false;
2024-05-07 21:31:02 +08:00
	kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
2014-06-18 17:19:23 +03:00
	return cs_l;
}
2021-05-24 12:48:57 -05:00
static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
{
	/*
	 * If running with protected guest state, the CS register is not
	 * accessible. The hypercall register values will have had to be
	 * provided in 64-bit mode, so assume the guest is in 64-bit.
	 */
	return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
}
2018-06-20 17:21:29 -07:00
static inline bool x86_exception_has_error_code(unsigned int vector)
{
	static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
			BIT(NP_VECTOR) | BIT(SS_VECTOR) | BIT(GP_VECTOR) |
			BIT(PF_VECTOR) | BIT(AC_VECTOR);

	return (1U << vector) & exception_has_error_code;
}
2010-09-10 17:30:50 +02:00
static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
}
KVM: x86: Use boolean return value for is_{pae,pse,paging}()
Convert is_{pae,pse,paging}() to use kvm_is_cr{0,4}_bit_set() and return
bools. Returning an "int" requires not one, but two implicit casts, first
from "unsigned long" to "int", and then again to a "bool". Both casts are
more than a bit dangerous; the ulong=>int casts would drop a bit on 64-bit
kernels _if_ the bits in question weren't in the lower 32 bits, and the
int=>bool cast can result in false negatives/positives, e.g. see commit
0c928ff26bd6 ("KVM: SVM: Fix benign "bool vs. int" comparison in
svm_set_cr0()").
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20230322045824.22970-3-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
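The two conversions called out above can be reproduced in a small userspace sketch. The `toy_*` helpers and `TOY_*` constants are hypothetical stand-ins for the kernel's helpers, `uint64_t` stands in for a 64-bit "unsigned long", and the narrowing to "int" is implementation-defined in C (GCC and Clang reduce the value modulo 2^32, which is what the comment assumes).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not kernel definitions. */
#define TOY_CR4_PAE	(UINT64_C(1) << 5)
#define TOY_HIGH_BIT	(UINT64_C(1) << 32)	/* any bit above bit 31 */

/* Old style: return the masked register as an "int". */
static int toy_is_pae_int(uint64_t cr4)
{
	return cr4 & TOY_CR4_PAE;	/* yields 32, not 1, when set */
}

/* New style: return a bool, so any non-zero value collapses to true. */
static bool toy_is_pae_bool(uint64_t cr4)
{
	return cr4 & TOY_CR4_PAE;
}

/*
 * A flag above bit 31 is lost when narrowed to "int" (implementation-
 * defined; GCC and Clang keep only the low 32 bits, giving 0 here).
 */
static int toy_high_bit_int(uint64_t reg)
{
	return reg & TOY_HIGH_BIT;
}

/* The bool version keeps the information regardless of bit position. */
static bool toy_high_bit_bool(uint64_t reg)
{
	return reg & TOY_HIGH_BIT;
}
```

With the "int" variant, a comparison such as `true == toy_is_pae_int(cr4)` is false even when the bit is set, because the right-hand side is 32; the bool variant compares as expected.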
2023-03-22 12:58:22 +08:00
static inline bool is_pae(struct kvm_vcpu *vcpu)
2010-01-21 15:31:49 +02:00
{
2023-03-22 12:58:22 +08:00
	return kvm_is_cr4_bit_set(vcpu, X86_CR4_PAE);
2010-01-21 15:31:49 +02:00
}
2023-03-22 12:58:22 +08:00
static inline bool is_pse ( struct kvm_vcpu * vcpu )
2010-01-21 15:31:49 +02:00
{
2023-03-22 12:58:22 +08:00
return kvm_is_cr4_bit_set ( vcpu , X86_CR4_PSE ) ;
2010-01-21 15:31:49 +02:00
}
2023-03-22 12:58:22 +08:00
static inline bool is_paging ( struct kvm_vcpu * vcpu )
2010-01-21 15:31:49 +02:00
{
2023-03-22 12:58:22 +08:00
return likely ( kvm_is_cr0_bit_set ( vcpu , X86_CR0_PG ) ) ;
2010-01-21 15:31:49 +02:00
}
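The commit message for these helpers describes two implicit conversions, "unsigned long" to "int" and then "int" to "bool". The following is a minimal sketch of both hazards, not KVM code: `HIGH_BIT`, `LOW_BIT`, `test_bit_as_int()` and `test_bit_as_bool()` are hypothetical names, and the narrowing result assumes the usual GCC/Clang behaviour of keeping the low 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags; HIGH_BIT sits above bit 31 purely for illustration. */
#define HIGH_BIT (UINT64_C(1) << 32)
#define LOW_BIT  (UINT64_C(1) << 5)

/* Old pattern: the 64-bit mask result is narrowed to int. */
static int test_bit_as_int(uint64_t reg, uint64_t mask)
{
	return (int)(reg & mask);	/* bits 63:32 are lost on GCC/Clang */
}

/* New pattern: any non-zero result becomes true, with no narrowing. */
static bool test_bit_as_bool(uint64_t reg, uint64_t mask)
{
	return (reg & mask) != 0;
}
```

With the int-returning form, a set bit above bit 31 reads back as 0, and a set low bit reads back as 32 rather than 1, so a direct comparison against `true` fails even though the bit is set.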
2019-06-06 18:52:44 +02:00
static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
{
	return !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu);
}
2017-08-24 20:27:56 +08:00
static inline u8 vcpu_virt_addr_bits ( struct kvm_vcpu * vcpu )
{
2023-03-22 12:58:21 +08:00
return kvm_is_cr4_bit_set ( vcpu , X86_CR4_LA57 ) ? 57 : 48 ;
2017-08-24 20:27:56 +08:00
}
static inline bool is_noncanonical_address ( u64 la , struct kvm_vcpu * vcpu )
{
2022-01-31 09:24:50 +02:00
return ! __is_canonical_address ( la , vcpu_virt_addr_bits ( vcpu ) ) ;
2017-08-24 20:27:56 +08:00
}
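`is_noncanonical_address()` negates a canonical-address check for either 48 or 57 virtual address bits. As a rough sketch of what such a check computes (this is an assumption about the helper's intent, and `is_canonical()` is a hypothetical stand-in, not the kernel's `__is_canonical_address()`): an address is canonical when sign-extending it from its top implemented bit gives back the same value. The arithmetic right shift of a negative value relies on GCC/Clang behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sign-extend vaddr from bit (vaddr_bits - 1) and compare with the original.
 * Hypothetical helper for illustration; vaddr_bits is expected to be 48 or 57.
 */
static bool is_canonical(uint64_t vaddr, unsigned int vaddr_bits)
{
	unsigned int shift = 64 - vaddr_bits;

	return (uint64_t)((int64_t)(vaddr << shift) >> shift) == vaddr;
}
```

Note that an address with only bit 47 set is non-canonical with 48 address bits but canonical with 57, which is why the helper above has to consult CR4.LA57 first.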
2011-07-12 03:23:20 +08:00
static inline void vcpu_cache_mmio_info ( struct kvm_vcpu * vcpu ,
gva_t gva , gfn_t gfn , unsigned access )
{
2019-02-05 13:01:13 -08:00
	u64 gen = kvm_memslots(vcpu->kvm)->generation;
KVM: Explicitly define the "memslot update in-progress" bit
KVM uses bit 0 of the memslots generation as an "update in-progress"
flag, which is used by x86 to prevent caching MMIO access while the
memslots are changing. Although the intended behavior is flag-like,
e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid
caching data from in-flux memslots, the implementation oftentimes treats
the bit as part of the generation number itself, e.g. incrementing the
generation increments twice, once to set the flag and once to clear it.
Prior to commit 4bd518f1598d ("KVM: use separate generations for
each address space"), incorporating the "update in-progress" bit into
the generation number largely made sense, e.g. "real" generations are
even, "bogus" generations are odd, most code doesn't need to be aware of
the bit, etc...
Now that unique memslots generation numbers are assigned to each address
space, stealthing the in-progress status into the generation number
results in a wide variety of subtle code, e.g. kvm_create_vm() jumps
over bit 0 when initializing the memslots generation without any hint as
to why.
Explicitly define the flag and convert as much code as possible (which
isn't much) to actually treat it like a flag. This paves the way for
eventually using a different bit for "update in-progress" so that it can
be a flag in truth instead of an awkward extension to the generation
number.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05 13:01:14 -08:00
	if (unlikely(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS))
2019-02-05 13:01:13 -08:00
		return;
2017-08-17 18:36:58 +02:00
	/*
	 * If this is a shadow nested page table, the "GVA" is
	 * actually a nGPA.
	 */
	vcpu->arch.mmio_gva = mmu_is_nested(vcpu) ? 0 : gva & PAGE_MASK;
2019-08-01 13:35:21 -07:00
	vcpu->arch.mmio_access = access;
2011-07-12 03:23:20 +08:00
	vcpu->arch.mmio_gfn = gfn;
2019-02-05 13:01:13 -08:00
	vcpu->arch.mmio_gen = gen;
2014-08-18 15:46:07 -07:00
}
static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.mmio_gen == kvm_memslots(vcpu->kvm)->generation;
2011-07-12 03:23:20 +08:00
}
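The MMIO cache helpers here combine two rules: nothing is cached while the memslots generation carries the "update in-progress" flag, and a cached entry is trusted only while the generation is unchanged. A minimal sketch of that pattern follows, under stated assumptions: `struct mmio_cache`, `mmio_cache_fill()` and `mmio_cache_hit()` are hypothetical, and the flag is placed at bit 0 only because the commit message says so for the code of that time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0, as described in the commit message; hypothetical constant name. */
#define GEN_UPDATE_IN_PROGRESS (UINT64_C(1) << 0)

struct mmio_cache {
	uint64_t gfn;	/* 0 means "nothing cached" */
	uint64_t gen;	/* generation observed when the entry was filled */
};

/* Record the entry unless the memslots are in the middle of an update. */
static bool mmio_cache_fill(struct mmio_cache *c, uint64_t gfn, uint64_t cur_gen)
{
	if (cur_gen & GEN_UPDATE_IN_PROGRESS)
		return false;

	c->gfn = gfn;
	c->gen = cur_gen;
	return true;
}

/* A cached entry is usable only if the generation has not moved on. */
static bool mmio_cache_hit(const struct mmio_cache *c, uint64_t gfn, uint64_t cur_gen)
{
	return c->gen == cur_gen && c->gfn && c->gfn == gfn;
}
```

Tagging entries with a generation means a memslot change invalidates every cached entry at once, without having to visit each vCPU.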
/*
2014-08-18 15:46:07 -07:00
 * Clear the mmio cache info for the given gva. If gva is MMIO_GVA_ANY, we
 * clear all mmio cache info.
2011-07-12 03:23:20 +08:00
*/
2014-08-18 15:46:07 -07:00
#define MMIO_GVA_ANY (~(gva_t)0)
2011-07-12 03:23:20 +08:00
static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
{
2014-08-18 15:46:07 -07:00
	if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
2011-07-12 03:23:20 +08:00
		return;
	vcpu->arch.mmio_gva = 0;
}
static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
{
2014-08-18 15:46:07 -07:00
	if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
	    vcpu->arch.mmio_gva == (gva & PAGE_MASK))
2011-07-12 03:23:20 +08:00
		return true;
	return false;
}
static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
{
2014-08-18 15:46:07 -07:00
	if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gfn &&
	    vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
2011-07-12 03:23:20 +08:00
		return true;
	return false;
}
2021-04-21 19:21:28 -07:00
static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
2014-06-18 17:19:23 +03:00
{
2021-04-21 19:21:28 -07:00
	unsigned long val = kvm_register_read_raw(vcpu, reg);
2014-06-18 17:19:23 +03:00
	return is_64_bit_mode(vcpu) ? val : (u32)val;
}
2021-04-21 19:21:28 -07:00
static inline void kvm_register_write(struct kvm_vcpu *vcpu,
2019-09-27 14:45:20 -07:00
				      int reg, unsigned long val)
2014-06-18 17:19:26 +03:00
{
	if (!is_64_bit_mode(vcpu))
		val = (u32)val;
2021-04-21 19:21:28 -07:00
	return kvm_register_write_raw(vcpu, reg, val);
2014-06-18 17:19:26 +03:00
}
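Both register accessors truncate the value to 32 bits whenever the vCPU is not in 64-bit mode. The effect can be sketched in isolation as follows; `reg_read()` and `reg_write()` are hypothetical stand-ins that take the mode as a plain flag rather than querying a vCPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the raw value, or only its low 32 bits outside 64-bit mode. */
static uint64_t reg_read(uint64_t raw, bool in_64_bit_mode)
{
	return in_64_bit_mode ? raw : (uint32_t)raw;
}

/* Store the value, dropping the upper 32 bits outside 64-bit mode. */
static void reg_write(uint64_t *slot, uint64_t val, bool in_64_bit_mode)
{
	if (!in_64_bit_mode)
		val = (uint32_t)val;
	*slot = val;
}
```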
2015-07-23 08:22:45 +02:00
static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
{
	return !(kvm->arch.disabled_quirks & quirk);
}
2019-08-27 14:40:36 -07:00
void kvm_inject_realmode_interrupt ( struct kvm_vcpu * vcpu , int irq , int inc_eip ) ;
2010-04-19 13:32:45 +08:00
2016-09-01 14:21:03 +02:00
u64 get_kvmclock_ns ( struct kvm * kvm ) ;
2023-10-05 10:16:10 +01:00
uint64_t kvm_get_wall_clock_epoch ( struct kvm * kvm ) ;
KVM: x86/xen: improve accuracy of Xen timers
A test program such as http://david.woodhou.se/timerlat.c confirms user
reports that timers are increasingly inaccurate as the lifetime of a
guest increases. Reporting the actual delay observed when asking for
100µs of sleep, it starts off OK on a newly-launched guest but gets
worse over time, giving incorrect sleep times:
root@ip-10-0-193-21:~# ./timerlat -c -n 5
00000000 latency 103243/100000 (3.2430%)
00000001 latency 103243/100000 (3.2430%)
00000002 latency 103242/100000 (3.2420%)
00000003 latency 103245/100000 (3.2450%)
00000004 latency 103245/100000 (3.2450%)
The biggest problem is that get_kvmclock_ns() returns inaccurate values
when the guest TSC is scaled. The guest sees a TSC value scaled from the
host TSC by a mul/shift conversion (hopefully done in hardware). The
guest then converts that guest TSC value into nanoseconds using the
mul/shift conversion given to it by the KVM pvclock information.
But get_kvmclock_ns() performs only a single conversion directly from
host TSC to nanoseconds, giving a different result. A test program at
http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
over a day.
It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
that. The actual guest hv_clock is per-CPU, and *theoretically* each
vCPU could be running at a *different* frequency. But this patch is
needed anyway because...
The other issue with Xen timers was that the code would snapshot the
host CLOCK_MONOTONIC at some point in time, and then... after a few
interrupts may have occurred, some preemption perhaps... would also read
the guest's kvmclock. Then it would proceed under the false assumption
that those two happened at the *same* time. Any time which *actually*
elapsed between reading the two clocks was introduced as inaccuracies
in the time at which the timer fired.
Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
host TSC just *once*, then use the returned TSC value to calculate the
kvmclock (making sure to do that the way the guest would instead of
making the same mistake get_kvmclock_ns() does).
Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
timers still have to use CLOCK_MONOTONIC. In practice the difference
between the two won't matter over the timescales involved, as the
*absolute* values don't matter; just the delta.
This does mean a new variant of kvm_get_time_and_clockread() is needed;
called kvm_get_monotonic_and_clockread() because that's what it does.
Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240227115648.3104-2-dwmw2@infradead.org
[sean: massage moved comment, tweak if statement formatting]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-27 11:49:15 +00:00
bool kvm_get_monotonic_and_clockread ( s64 * kernel_ns , u64 * tsc_timestamp ) ;
2010-08-19 22:07:17 -10:00
2018-06-06 17:37:49 +02:00
int kvm_read_guest_virt ( struct kvm_vcpu * vcpu ,
2011-05-25 23:04:56 +03:00
gva_t addr , void * val , unsigned int bytes ,
struct x86_exception * exception ) ;
2018-06-06 17:37:49 +02:00
int kvm_write_guest_virt_system ( struct kvm_vcpu * vcpu ,
2011-05-25 23:08:00 +03:00
gva_t addr , void * val , unsigned int bytes ,
struct x86_exception * exception ) ;
2018-04-03 16:28:48 -07:00
int handle_ud ( struct kvm_vcpu * vcpu ) ;
2022-08-30 23:16:01 +00:00
void kvm_deliver_exception_payload ( struct kvm_vcpu * vcpu ,
struct kvm_queued_exception * ex ) ;
2018-10-16 14:29:22 -07:00
2015-06-15 16:55:22 +08:00
int kvm_mtrr_set_msr ( struct kvm_vcpu * vcpu , u32 msr , u64 data ) ;
int kvm_mtrr_get_msr ( struct kvm_vcpu * vcpu , u32 msr , u64 * pdata ) ;
2016-01-25 16:53:33 +08:00
bool kvm_vector_hashing_enabled ( void ) ;
2020-07-10 17:48:03 +02:00
void kvm_fixup_and_inject_pf_error ( struct kvm_vcpu * vcpu , gva_t gva , u16 error_code ) ;
2021-01-26 03:18:28 -05:00
int x86_decode_emulated_instruction ( struct kvm_vcpu * vcpu , int emulation_type ,
void * insn , int insn_len ) ;
2019-12-06 15:57:14 -08:00
int x86_emulate_instruction ( struct kvm_vcpu * vcpu , gpa_t cr2_or_gpa ,
2018-08-23 13:56:53 -07:00
int emulation_type , void * insn , int insn_len ) ;
2020-04-28 14:23:25 +08:00
fastpath_t handle_fastpath_set_msr_irqoff ( struct kvm_vcpu * vcpu ) ;
2014-09-18 22:39:44 +03:00
2022-05-24 21:56:23 +08:00
extern struct kvm_caps kvm_caps ;
2024-04-23 15:15:18 -07:00
extern struct kvm_host_values kvm_host ;
2022-05-24 21:56:23 +08:00
2022-01-11 15:38:23 +08:00
extern bool enable_pmu ;
2014-02-24 12:15:16 +01:00
2023-04-04 17:45:15 -07:00
/*
 * Get a filtered version of KVM's supported XCR0 that strips out dynamic
 * features for which the current process doesn't (yet) have permission to use.
 * This is intended to be used only when enumerating support to userspace,
 * e.g. in KVM_GET_SUPPORTED_CPUID and KVM_CAP_XSAVE2, it does NOT need to be
 * used to check/restrict guest behavior as KVM rejects KVM_SET_CPUID{2} if
 * userspace attempts to enable unpermitted features.
 */
static inline u64 kvm_get_filtered_xcr0(void)
{
2023-04-04 17:45:16 -07:00
	u64 permitted_xcr0 = kvm_caps.supported_xcr0;
	BUILD_BUG_ON(XFEATURE_MASK_USER_DYNAMIC != XFEATURE_MASK_XTILE_DATA);
	if (permitted_xcr0 & XFEATURE_MASK_USER_DYNAMIC) {
		permitted_xcr0 &= xstate_get_guest_group_perm();
		/*
		 * Treat XTILE_CFG as unsupported if the current process isn't
		 * allowed to use XTILE_DATA, as attempting to set XTILE_CFG in
		 * XCR0 without setting XTILE_DATA is architecturally illegal.
		 */
		if (!(permitted_xcr0 & XFEATURE_MASK_XTILE_DATA))
			permitted_xcr0 &= ~XFEATURE_MASK_XTILE_CFG;
	}
	return permitted_xcr0;
2023-04-04 17:45:15 -07:00
}
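The filtering above can be exercised on its own with plain bitmasks. This is a sketch under assumptions: `filter_xcr0()` is a hypothetical function that takes the supported mask and the process permission mask as arguments instead of reading KVM state, and the bit positions (17 for XTILE_CFG, 18 for XTILE_DATA) are the architectural XCR0 positions as I understand them.

```c
#include <stdint.h>

/* Assumed XCR0 bit positions for the AMX tile state components. */
#define XTILE_CFG  (UINT64_C(1) << 17)
#define XTILE_DATA (UINT64_C(1) << 18)

/*
 * Strip features the process may not use. XTILE_CFG is dropped together
 * with XTILE_DATA because CFG without DATA is not a legal XCR0 value.
 */
static uint64_t filter_xcr0(uint64_t supported, uint64_t process_perm)
{
	uint64_t permitted = supported;

	if (permitted & XTILE_DATA) {
		permitted &= process_perm;
		if (!(permitted & XTILE_DATA))
			permitted &= ~XTILE_CFG;
	}
	return permitted;
}
```

As in the helper above, the permission mask is consulted only when a dynamic feature is supported at all; otherwise the supported mask is returned untouched.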
2020-03-02 15:56:25 -08:00
static inline bool kvm_mpx_supported(void)
{
2022-05-24 21:56:23 +08:00
	return (kvm_caps.supported_xcr0 & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR))
2020-03-02 15:56:25 -08:00
		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
}
2014-01-06 12:00:02 -02:00
extern unsigned int min_timer_period_us ;
2018-03-12 13:12:47 +02:00
extern bool enable_vmware_backdoor ;
2019-07-06 09:26:51 +08:00
extern int pi_inject_timer ;
2021-01-08 09:36:55 +08:00
extern bool report_ignored_msrs ;
2022-01-19 23:07:37 +00:00
extern bool eager_page_split ;
2023-01-24 23:49:01 +00:00
static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
	if (report_ignored_msrs)
		vcpu_unimpl(vcpu, "Unhandled WRMSR(0x%x) = 0x%llx\n", msr, data);
}
static inline void kvm_pr_unimpl_rdmsr(struct kvm_vcpu *vcpu, u32 msr)
{
	if (report_ignored_msrs)
		vcpu_unimpl(vcpu, "Unhandled RDMSR(0x%x)\n", msr);
}
2016-06-20 22:28:02 -03:00
static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
{
	return pvclock_scale_delta(nsec, vcpu->arch.virtual_tsc_mult,
				   vcpu->arch.virtual_tsc_shift);
}
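`nsec_to_cycles()` hands the conversion to a mul/shift scaling step of the kind the Xen timer commit message describes. A rough sketch of that arithmetic is below, assuming a pre-shift followed by a 32.32 fixed-point multiply; `scale_delta()` is a hypothetical name, and `unsigned __int128` is a GCC/Clang extension used here to keep the 96-bit intermediate product.

```c
#include <stdint.h>

/*
 * Scale delta by 2^shift * (mul_frac / 2^32).
 * A negative shift scales down, a non-negative shift scales up.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}
```

Because each step rounds down, chaining two such conversions (host TSC to guest TSC, then guest TSC to nanoseconds) need not give the same result as one direct conversion, which is the discrepancy the commit message attributes to get_kvmclock_ns().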
2016-01-22 11:39:22 +01:00
/* Same "calling convention" as do_div:
 * - divide (n << 32) by base
 * - put result in n
 * - return remainder
 */
#define do_shl32_div32(n, base) \
	({ \
		u32 __quot, __rem; \
		asm("divl %2" : "=a" (__quot), "=d" (__rem) \
			: "rm" (base), "0" (0), "1" ((u32) n)); \
		n = __quot; \
		__rem; \
} )
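For readers less familiar with `divl`, the macro above divides the 64-bit value formed by `n` in the upper half and zero in the lower half. A portable sketch of the same calculation is shown here; `shl32_div32()` is a hypothetical function, not a drop-in replacement. One difference to keep in mind: `divl` raises a divide error when the quotient does not fit in 32 bits (that is, when `n >= base`), whereas this sketch silently truncates the quotient.

```c
#include <stdint.h>

/*
 * Divide (*n << 32) by base, store the 32-bit quotient back through n
 * and return the remainder. base must be non-zero.
 */
static uint32_t shl32_div32(uint32_t *n, uint32_t base)
{
	uint64_t dividend = (uint64_t)*n << 32;
	uint32_t rem = (uint32_t)(dividend % base);

	*n = (uint32_t)(dividend / base);
	return rem;
}
```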
2018-03-12 04:53:02 -07:00
static inline bool kvm_mwait_in_guest ( struct kvm * kvm )
2017-04-21 12:27:17 +02:00
{
2018-03-12 04:53:02 -07:00
	return kvm->arch.mwait_in_guest;
2017-04-21 12:27:17 +02:00
}
2018-03-12 04:53:03 -07:00
static inline bool kvm_hlt_in_guest ( struct kvm * kvm )
{
	return kvm->arch.hlt_in_guest;
}
2018-03-12 04:53:04 -07:00
static inline bool kvm_pause_in_guest ( struct kvm * kvm )
{
	return kvm->arch.pause_in_guest;
}
2019-05-21 14:06:53 +08:00
static inline bool kvm_cstate_in_guest ( struct kvm * kvm )
{
	return kvm->arch.cstate_in_guest;
}
2022-05-24 21:56:24 +08:00
static inline bool kvm_notify_vmexit_enabled ( struct kvm * kvm )
{
	return kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_ENABLED;
}
2022-12-13 06:09:12 +00:00
static __always_inline void kvm_before_interrupt ( struct kvm_vcpu * vcpu ,
enum kvm_intr_type intr )
2017-07-25 17:20:32 -07:00
{
2021-11-11 02:07:32 +00:00
	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr);
2017-07-25 17:20:32 -07:00
}
2022-12-13 06:09:12 +00:00
static __always_inline void kvm_after_interrupt ( struct kvm_vcpu * vcpu )
2017-07-25 17:20:32 -07:00
{
2021-11-11 02:07:31 +00:00
	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0);
2017-07-25 17:20:32 -07:00
}
2021-11-11 02:07:31 +00:00
static inline bool kvm_handling_nmi_from_guest ( struct kvm_vcpu * vcpu )
{
2021-11-11 02:07:32 +00:00
	return vcpu->arch.handling_intr_from_guest == KVM_HANDLING_NMI;
2021-11-11 02:07:31 +00:00
}
2019-04-10 11:41:40 +02:00
static inline bool kvm_pat_valid(u64 data)
{
	if (data & 0xF8F8F8F8F8F8F8F8ull)
		return false;
	/* 0, 1, 4, 5, 6, 7 are valid values. */
	return (data | ((data & 0x0202020202020202ull) << 1)) == data;
}
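The PAT check validates all eight one-byte entries at once. The first mask rejects any entry above 7; the second expression requires that whenever bit 1 of an entry is set, bit 2 is set as well, which excludes 2 and 3 and leaves 0, 1, 4, 5, 6 and 7. The sketch below places that bit trick next to a byte-by-byte version for comparison; `pat_valid_fast()` and `pat_valid_slow()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit-trick form, mirroring the helper above. */
static bool pat_valid_fast(uint64_t data)
{
	if (data & UINT64_C(0xF8F8F8F8F8F8F8F8))
		return false;

	/* If bit 1 of an entry is set, bit 2 must already be set too. */
	return (data | ((data & UINT64_C(0x0202020202020202)) << 1)) == data;
}

/* Straightforward reference: inspect each of the eight entries in turn. */
static bool pat_valid_slow(uint64_t data)
{
	for (int i = 0; i < 8; i++) {
		uint8_t type = (data >> (i * 8)) & 0xff;

		if (type > 7 || type == 2 || type == 3)
			return false;
	}
	return true;
}
```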
2020-01-24 15:07:22 -08:00
static inline bool kvm_dr7_valid ( u64 data )
2020-01-15 19:54:32 -05:00
{
/* Bits [63:32] are reserved */
	return !(data >> 32);
}
2020-05-22 18:19:51 -04:00
static inline bool kvm_dr6_valid ( u64 data )
{
/* Bits [63:32] are reserved */
	return !(data >> 32);
}
2020-01-15 19:54:32 -05:00
2020-10-29 14:56:00 +01:00
/*
 * Trigger machine check on the host. We assume all the MSRs are already set up
 * by the CPU and that we still run on the same CPU as the MCE occurred on.
 * We pass a fake environment to the machine check handler because we want
 * the guest to be always treated like user space, no matter what context
 * it used internally.
 */
static inline void kvm_machine_check(void)
{
#if defined(CONFIG_X86_MCE)
	struct pt_regs regs = {
		.cs = 3, /* Fake ring 3 no matter what the guest ran on */
		.flags = X86_EFLAGS_IF,
	};
	do_machine_check(&regs);
#endif
}
2019-10-21 16:30:25 -07:00
void kvm_load_guest_xsave_state ( struct kvm_vcpu * vcpu ) ;
void kvm_load_host_xsave_state ( struct kvm_vcpu * vcpu ) ;
2020-07-08 14:57:31 +03:00
int kvm_spec_ctrl_test_value ( u64 value ) ;
KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits
Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits
checks, into a separate helper, __kvm_is_valid_cr4(), and export only the
inner helper to vendor code in order to prevent nested VMX from calling
back into vmx_is_valid_cr4() via kvm_is_valid_cr4().
On SVM, this is a nop as SVM doesn't place any additional restrictions on
CR4.
On VMX, this is also currently a nop, but only because nested VMX is
missing checks on reserved CR4 bits for nested VM-Enter. That bug will
be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is,
but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's
restrictions on CR0/CR4. The cleanest and most intuitive way to fix the
VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant
routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do
the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's
restrictions if and only if the vCPU is post-VMXON.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07 21:35:50 +00:00
bool __kvm_is_valid_cr4 ( struct kvm_vcpu * vcpu , unsigned long cr4 ) ;
2020-09-11 14:29:05 -05:00
int kvm_handle_memory_failure ( struct kvm_vcpu * vcpu , int r ,
struct x86_exception * e ) ;
2020-09-11 14:29:12 -05:00
int kvm_handle_invpcid ( struct kvm_vcpu * vcpu , unsigned long type , gva_t gva ) ;
2020-09-25 16:34:17 +02:00
bool kvm_msr_allowed ( struct kvm_vcpu * vcpu , u32 index , u32 type ) ;
2019-04-10 11:41:40 +02:00
2020-11-01 13:55:23 +02:00
/*
 * Internal error codes that are used to indicate that MSR emulation encountered
 * an error that should result in #GP in the guest, unless userspace
 * handles it.
 */
#define KVM_MSR_RET_INVALID 2 /* in-kernel MSR emulation #GP condition */
#define KVM_MSR_RET_FILTERED 3 /* #GP due to userspace MSR filter */
2020-06-22 18:04:41 -04:00
2020-07-08 00:39:55 +00:00
#define __cr4_reserved_bits(__cpu_has, __c) \
({ \
	u64 __reserved_bits = CR4_RESERVED_BITS; \
	\
	if (!__cpu_has(__c, X86_FEATURE_XSAVE)) \
		__reserved_bits |= X86_CR4_OSXSAVE; \
	if (!__cpu_has(__c, X86_FEATURE_SMEP)) \
		__reserved_bits |= X86_CR4_SMEP; \
	if (!__cpu_has(__c, X86_FEATURE_SMAP)) \
		__reserved_bits |= X86_CR4_SMAP; \
	if (!__cpu_has(__c, X86_FEATURE_FSGSBASE)) \
		__reserved_bits |= X86_CR4_FSGSBASE; \
	if (!__cpu_has(__c, X86_FEATURE_PKU)) \
		__reserved_bits |= X86_CR4_PKE; \
	if (!__cpu_has(__c, X86_FEATURE_LA57)) \
		__reserved_bits |= X86_CR4_LA57; \
	if (!__cpu_has(__c, X86_FEATURE_UMIP)) \
		__reserved_bits |= X86_CR4_UMIP; \
2020-07-08 07:02:50 -04:00
	if (!__cpu_has(__c, X86_FEATURE_VMX)) \
		__reserved_bits |= X86_CR4_VMXE; \
2021-02-01 15:28:43 +01:00
	if (!__cpu_has(__c, X86_FEATURE_PCID)) \
		__reserved_bits |= X86_CR4_PCIDE; \
KVM: x86: Virtualize LAM for supervisor pointer
Add support to allow guests to set the new CR4 control bit for LAM and add
implementation to get untagged address for supervisor pointers.
LAM modifies the canonicality check applied to 64-bit linear addresses for
data accesses, allowing software to use the untranslated address bits for
metadata and masks the metadata bits before using them as linear addresses
to access memory. LAM uses CR4.LAM_SUP (bit 28) to configure and enable LAM
for supervisor pointers. It also changes VMENTER to allow the bit to be set
in VMCS's HOST_CR4 and GUEST_CR4 to support virtualization. Note CR4.LAM_SUP
is allowed to be set even when not in 64-bit mode, but it will not take effect
since LAM only applies to 64-bit linear addresses.
Move CR4.LAM_SUP out of CR4_RESERVED_BITS, its reservation depends on vcpu
supporting LAM or not. Leave it intercepted to prevent guest from setting
the bit if LAM is not exposed to guest as well as to avoid vmread every time
when KVM fetches its value, with the expectation that guest won't toggle the
bit frequently.
Set CR4.LAM_SUP bit in the emulated IA32_VMX_CR4_FIXED1 MSR for guests to
allow guests to enable LAM for supervisor pointers in nested VMX operation.
Hardware is not required to do TLB flush when CR4.LAM_SUP is toggled, so KVM
doesn't need to emulate TLB flush based on it. There are no other features
or vmx_exec_controls connection, and no other code needed in
{kvm,vmx}_set_cr4().
Skip address untag for instruction fetches (which includes branch targets),
operand of INVLPG instructions, and implicit system accesses, all of which
are not subject to untagging. Note, get_untagged_addr() isn't invoked for
implicit system accesses as there is no reason to do so, but check the
flag anyways for documentation purposes.
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-11-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-13 20:42:21 +08:00
	if (!__cpu_has(__c, X86_FEATURE_LAM)) \
		__reserved_bits |= X86_CR4_LAM_SUP; \
2020-07-08 00:39:55 +00:00
	__reserved_bits; \
} )
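The macro above takes the feature test itself as a parameter, so one body can serve different feature sources (the macro's callers are not shown here, so this reading is an assumption). The same idea can be sketched with a function pointer instead of a macro argument. Everything named below is hypothetical: `enum feature`, `has_feature_fn`, `has_from_mask()` and `cr4_reserved_bits()` are illustrations only, covering just three of the bits, with CR4 bit positions 18, 20 and 21 assumed for OSXSAVE, SMEP and SMAP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CR4 bit positions for the three features shown. */
#define CR4_OSXSAVE (UINT64_C(1) << 18)
#define CR4_SMEP    (UINT64_C(1) << 20)
#define CR4_SMAP    (UINT64_C(1) << 21)

enum feature { FEAT_XSAVE, FEAT_SMEP, FEAT_SMAP };	/* hypothetical ids */

/* Pluggable feature test, standing in for the __cpu_has macro argument. */
typedef bool (*has_feature_fn)(const void *ctx, enum feature f);

/* Start from the always-reserved bits and add one bit per missing feature. */
static uint64_t cr4_reserved_bits(uint64_t base_reserved,
				  has_feature_fn has, const void *ctx)
{
	uint64_t reserved = base_reserved;

	if (!has(ctx, FEAT_XSAVE))
		reserved |= CR4_OSXSAVE;
	if (!has(ctx, FEAT_SMEP))
		reserved |= CR4_SMEP;
	if (!has(ctx, FEAT_SMAP))
		reserved |= CR4_SMAP;
	return reserved;
}

/* Example feature source: ctx points at a bitmask indexed by enum feature. */
static bool has_from_mask(const void *ctx, enum feature f)
{
	return (*(const uint32_t *)ctx >> f) & 1;
}
```

A CR4 value is then acceptable with respect to these features only if it has none of the returned bits set.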
2020-12-10 11:09:53 -06:00
int kvm_sev_es_mmio_write ( struct kvm_vcpu * vcpu , gpa_t src , unsigned int bytes ,
void * dst ) ;
int kvm_sev_es_mmio_read ( struct kvm_vcpu * vcpu , gpa_t src , unsigned int bytes ,
void * dst ) ;
2020-12-10 11:09:54 -06:00
int kvm_sev_es_string_io ( struct kvm_vcpu * vcpu , unsigned int size ,
unsigned int port , void * data , unsigned int count ,
int in ) ;
2020-12-10 11:09:53 -06:00
2008-07-03 14:59:22 +03:00
# endif