// SPDX-License-Identifier: GPL-2.0-only
/*
 * Local APIC virtualization
 *
 * Copyright (C) 2006 Qumranet, Inc.
 * Copyright (C) 2007 Novell
 * Copyright (C) 2007 Intel
 * Copyright 2009 Red Hat, Inc. and/or its affiliates.
 *
 * Authors:
 *   Dor Laor <dor.laor@qumranet.com>
 *   Gregory Haskins <ghaskins@novell.com>
 *   Yaozu (Eddie) Dong <eddie.dong@intel.com>
 *
 * Based on Xen 3.1 code, Copyright (c) 2004, Intel Corporation.
 */
#include <linux/kvm_host.h>
#include <linux/kvm.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/smp.h>
#include <linux/hrtimer.h>
#include <linux/io.h>
#include <linux/export.h>
#include <linux/math64.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
#include <linux/slab.h>
#include <asm/processor.h>
#include <asm/msr.h>
#include <asm/page.h>
#include <asm/current.h>
#include <asm/apicdef.h>
#include <asm/delay.h>
#include <linux/atomic.h>
#include <linux/jump_label.h>
#include "kvm_cache_regs.h"
#include "irq.h"
#include "ioapic.h"
#include "trace.h"
#include "x86.h"
#include "cpuid.h"
#include "hyperv.h"
#ifndef CONFIG_X86_64
#define mod_64(x, y) ((x) - (y) * div64_u64(x, y))
#else
#define mod_64(x, y) ((x) % (y))
#endif

#define PRId64 "d"
#define PRIx64 "llx"
#define PRIu64 "u"
#define PRIo64 "o"

/* 14 is the version for Xeon and Pentium 8.4.8 */
#define APIC_VERSION			(0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
#define LAPIC_MMIO_LENGTH		(1 << 12)
/* the following define is not in apicdef.h */
#define MAX_APIC_VECTOR			256
#define APIC_VECTORS_PER_REG		32

static bool lapic_timer_advance_dynamic __read_mostly;
#define LAPIC_TIMER_ADVANCE_ADJUST_MIN	100	/* clock cycles */
#define LAPIC_TIMER_ADVANCE_ADJUST_MAX	10000	/* clock cycles */
#define LAPIC_TIMER_ADVANCE_NS_INIT	1000
#define LAPIC_TIMER_ADVANCE_NS_MAX	5000
KVM: LAPIC: Tune lapic_timer_advance_ns automatically
In cloud environments, lapic_timer_advance_ns needs to be tuned for every CPU
generation and every host kernel version (the kvm-unit-tests/tscdeadline_latency.flat
result is 5700 cycles for the upstream kernel and 9600 cycles for our 3.10 product kernel,
both with preemption_timer=N on a Skylake server).
This patch adds the capability to tune lapic_timer_advance_ns automatically,
step by step. The initial value is 1000ns, as recommended by commit d0659d946be0
("KVM: x86: add option to advance tscdeadline hrtimer expiration"); it is
reduced when the timer fires too early and increased when it fires too late. The guest_tsc
and tsc_deadline rarely match exactly, so tuning is considered done when the delta
is within a small range, e.g. 100 cycles. This patch reduces latency
(kvm-unit-tests/tscdeadline_latency, busy waits, preemption_timer enabled)
from ~2600 cycles to ~1200 cycles on our Skylake server.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
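The tuning loop described in this commit message can be sketched in userspace as follows. This is a minimal illustration, not the kernel function: the helper name tune_advance_ns(), its parameters, and the clamp at zero are assumptions made for the example. The constants mirror the LAPIC_TIMER_ADVANCE_* defines in this file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Values mirror the LAPIC_TIMER_ADVANCE_* defines in this file. */
#define ADVANCE_ADJUST_MIN	100	/* clock cycles */
#define ADVANCE_ADJUST_MAX	10000	/* clock cycles */
#define ADVANCE_NS_INIT		1000
#define ADVANCE_NS_MAX		5000
#define ADVANCE_ADJUST_STEP	8

/*
 * Hypothetical helper (not a kernel API): return the new advance value in
 * nanoseconds, given the current value, the observed delta in TSC cycles
 * (guest_tsc - tsc_deadline; negative means the timer fired too early) and
 * the TSC frequency in kHz, which is assumed to be non-zero.
 */
static uint32_t tune_advance_ns(uint32_t advance_ns, int64_t delta_cycles,
				uint32_t tsc_khz)
{
	uint64_t mag = (uint64_t)llabs(delta_cycles);
	uint64_t ns;
	uint32_t step_ns;

	/* Ignore tiny fluctuations and large random spikes. */
	if (mag < ADVANCE_ADJUST_MIN || mag > ADVANCE_ADJUST_MAX)
		return advance_ns;

	/* Convert cycles to nanoseconds, then take 1/8 of the error. */
	ns = mag * 1000000ULL / tsc_khz;
	step_ns = (uint32_t)(ns / ADVANCE_ADJUST_STEP);

	if (delta_cycles < 0)	/* too early: advance less (clamped at 0 here) */
		advance_ns = step_ns > advance_ns ? 0 : advance_ns - step_ns;
	else			/* too late: advance more */
		advance_ns += step_ns;

	/* Fall back to the initial value if tuning runs away. */
	if (advance_ns > ADVANCE_NS_MAX)
		advance_ns = ADVANCE_NS_INIT;

	return advance_ns;
}
```

With a 2 GHz TSC (tsc_khz = 2000000), a delta of 4000 cycles is 2000 ns, so each call moves the advance value by 250 ns in the appropriate direction.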
/* step-by-step approximation to mitigate fluctuation */
#define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8

static inline int apic_test_vector(int vec, void *bitmap)
{
	return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
}

bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
{
	struct kvm_lapic *apic = vcpu->arch.apic;

	return apic_test_vector(vector, apic->regs + APIC_ISR) ||
		apic_test_vector(vector, apic->regs + APIC_IRR);
}

static inline int __apic_test_and_set_vector(int vec, void *bitmap)
{
	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
}

static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
{
	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
}
__read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
__read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ);

static inline int apic_enabled(struct kvm_lapic *apic)
{
	return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic);
}

#define LVT_MASK	\
	(APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK)

#define LINT_MASK	\
	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)

static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
{
	return apic->vcpu->vcpu_id;
}

static bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu)
{
	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
}

bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu)
{
	return kvm_x86_ops.set_hv_timer
	       && !(kvm_mwait_in_guest(vcpu->kvm) ||
		    kvm_can_post_timer_interrupt(vcpu));
}
EXPORT_SYMBOL_GPL(kvm_can_use_hv_timer);

static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
{
	return kvm_can_post_timer_interrupt(vcpu) && vcpu->mode == IN_GUEST_MODE;
}
static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
	switch (map->mode) {
	case KVM_APIC_MODE_X2APIC: {
		u32 offset = (dest_id >> 16) * 16;
		u32 max_apic_id = map->max_apic_id;

		if (offset <= max_apic_id) {
			u8 cluster_size = min(max_apic_id - offset + 1, 16U);

			offset = array_index_nospec(offset, map->max_apic_id + 1);
			*cluster = &map->phys_map[offset];
			*mask = dest_id & (0xffff >> (16 - cluster_size));
		} else {
			*mask = 0;
		}

		return true;
		}
	case KVM_APIC_MODE_XAPIC_FLAT:
		*cluster = map->xapic_flat_map;
		*mask = dest_id & 0xff;
		return true;
	case KVM_APIC_MODE_XAPIC_CLUSTER:
		*cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
		*mask = dest_id & 0xf;
		return true;
	default:
		/* Not optimized. */
		return false;
	}
}

static void kvm_apic_map_free(struct rcu_head *rcu)
{
	struct kvm_apic_map *map = container_of(rcu, struct kvm_apic_map, rcu);

	kvfree(map);
}
/*
 * CLEAN -> DIRTY and UPDATE_IN_PROGRESS -> DIRTY changes happen without a lock.
 *
 * DIRTY -> UPDATE_IN_PROGRESS and UPDATE_IN_PROGRESS -> CLEAN happen with
 * apic_map_lock_held.
 */
enum {
	CLEAN,
	UPDATE_IN_PROGRESS,
	DIRTY
};
void kvm_recalculate_apic_map(struct kvm *kvm)
{
	struct kvm_apic_map *new, *old = NULL;
	struct kvm_vcpu *vcpu;
	int i;
	u32 max_id = 255; /* enough space for any xAPIC ID */

	/* Read kvm->arch.apic_map_dirty before kvm->arch.apic_map. */
	if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == CLEAN)
		return;

	mutex_lock(&kvm->arch.apic_map_lock);
	/*
	 * Read kvm->arch.apic_map_dirty before kvm->arch.apic_map
	 * (if clean) or the APIC registers (if dirty).
	 */
	if (atomic_cmpxchg_acquire(&kvm->arch.apic_map_dirty,
				   DIRTY, UPDATE_IN_PROGRESS) == CLEAN) {
		/* Someone else has updated the map. */
		mutex_unlock(&kvm->arch.apic_map_lock);
		return;
	}

	kvm_for_each_vcpu(i, vcpu, kvm)
		if (kvm_apic_present(vcpu))
			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));

	new = kvzalloc(sizeof(struct kvm_apic_map) +
		       sizeof(struct kvm_lapic *) * ((u64)max_id + 1),
		       GFP_KERNEL_ACCOUNT);

	if (!new)
		goto out;

	new->max_apic_id = max_id;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		struct kvm_lapic *apic = vcpu->arch.apic;
		struct kvm_lapic **cluster;
		u16 mask;
		u32 ldr;
		u8 xapic_id;
		u32 x2apic_id;

		if (!kvm_apic_present(vcpu))
			continue;

		xapic_id = kvm_xapic_id(apic);
		x2apic_id = kvm_x2apic_id(apic);

		/* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
				x2apic_id <= new->max_apic_id)
			new->phys_map[x2apic_id] = apic;
		/*
		 * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
		 * prevent them from masking VCPUs with APIC ID <= 0xff.
		 */
		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
			new->phys_map[xapic_id] = apic;

		if (!kvm_apic_sw_enabled(apic))
			continue;

		ldr = kvm_lapic_get_reg(apic, APIC_LDR);

		if (apic_x2apic_mode(apic)) {
			new->mode |= KVM_APIC_MODE_X2APIC;
		} else if (ldr) {
			ldr = GET_APIC_LOGICAL_ID(ldr);
			if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
			else
				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
		}

		if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
			continue;

		if (mask)
			cluster[ffs(mask) - 1] = apic;
	}
out:
	old = rcu_dereference_protected(kvm->arch.apic_map,
			lockdep_is_held(&kvm->arch.apic_map_lock));
	rcu_assign_pointer(kvm->arch.apic_map, new);
	/*
	 * Write kvm->arch.apic_map before clearing apic->apic_map_dirty.
	 * If another update has come in, leave it DIRTY.
	 */
	atomic_cmpxchg_release(&kvm->arch.apic_map_dirty,
			       UPDATE_IN_PROGRESS, CLEAN);
	mutex_unlock(&kvm->arch.apic_map_lock);

	if (old)
		call_rcu(&old->rcu, kvm_apic_map_free);

	kvm_make_scan_ioapic_request(kvm);
}
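The CLEAN / UPDATE_IN_PROGRESS / DIRTY protocol used by kvm_recalculate_apic_map() can be sketched in userspace with C11 atomics. This is a minimal single-threaded illustration under stated assumptions: map_dirty, mark_dirty(), recalc() and rebuild_count are hypothetical names, the mutex is omitted, and the racing_write flag merely simulates a register write that arrives while the map is being rebuilt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { CLEAN, UPDATE_IN_PROGRESS, DIRTY };

/* Hypothetical stand-in for kvm->arch.apic_map_dirty. */
static atomic_int map_dirty = CLEAN;
static int rebuild_count;

/* Writer side: any state -> DIRTY, done without a lock. */
static void mark_dirty(void)
{
	atomic_store_explicit(&map_dirty, DIRTY, memory_order_release);
}

/*
 * Updater side. The real code holds a mutex around everything after the
 * first check; that lock is omitted in this sketch. Returns true if a
 * rebuild was performed.
 */
static bool recalc(bool racing_write)
{
	int expected = DIRTY;

	/* Fast path: nothing changed since the last rebuild. */
	if (atomic_load_explicit(&map_dirty, memory_order_acquire) == CLEAN)
		return false;

	/* DIRTY -> UPDATE_IN_PROGRESS; bail out if someone else finished. */
	if (!atomic_compare_exchange_strong_explicit(&map_dirty, &expected,
			UPDATE_IN_PROGRESS,
			memory_order_acquire, memory_order_acquire) &&
	    expected == CLEAN)
		return false;

	rebuild_count++;		/* stands in for rebuilding the map */
	if (racing_write)
		mark_dirty();		/* simulate a write during the rebuild */

	/*
	 * UPDATE_IN_PROGRESS -> CLEAN, but only if no writer intervened;
	 * otherwise the state stays DIRTY and the next call rebuilds again.
	 */
	expected = UPDATE_IN_PROGRESS;
	atomic_compare_exchange_strong_explicit(&map_dirty, &expected, CLEAN,
			memory_order_release, memory_order_relaxed);
	return true;
}
```

The final compare-and-exchange is what keeps a concurrent update from being lost: a plain store of CLEAN would overwrite a DIRTY written during the rebuild.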
static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
{
	bool enabled = val & APIC_SPIV_APIC_ENABLED;

	kvm_lapic_set_reg(apic, APIC_SPIV, val);

	if (enabled != apic->sw_enabled) {
		apic->sw_enabled = enabled;
		if (enabled)
			static_branch_slow_dec_deferred(&apic_sw_disabled);
		else
			static_branch_inc(&apic_sw_disabled.key);

		atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
	}
}

static inline void kvm_apic_set_xapic_id(struct kvm_lapic *apic, u8 id)
{
	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
}

static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
{
	kvm_lapic_set_reg(apic, APIC_LDR, id);
	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
}

static inline void kvm_apic_set_dfr(struct kvm_lapic *apic, u32 val)
{
	kvm_lapic_set_reg(apic, APIC_DFR, val);
	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
}

static inline u32 kvm_apic_calc_x2apic_ldr(u32 id)
{
	return ((id >> 4) << 16) | (1 << (id & 0xf));
}
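As a worked example of the x2APIC LDR arithmetic in kvm_apic_calc_x2apic_ldr(): the upper 16 bits hold the cluster number (id >> 4) and the lower 16 bits hold a one-hot mask for the position within the cluster (id & 0xf). The sketch below repeats that formula with userspace types; x2apic_ldr() and the inverse helper x2apic_id_from_ldr() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as kvm_apic_calc_x2apic_ldr(), with userspace types. */
static uint32_t x2apic_ldr(uint32_t id)
{
	return ((id >> 4) << 16) | (1u << (id & 0xf));
}

/*
 * Hypothetical inverse: recover the x2APIC ID from an LDR whose low 16 bits
 * have exactly one bit set. Returns UINT32_MAX if the mask is empty.
 */
static uint32_t x2apic_id_from_ldr(uint32_t ldr)
{
	uint32_t cluster = ldr >> 16;
	uint32_t mask = ldr & 0xffff;
	uint32_t bit = 0;

	if (!mask)
		return UINT32_MAX;

	/* Find the index of the lowest set bit in the cluster mask. */
	while (!(mask & 1)) {
		mask >>= 1;
		bit++;
	}

	return (cluster << 4) | bit;
}
```

For instance, ID 0x23 sits in cluster 2 at position 3, so its LDR is (2 << 16) | (1 << 3) = 0x00020008.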
static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
{
	u32 ldr = kvm_apic_calc_x2apic_ldr(id);

	WARN_ON_ONCE(id != apic->vcpu->vcpu_id);

	kvm_lapic_set_reg(apic, APIC_ID, id);
	kvm_lapic_set_reg(apic, APIC_LDR, ldr);
	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
}

static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type)
{
	return !(kvm_lapic_get_reg(apic, lvt_type) & APIC_LVT_MASKED);
}

static inline int apic_lvtt_oneshot(struct kvm_lapic *apic)
{
	return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_ONESHOT;
}

static inline int apic_lvtt_period(struct kvm_lapic *apic)
{
	return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_PERIODIC;
}

static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
{
	return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_TSCDEADLINE;
}

static inline int apic_lvt_nmi_mode(u32 lvt_val)
{
	return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
}

void kvm_apic_set_version(struct kvm_vcpu *vcpu)
{
	struct kvm_lapic *apic = vcpu->arch.apic;
	u32 v = APIC_VERSION;

	if (!lapic_in_kernel(vcpu))
		return;

	/*
	 * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
	 * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
	 * Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
	 * version first and level-triggered interrupts never get EOIed in
	 * IOAPIC.
	 */
	if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) &&
	    !ioapic_in_kernel(vcpu->kvm))
		v |= APIC_LVR_DIRECTED_EOI;
	kvm_lapic_set_reg(apic, APIC_LVR, v);
}
static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
	LVT_MASK,			/* part LVTT mask, timer mode mask added at runtime */
	LVT_MASK | APIC_MODE_MASK,	/* LVTTHMR */
	LVT_MASK | APIC_MODE_MASK,	/* LVTPC */
	LINT_MASK, LINT_MASK,		/* LVT0-1 */
	LVT_MASK			/* LVTERR */
};

static int find_highest_vector(void *bitmap)
{
	int vec;
	u32 *reg;

	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
		reg = bitmap + REG_POS(vec);
		if (*reg)
			return __fls(*reg) + vec;
	}

	return -1;
}

static u8 count_vectors(void *bitmap)
{
	int vec;
	u32 *reg;
	u8 count = 0;

	for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
		reg = bitmap + REG_POS(vec);
		count += hweight32(*reg);
	}

	return count;
}
bool __kvm_apic_update_irr(u32 *pir, void *regs, int *max_irr)
{
	u32 i, vec;
	u32 pir_val, irr_val, prev_irr_val;
	int max_updated_irr;

	max_updated_irr = -1;
	*max_irr = -1;

	for (i = vec = 0; i <= 7; i++, vec += 32) {
		pir_val = READ_ONCE(pir[i]);
		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
		if (pir_val) {
			prev_irr_val = irr_val;
			irr_val |= xchg(&pir[i], 0);
			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
			if (prev_irr_val != irr_val) {
				max_updated_irr =
					__fls(irr_val ^ prev_irr_val) + vec;
			}
		}
		if (irr_val)
			*max_irr = __fls(irr_val) + vec;
	}

	return ((max_updated_irr != -1) &&
		(max_updated_irr == *max_irr));
}
EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);

bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir, int *max_irr)
{
	struct kvm_lapic *apic = vcpu->arch.apic;

	return __kvm_apic_update_irr(pir, apic->regs, max_irr);
}
EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
static inline int apic_search_irr(struct kvm_lapic *apic)
{
	return find_highest_vector(apic->regs + APIC_IRR);
}

static inline int apic_find_highest_irr(struct kvm_lapic *apic)
{
	int result;

	/*
	 * Note that irr_pending is just a hint. It will be always
	 * true with virtual interrupt delivery enabled.
	 */
	if (!apic->irr_pending)
		return -1;

	result = apic_search_irr(apic);
	ASSERT(result == -1 || result >= 16);

	return result;
}

static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
{
KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info
if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
emulated. To do so, KVM will ask the APIC for the interrupt vector
during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv,
kvm_get_apic_interrupt would return -1 and give the following WARNING:
Call Trace:
[<ffffffff81493563>] dump_stack+0x49/0x5e
[<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
[<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
[<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
[<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
[<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
[<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
[<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]
To fix this, we cannot rely on the processor's virtual interrupt delivery,
because "acknowledge interrupt on exit" must only update the virtual
ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR)
but it should not deliver the interrupt through the IDT. Thus, KVM has
to deliver the interrupt "by hand", similar to the treatment of EOI in
commit fc57ac2c9ca8 (KVM: lapic: sync highest ISR to hardware apic on
EOI, 2014-05-14).
The patch modifies kvm_cpu_get_interrupt to always acknowledge an
interrupt; there are only two callers, and the other is not affected
because it is never reached with kvm_apic_vid_enabled() == true. Then it
modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition
to the registers.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Tested-by: Liu, RongrongX <rongrongx.liu@intel.com>
Tested-by: Felipe Reyes <freyes@suse.com>
Fixes: 77b0f5d67ff2781f36831cba79674c3e97bd7acf
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	struct kvm_vcpu *vcpu;

	vcpu = apic->vcpu;

	if (unlikely(vcpu->arch.apicv_active)) {
kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
is blocked", 2015-09-18) the posted interrupt descriptor is checked
unconditionally for PIR.ON. Therefore we don't need KVM_REQ_EVENT to
trigger the scan and, if NMIs or SMIs are not involved, we can avoid
the complicated event injection path.
Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
there since APICv was introduced.
However, without the KVM_REQ_EVENT safety net KVM needs to be much
more careful about races between vmx_deliver_posted_interrupt and
vcpu_enter_guest. First, the IPI for posted interrupts may be issued
between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
If that happens, kvm_trigger_posted_interrupt returns true, but
smp_kvm_posted_intr_ipi doesn't do anything about it. The guest is
entered with PIR.ON, but the posted interrupt IPI has not been sent
and the interrupt is only delivered to the guest on the next vmentry
(if any). To fix this, disable interrupts before setting vcpu->mode.
This ensures that the IPI is delayed until the guest enters non-root mode;
it is then trapped by the processor causing the interrupt to be injected.
Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
and vcpu->mode = IN_GUEST_MODE. In this case, kvm_vcpu_kick is called
but it (correctly) doesn't do anything because it sees vcpu->mode ==
OUTSIDE_GUEST_MODE. Again, the guest is entered with PIR.ON but no
posted interrupt IPI is pending; this time, the fix for this is to move
the RVI update after IN_GUEST_MODE.
Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
though the second could actually happen with VT-d posted interrupts.
In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
in another vmentry which would inject the interrupt.
This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
		/* need to update RVI */
		kvm_lapic_clear_vector(vec, apic->regs + APIC_IRR);
		static_call(kvm_x86_hwapic_irr_update)(vcpu,
				apic_find_highest_irr(apic));
KVM: x86: Fix lost interrupt on irr_pending race
apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is
set. If this assumption is broken and apicv is disabled, the injection of
interrupts may be deferred until another interrupt is delivered to the guest.
Ultimately, if no other interrupt should be injected to that vCPU, the pending
interrupt may be lost.
commit 56cc2406d68c ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv
is in use") changed the behavior of apic_clear_irr so irr_pending is cleared
after setting APIC_IRR vector. After this commit, if apic_set_irr and
apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR
vector set, and irr_pending cleared. In the following example, assume a single
vector is set in IRR prior to calling apic_clear_irr:
apic_set_irr                    apic_clear_irr
------------                    --------------
apic->irr_pending = true;
                                apic_clear_vector(...);
                                vec = apic_search_irr(apic);
                                // => vec == -1
apic_set_vector(...);
                                apic->irr_pending = (vec != -1);
                                // => apic->irr_pending == false
Nonetheless, it appears the race might even occur prior to this commit:
apic_set_irr                    apic_clear_irr
------------                    --------------
apic->irr_pending = true;
                                apic->irr_pending = false;
                                apic_clear_vector(...);
                                if (apic_search_irr(apic) != -1)
                                        apic->irr_pending = true;
                                // => apic->irr_pending == false
apic_set_vector(...);
Fixing this issue by:
1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call
apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending.
2. On apic_set_irr: first call apic_set_vector, then set irr_pending.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	} else {
		apic->irr_pending = false;
		kvm_lapic_clear_vector(vec, apic->regs + APIC_IRR);
2014-11-16 23:49:07 +02:00
if ( apic_search_irr ( apic ) ! = - 1 )
apic - > irr_pending = true ;
KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info
if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
emulated. To do so, KVM will ask the APIC for the interrupt vector
during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv,
kvm_get_apic_interrupt would return -1 and give the following WARNING:
Call Trace:
[<ffffffff81493563>] dump_stack+0x49/0x5e
[<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
[<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
[<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
[<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
[<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
[<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
[<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]
To fix this, we cannot rely on the processor's virtual interrupt delivery,
because "acknowledge interrupt on exit" must only update the virtual
ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR)
but it should not deliver the interrupt through the IDT. Thus, KVM has
to deliver the interrupt "by hand", similar to the treatment of EOI in
commit fc57ac2c9ca8 (KVM: lapic: sync highest ISR to hardware apic on
EOI, 2014-05-14).
The patch modifies kvm_cpu_get_interrupt to always acknowledge an
interrupt; there are only two callers, and the other is not affected
because it is never reached with kvm_apic_vid_enabled() == true. Then it
modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition
to the registers.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Tested-by: Liu, RongrongX <rongrongx.liu@intel.com>
Tested-by: Felipe Reyes <freyes@suse.com>
Fixes: 77b0f5d67ff2781f36831cba79674c3e97bd7acf
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-05 12:42:24 +08:00
}
2009-06-11 11:06:51 +03:00
}
KVM: nVMX: Morph notification vector IRQ on nested VM-Enter to pending PI
On successful nested VM-Enter, check for pending interrupts and convert
the highest priority interrupt to a pending posted interrupt if it
matches L2's notification vector. If the vCPU receives a notification
interrupt before nested VM-Enter (assuming L1 disables IRQs before doing
VM-Enter), the pending interrupt (for L1) should be recognized and
processed as a posted interrupt when interrupts become unblocked after
VM-Enter to L2.
This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is
trying to inject an interrupt into L2 by setting the appropriate bit in
L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's
method of manually moving the vector from PIR->vIRR/RVI). KVM will
observe the IPI while the vCPU is in L1 context and so won't immediately
morph it to a posted interrupt for L2. The pending interrupt will be
seen by vmx_check_nested_events(), cause KVM to force an immediate exit
after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit.
After handling the VM-Exit, L1 will see that L2 has a pending interrupt
in PIR, send another IPI, and repeat until L2 is killed.
Note, posted interrupts require virtual interrupt delivery, and virtual
interrupt delivery requires exit-on-interrupt, ergo interrupts will be
unconditionally unmasked on VM-Enter if posted interrupts are enabled.
Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200812175129.12172-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-12 10:51:29 -07:00
void kvm_apic_clear_irr ( struct kvm_vcpu * vcpu , int vec )
{
apic_clear_irr ( vec , vcpu - > arch . apic ) ;
}
EXPORT_SYMBOL_GPL ( kvm_apic_clear_irr ) ;
2012-06-24 19:24:26 +03:00
static inline void apic_set_isr ( int vec , struct kvm_lapic * apic )
{
2014-08-05 12:42:24 +08:00
struct kvm_vcpu * vcpu ;
if ( __apic_test_and_set_vector ( vec , apic - > regs + APIC_ISR ) )
return ;
vcpu = apic - > vcpu ;
2014-05-14 17:40:58 +02:00
2012-06-24 19:24:26 +03:00
/*
2014-08-05 12:42:24 +08:00
* With APIC virtualization enabled , all caching is disabled
* because the processor can modify ISR under the hood . Instead
* just set SVI .
2012-06-24 19:24:26 +03:00
*/
2015-11-10 15:36:33 +03:00
if ( unlikely ( vcpu - > arch . apicv_active ) )
2021-01-14 22:27:56 -05:00
static_call ( kvm_x86_hwapic_isr_update ) ( vcpu , vec ) ;
2014-08-05 12:42:24 +08:00
else {
+ + apic - > isr_count ;
BUG_ON ( apic - > isr_count > MAX_APIC_VECTOR ) ;
/*
* ISR ( in service register ) bit is set when injecting an interrupt .
* The highest vector is injected . Thus the latest bit set matches
* the highest bit in ISR .
*/
apic - > highest_isr_cache = vec ;
}
2012-06-24 19:24:26 +03:00
}
2014-05-14 17:40:58 +02:00
static inline int apic_find_highest_isr ( struct kvm_lapic * apic )
{
int result ;
/*
* Note that isr_count is always 1 , and highest_isr_cache
* is always - 1 , with APIC virtualization enabled .
*/
if ( ! apic - > isr_count )
return - 1 ;
if ( likely ( apic - > highest_isr_cache ! = - 1 ) )
return apic - > highest_isr_cache ;
result = find_highest_vector ( apic - > regs + APIC_ISR ) ;
ASSERT ( result = = - 1 | | result > = 16 ) ;
return result ;
}
2012-06-24 19:24:26 +03:00
static inline void apic_clear_isr ( int vec , struct kvm_lapic * apic )
{
2014-05-14 17:40:58 +02:00
struct kvm_vcpu * vcpu ;
if ( ! __apic_test_and_clear_vector ( vec , apic - > regs + APIC_ISR ) )
return ;
vcpu = apic - > vcpu ;
/*
* We do get here for APIC virtualization enabled if the guest
* uses the Hyper - V APIC enlightenment . In this case we may need
* to trigger a new interrupt delivery by writing the SVI field ;
* on the other hand isr_count and highest_isr_cache are unused
* and must be left alone .
*/
2015-11-10 15:36:33 +03:00
if ( unlikely ( vcpu - > arch . apicv_active ) )
2021-01-14 22:27:56 -05:00
static_call ( kvm_x86_hwapic_isr_update ) ( vcpu ,
apic_find_highest_isr ( apic ) ) ;
2014-05-14 17:40:58 +02:00
else {
2012-06-24 19:24:26 +03:00
- - apic - > isr_count ;
2014-05-14 17:40:58 +02:00
BUG_ON ( apic - > isr_count < 0 ) ;
apic - > highest_isr_cache = - 1 ;
}
2012-06-24 19:24:26 +03:00
}
2007-09-12 18:03:11 +08:00
int kvm_lapic_find_highest_irr ( struct kvm_vcpu * vcpu )
{
2009-06-11 11:06:51 +03:00
/* This may race with setting of irr in __apic_accept_irq() and
* value returned may be wrong , but kvm_vcpu_kick ( ) in __apic_accept_irq
* will cause vmexit immediately and the value will be recalculated
* on the next vmentry .
*/
2016-01-08 13:42:24 +01:00
return apic_find_highest_irr ( vcpu - > arch . apic ) ;
2007-09-12 18:03:11 +08:00
}
2016-12-19 17:17:11 +01:00
EXPORT_SYMBOL_GPL ( kvm_lapic_find_highest_irr ) ;
2007-09-12 18:03:11 +08:00
2009-03-05 16:34:44 +02:00
static int __apic_accept_irq ( struct kvm_lapic * apic , int delivery_mode ,
2013-04-11 19:21:37 +08:00
int vector , int level , int trig_mode ,
2016-02-29 16:04:43 +01:00
struct dest_map * dest_map ) ;
2009-03-05 16:34:44 +02:00
2013-04-11 19:21:37 +08:00
int kvm_apic_set_irq ( struct kvm_vcpu * vcpu , struct kvm_lapic_irq * irq ,
2016-02-29 16:04:43 +01:00
struct dest_map * dest_map )
2007-09-12 10:58:04 +03:00
{
2007-12-13 23:50:52 +08:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2007-12-02 22:35:57 +08:00
2009-03-05 16:35:04 +02:00
return __apic_accept_irq ( apic , irq - > delivery_mode , irq - > vector ,
2013-04-11 19:21:37 +08:00
irq - > level , irq - > trig_mode , dest_map ) ;
2007-09-12 10:58:04 +03:00
}
2019-11-09 17:46:49 +08:00
static int __pv_send_ipi ( unsigned long * ipi_bitmap , struct kvm_apic_map * map ,
struct kvm_lapic_irq * irq , u32 min )
{
int i , count = 0 ;
struct kvm_vcpu * vcpu ;
if ( min > map - > max_apic_id )
return 0 ;
for_each_set_bit ( i , ipi_bitmap ,
min ( ( u32 ) BITS_PER_LONG , ( map - > max_apic_id - min + 1 ) ) ) {
if ( map - > phys_map [ min + i ] ) {
vcpu = map - > phys_map [ min + i ] - > vcpu ;
count + = kvm_apic_set_irq ( vcpu , irq , NULL ) ;
}
}
return count ;
}
KVM: X86: Implement "send IPI" hypercall
Use a hypercall to send IPIs with a single vmexit, instead of one vmexit
per IPI for xAPIC/x2APIC physical mode and one vmexit per cluster for
x2APIC cluster mode. An Intel guest can enter x2APIC cluster mode when
interrupt remapping is enabled in QEMU; however, the latest AMD EPYC still
supports only xAPIC mode, which can benefit greatly from exit-less IPIs.
This patchset lets a guest send multicast IPIs, with at most 128
destinations per hypercall in 64-bit mode and 64 vCPUs per hypercall in
32-bit mode.
Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM
is 80 vCPUs, IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):
x2apic cluster mode, vanilla
  Dry-run:                  0,      2392199 ns
  Self-IPI:           6907514,     15027589 ns
  Normal IPI:       223910476,    251301666 ns
  Broadcast IPI:            0,   9282161150 ns
  Broadcast lock:           0,   8812934104 ns
x2apic cluster mode, pv-ipi
  Dry-run:                  0,      2449341 ns
  Self-IPI:           6720360,     15028732 ns
  Normal IPI:       228643307,    255708477 ns
  Broadcast IPI:            0,   7572293590 ns  => 22% performance boost
  Broadcast lock:           0,   8316124651 ns
x2apic physical mode, vanilla
  Dry-run:                  0,      3135933 ns
  Self-IPI:           8572670,     17901757 ns
  Normal IPI:       226444334,    255421709 ns
  Broadcast IPI:            0,  19845070887 ns
  Broadcast lock:           0,  19827383656 ns
x2apic physical mode, pv-ipi
  Dry-run:                  0,      2446381 ns
  Self-IPI:           6788217,     15021056 ns
  Normal IPI:       219454441,    249583458 ns
  Broadcast IPI:            0,   7806540019 ns  => 154% performance boost
  Broadcast lock:           0,   9143618799 ns
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-23 14:39:54 +08:00
int kvm_pv_send_ipi ( struct kvm * kvm , unsigned long ipi_bitmap_low ,
2018-08-30 10:03:30 +08:00
unsigned long ipi_bitmap_high , u32 min ,
2018-07-23 14:39:54 +08:00
unsigned long icr , int op_64_bit )
{
struct kvm_apic_map * map ;
struct kvm_lapic_irq irq = { 0 } ;
int cluster_size = op_64_bit ? 64 : 32 ;
2019-11-09 17:46:49 +08:00
int count ;
if ( icr & ( APIC_DEST_MASK | APIC_SHORT_MASK ) )
return - KVM_EINVAL ;
2018-07-23 14:39:54 +08:00
irq . vector = icr & APIC_VECTOR_MASK ;
irq . delivery_mode = icr & APIC_MODE_MASK ;
irq . level = ( icr & APIC_INT_ASSERT ) ! = 0 ;
irq . trig_mode = icr & APIC_INT_LEVELTRIG ;
rcu_read_lock ( ) ;
map = rcu_dereference ( kvm - > arch . apic_map ) ;
2019-11-09 17:46:49 +08:00
count = - EOPNOTSUPP ;
if ( likely ( map ) ) {
count = __pv_send_ipi ( & ipi_bitmap_low , map , & irq , min ) ;
min + = cluster_size ;
count + = __pv_send_ipi ( & ipi_bitmap_high , map , & irq , min ) ;
2018-07-23 14:39:54 +08:00
}
rcu_read_unlock ( ) ;
return count ;
}
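How the two bitmaps map onto APIC IDs may be easier to see in a stand-alone model. The sketch below assumes the 64-bit variant (a window of 64 IDs per bitmap) and replaces the APIC map with a hypothetical `present[]` array indexed by APIC ID; the `model_*` names are illustrative only. It counts one delivery per present target, whereas the function above sums the return values of kvm_apic_set_irq().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the targets selected by one 64-bit bitmap. Bit i addresses APIC ID
 * min + i; bits that would address IDs above max_apic_id are ignored.
 */
static int model_pv_send_ipi_half(uint64_t bitmap, uint32_t min,
				  uint32_t max_apic_id, const uint8_t *present)
{
	uint32_t window, i;
	int count = 0;

	assert(present != NULL);
	if (min > max_apic_id)
		return 0;

	window = max_apic_id - min + 1;
	if (window > 64)
		window = 64;

	for (i = 0; i < window; i++)
		if ((bitmap & (UINT64_C(1) << i)) && present[min + i])
			count++;
	return count;
}

/* The low bitmap covers IDs min..min+63, the high bitmap min+64..min+127. */
static int model_pv_send_ipi(uint64_t low, uint64_t high, uint32_t min,
			     uint32_t max_apic_id, const uint8_t *present)
{
	int count = model_pv_send_ipi_half(low, min, max_apic_id, present);

	count += model_pv_send_ipi_half(high, min + 64, max_apic_id, present);
	return count;
}
```

For example, with 80 possible IDs (0 through 79), a bit in the high bitmap that would address ID 80 falls outside the clamped window and is skipped rather than read out of bounds.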
2012-06-24 19:25:07 +03:00
static int pv_eoi_put_user ( struct kvm_vcpu * vcpu , u8 val )
{
2017-05-02 16:20:18 +02:00
return kvm_write_guest_cached ( vcpu - > kvm , & vcpu - > arch . pv_eoi . data , & val ,
sizeof ( val ) ) ;
2012-06-24 19:25:07 +03:00
}
static int pv_eoi_get_user ( struct kvm_vcpu * vcpu , u8 * val )
{
2017-05-02 16:20:18 +02:00
return kvm_read_guest_cached ( vcpu - > kvm , & vcpu - > arch . pv_eoi . data , val ,
sizeof ( * val ) ) ;
2012-06-24 19:25:07 +03:00
}
static inline bool pv_eoi_enabled ( struct kvm_vcpu * vcpu )
{
return vcpu - > arch . pv_eoi . msr_val & KVM_MSR_ENABLED ;
}
static bool pv_eoi_get_pending ( struct kvm_vcpu * vcpu )
{
u8 val ;
2020-02-21 22:04:46 +08:00
if ( pv_eoi_get_user ( vcpu , & val ) < 0 ) {
2019-07-06 01:08:48 +08:00
printk ( KERN_WARNING " Can't read EOI MSR value: 0x%llx \n " ,
2014-01-02 17:14:11 +08:00
( unsigned long long ) vcpu - > arch . pv_eoi . msr_val ) ;
2020-02-21 22:04:46 +08:00
return false ;
}
2020-12-18 15:51:37 +08:00
return val & KVM_PV_EOI_ENABLED ;
2012-06-24 19:25:07 +03:00
}
static void pv_eoi_set_pending ( struct kvm_vcpu * vcpu )
{
if ( pv_eoi_put_user ( vcpu , KVM_PV_EOI_ENABLED ) < 0 ) {
2019-07-06 01:08:48 +08:00
printk ( KERN_WARNING " Can't set EOI MSR value: 0x%llx \n " ,
2014-01-02 17:14:11 +08:00
( unsigned long long ) vcpu - > arch . pv_eoi . msr_val ) ;
2012-06-24 19:25:07 +03:00
return ;
}
__set_bit ( KVM_APIC_PV_EOI_PENDING , & vcpu - > arch . apic_attention ) ;
}
static void pv_eoi_clr_pending ( struct kvm_vcpu * vcpu )
{
if ( pv_eoi_put_user ( vcpu , KVM_PV_EOI_DISABLED ) < 0 ) {
2019-07-06 01:08:48 +08:00
printk ( KERN_WARNING " Can't clear EOI MSR value: 0x%llx \n " ,
2014-01-02 17:14:11 +08:00
( unsigned long long ) vcpu - > arch . pv_eoi . msr_val ) ;
2012-06-24 19:25:07 +03:00
return ;
}
__clear_bit ( KVM_APIC_PV_EOI_PENDING , & vcpu - > arch . apic_attention ) ;
}
2016-12-18 21:47:54 +01:00
static int apic_has_interrupt_for_ppr ( struct kvm_lapic * apic , u32 ppr )
{
2016-12-19 13:29:03 +01:00
int highest_irr ;
2017-12-24 18:12:53 +02:00
if ( apic - > vcpu - > arch . apicv_active )
2021-01-14 22:27:56 -05:00
highest_irr = static_call ( kvm_x86_sync_pir_to_irr ) ( apic - > vcpu ) ;
2016-12-19 17:17:11 +01:00
else
highest_irr = apic_find_highest_irr ( apic ) ;
2016-12-18 21:47:54 +01:00
if ( highest_irr = = - 1 | | ( highest_irr & 0xF0 ) < = ppr )
return - 1 ;
return highest_irr ;
}
static bool __apic_update_ppr ( struct kvm_lapic * apic , u32 * new_ppr )
2007-09-12 10:58:04 +03:00
{
2010-07-27 12:30:24 +03:00
u32 tpr , isrv , ppr , old_ppr ;
2007-09-12 10:58:04 +03:00
int isr ;
2016-05-04 14:09:41 -05:00
old_ppr = kvm_lapic_get_reg ( apic , APIC_PROCPRI ) ;
tpr = kvm_lapic_get_reg ( apic , APIC_TASKPRI ) ;
2007-09-12 10:58:04 +03:00
isr = apic_find_highest_isr ( apic ) ;
isrv = ( isr ! = - 1 ) ? isr : 0 ;
if ( ( tpr & 0xf0 ) > = ( isrv & 0xf0 ) )
ppr = tpr & 0xff ;
else
ppr = isrv & 0xf0 ;
2016-12-18 21:47:54 +01:00
* new_ppr = ppr ;
if ( old_ppr ! = ppr )
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_PROCPRI , ppr ) ;
2016-12-18 21:47:54 +01:00
return ppr < old_ppr ;
}
static void apic_update_ppr ( struct kvm_lapic * apic )
{
u32 ppr ;
2016-12-18 13:54:58 +01:00
if ( __apic_update_ppr ( apic , & ppr ) & &
apic_has_interrupt_for_ppr ( apic , ppr ) ! = - 1 )
2016-12-18 21:47:54 +01:00
kvm_make_request ( KVM_REQ_EVENT , apic - > vcpu ) ;
2007-09-12 10:58:04 +03:00
}
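The priority arithmetic in __apic_update_ppr() and apic_has_interrupt_for_ppr() can be tried in isolation. The following is a minimal sketch that mirrors only the comparisons shown above; `model_compute_ppr` and `model_deliverable` are hypothetical helper names, and register access, APICv and request handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * PPR is the full TPR when the TPR priority class (bits 7:4) is at least
 * the class of the highest in-service vector; otherwise it is that
 * vector's class with the low nibble cleared. highest_isr is -1 when no
 * vector is in service.
 */
static uint32_t model_compute_ppr(uint32_t tpr, int highest_isr)
{
	uint32_t isrv;

	assert(highest_isr >= -1 && highest_isr < 256);
	isrv = (highest_isr != -1) ? (uint32_t)highest_isr : 0;

	if ((tpr & 0xf0) >= (isrv & 0xf0))
		return tpr & 0xff;
	return isrv & 0xf0;
}

/*
 * A pending vector is deliverable only if its priority class is strictly
 * above PPR. Returns the vector, or -1 if nothing can be delivered.
 */
static int model_deliverable(int highest_irr, uint32_t ppr)
{
	if (highest_irr == -1 || ((uint32_t)highest_irr & 0xf0) <= ppr)
		return -1;
	return highest_irr;
}
```

For instance, with TPR 0x20 and vector 0x35 in service the sketch yields PPR 0x30, so a pending vector 0x41 is deliverable while 0x3f is not.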
2016-12-18 14:02:21 +01:00
void kvm_apic_update_ppr ( struct kvm_vcpu * vcpu )
{
apic_update_ppr ( vcpu - > arch . apic ) ;
}
EXPORT_SYMBOL_GPL ( kvm_apic_update_ppr ) ;
2007-09-12 10:58:04 +03:00
static void apic_set_tpr ( struct kvm_lapic * apic , u32 tpr )
{
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_TASKPRI , tpr ) ;
2007-09-12 10:58:04 +03:00
apic_update_ppr ( apic ) ;
}
2015-02-12 19:41:31 +01:00
static bool kvm_apic_broadcast ( struct kvm_lapic * apic , u32 mda )
2014-10-03 00:30:52 +03:00
{
2016-12-15 18:06:47 +01:00
return mda = = ( apic_x2apic_mode ( apic ) ?
X2APIC_BROADCAST : APIC_BROADCAST ) ;
2014-10-03 00:30:52 +03:00
}
2015-02-12 19:41:31 +01:00
static bool kvm_apic_match_physical_addr ( struct kvm_lapic * apic , u32 mda )
2007-09-12 10:58:04 +03:00
{
2015-02-12 19:41:31 +01:00
if ( kvm_apic_broadcast ( apic , mda ) )
return true ;
if ( apic_x2apic_mode ( apic ) )
2016-12-15 18:06:46 +01:00
return mda = = kvm_x2apic_id ( apic ) ;
2015-02-12 19:41:31 +01:00
2016-12-15 18:06:48 +01:00
/*
* Hotplug hack: Make LAPIC in xAPIC mode also accept interrupts as if
* it were in x2APIC mode. Hotplugged VCPUs start in xAPIC mode and
* this allows unique addressing of VCPUs with APIC ID over 0xff.
* The 0xff condition is needed because the xAPIC ID is writeable.
*/
if ( kvm_x2apic_id ( apic ) > 0xff & & mda = = kvm_x2apic_id ( apic ) )
return true ;
2016-12-15 18:06:47 +01:00
return mda = = kvm_xapic_id ( apic ) ;
2007-09-12 10:58:04 +03:00
}
2015-01-29 22:48:48 +01:00
static bool kvm_apic_match_logical_addr ( struct kvm_lapic * apic , u32 mda )
2007-09-12 10:58:04 +03:00
{
2009-07-05 17:39:36 +03:00
u32 logical_id ;
2014-10-03 00:30:52 +03:00
if ( kvm_apic_broadcast ( apic , mda ) )
2015-01-29 22:48:49 +01:00
return true ;
2014-10-03 00:30:52 +03:00
2016-05-04 14:09:41 -05:00
logical_id = kvm_lapic_get_reg ( apic , APIC_LDR ) ;
2007-09-12 10:58:04 +03:00
2015-01-29 22:48:49 +01:00
if ( apic_x2apic_mode ( apic ) )
2015-01-29 22:48:51 +01:00
return ( ( logical_id > > 16 ) = = ( mda > > 16 ) )
& & ( logical_id & mda & 0xffff ) ! = 0 ;
2007-09-12 10:58:04 +03:00
2015-01-29 22:48:49 +01:00
logical_id = GET_APIC_LOGICAL_ID ( logical_id ) ;
2007-09-12 10:58:04 +03:00
2016-05-04 14:09:41 -05:00
switch ( kvm_lapic_get_reg ( apic , APIC_DFR ) ) {
2007-09-12 10:58:04 +03:00
case APIC_DFR_FLAT :
2015-01-29 22:48:49 +01:00
return ( logical_id & mda ) ! = 0 ;
2007-09-12 10:58:04 +03:00
case APIC_DFR_CLUSTER :
2015-01-29 22:48:49 +01:00
return ( ( logical_id > > 4 ) = = ( mda > > 4 ) )
& & ( logical_id & mda & 0xf ) ! = 0 ;
2007-09-12 10:58:04 +03:00
default :
2015-01-29 22:48:49 +01:00
return false ;
2007-09-12 10:58:04 +03:00
}
}
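The three logical-destination formats handled above differ only in how the ID is split into a cluster part and a member bitmask. A small sketch, assuming the logical ID has already been extracted from LDR and leaving out the broadcast check; `enum model_dfr` and the `model_match_*` helpers are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_dfr { MODEL_DFR_FLAT, MODEL_DFR_CLUSTER };

/*
 * xAPIC logical mode with an 8-bit logical ID.
 * Flat: any shared bit matches.
 * Cluster: the high nibble selects the cluster, the low nibble is a mask.
 */
static bool model_match_xapic_logical(enum model_dfr dfr, uint8_t logical_id,
				      uint8_t mda)
{
	switch (dfr) {
	case MODEL_DFR_FLAT:
		return (logical_id & mda) != 0;
	case MODEL_DFR_CLUSTER:
		return (logical_id >> 4) == (mda >> 4) &&
		       (logical_id & mda & 0xf) != 0;
	default:
		return false;
	}
}

/* x2APIC logical mode: bits 31:16 select the cluster, bits 15:0 are a mask. */
static bool model_match_x2apic_logical(uint32_t ldr, uint32_t mda)
{
	return (ldr >> 16) == (mda >> 16) && (ldr & mda & 0xffff) != 0;
}
```

In every format a destination matches when the cluster part (if any) is equal and at least one member bit is shared, which is why a single message can address several CPUs of one cluster.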
2016-07-12 22:09:28 +02:00
/* The KVM local APIC implementation has two quirks:
*
2016-12-15 18:06:47 +01:00
* - Real hardware delivers interrupts destined to x2APIC ID > 0xff to LAPICs
* in xAPIC mode if the "destination & 0xff" matches its xAPIC ID.
* KVM doesn't do that aliasing.
2016-07-12 22:09:28 +02:00
*
* - in - kernel IOAPIC messages have to be delivered directly to
* x2APIC , because the kernel does not support interrupt remapping .
* In order to support broadcast without interrupt remapping , x2APIC
* rewrites the destination of non - IPI messages from APIC_BROADCAST
* to X2APIC_BROADCAST .
*
* The broadcast quirk can be disabled with KVM_CAP_X2APIC_API . This is
* important when userspace wants to use x2APIC - format MSIs , because
* APIC_BROADCAST ( 0xff ) is a legal route for " cluster 0, CPUs 0-7 " .
2015-02-12 19:41:31 +01:00
*/
2016-07-12 22:09:28 +02:00
static u32 kvm_apic_mda ( struct kvm_vcpu * vcpu , unsigned int dest_id ,
struct kvm_lapic * source , struct kvm_lapic * target )
2015-02-12 19:41:31 +01:00
{
bool ipi = source ! = NULL ;
2016-07-12 22:09:28 +02:00
if ( ! vcpu - > kvm - > arch . x2apic_broadcast_quirk_disabled & &
2016-12-15 18:06:47 +01:00
! ipi & & dest_id = = APIC_BROADCAST & & apic_x2apic_mode ( target ) )
2015-02-12 19:41:31 +01:00
return X2APIC_BROADCAST ;
2016-12-15 18:06:47 +01:00
return dest_id ;
2015-02-12 19:41:31 +01:00
}
2015-01-29 22:48:48 +01:00
bool kvm_apic_match_dest ( struct kvm_vcpu * vcpu , struct kvm_lapic * source ,
2019-12-04 20:07:20 +01:00
int shorthand , unsigned int dest , int dest_mode )
2007-09-12 10:58:04 +03:00
{
2007-12-13 23:50:52 +08:00
struct kvm_lapic * target = vcpu - > arch . apic ;
2016-07-12 22:09:28 +02:00
u32 mda = kvm_apic_mda ( vcpu , dest , source , target ) ;
2007-09-12 10:58:04 +03:00
2010-06-14 11:42:15 -10:00
ASSERT ( target ) ;
2019-12-04 20:07:20 +01:00
switch ( shorthand ) {
2007-09-12 10:58:04 +03:00
case APIC_DEST_NOSHORT :
2015-01-29 22:48:50 +01:00
if ( dest_mode = = APIC_DEST_PHYSICAL )
2015-02-12 19:41:31 +01:00
return kvm_apic_match_physical_addr ( target , mda ) ;
2009-03-05 16:34:54 +02:00
else
2015-02-12 19:41:31 +01:00
return kvm_apic_match_logical_addr ( target , mda ) ;
2007-09-12 10:58:04 +03:00
case APIC_DEST_SELF :
2015-01-29 22:48:49 +01:00
return target = = source ;
2007-09-12 10:58:04 +03:00
case APIC_DEST_ALLINC :
2015-01-29 22:48:49 +01:00
return true ;
2007-09-12 10:58:04 +03:00
case APIC_DEST_ALLBUT :
2015-01-29 22:48:49 +01:00
return target ! = source ;
2007-09-12 10:58:04 +03:00
default :
2015-01-29 22:48:49 +01:00
return false ;
2007-09-12 10:58:04 +03:00
}
}
2016-05-04 14:09:40 -05:00
EXPORT_SYMBOL_GPL ( kvm_apic_match_dest ) ;
2007-09-12 10:58:04 +03:00
2016-01-25 16:53:33 +08:00
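/*
 * Worked example for kvm_vector_to_index() below (illustrative values,
 * not taken from a real guest): with vector = 0x41 (65), dest_vcpus = 3
 * and bits 0, 1 and 3 set in the bitmap, mod = 65 % 3 = 2, so the loop
 * runs three times and returns the index of the third set bit, i.e. 3.
 */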
int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
			const unsigned long *bitmap, u32 bitmap_size)
{
	u32 mod;
	int i, idx = -1;

	mod = vector % dest_vcpus;

	for (i = 0; i <= mod; i++) {
		idx = find_next_bit(bitmap, bitmap_size, idx + 1);
		BUG_ON(idx == bitmap_size);
	}

	return idx;
}
2016-02-12 15:00:15 +01:00
static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
{
	if (!kvm->arch.disabled_lapic_found) {
		kvm->arch.disabled_lapic_found = true;
		printk(KERN_INFO
		       "Disabled LAPIC found during irq injection\n");
	}
}
2016-07-12 22:09:28 +02:00
/*
 * Return true if irq->dest_id is a broadcast destination, taking the
 * x2APIC broadcast quirk setting into account.
 */
static bool kvm_apic_is_broadcast_dest ( struct kvm * kvm , struct kvm_lapic * * src ,
struct kvm_lapic_irq * irq , struct kvm_apic_map * map )
2012-09-13 17:19:24 +03:00
{
2016-07-12 22:09:28 +02:00
if ( kvm - > arch . x2apic_broadcast_quirk_disabled ) {
if ( ( irq - > dest_id = = APIC_BROADCAST & &
map - > mode ! = KVM_APIC_MODE_X2APIC ) )
return true ;
if ( irq - > dest_id = = X2APIC_BROADCAST )
return true ;
} else {
bool x2apic_ipi = src & & * src & & apic_x2apic_mode ( * src ) ;
if ( irq - > dest_id = = ( x2apic_ipi ?
X2APIC_BROADCAST : APIC_BROADCAST ) )
return true ;
}
2012-09-13 17:19:24 +03:00
2016-07-12 22:09:28 +02:00
return false ;
}
2012-09-13 17:19:24 +03:00
2016-07-12 22:09:18 +02:00
/* Return true if the interrupt can be handled by using *bitmap as index mask
* for valid destinations in *dst array.
* Return false if kvm_apic_map_get_dest_lapic did nothing useful.
* Note: we may have zero kvm_lapic destinations when we return true, which
* means that the interrupt should be dropped.  In this case, *bitmap would be
* zero and *dst undefined.
*/
static inline bool kvm_apic_map_get_dest_lapic ( struct kvm * kvm ,
struct kvm_lapic * * src , struct kvm_lapic_irq * irq ,
struct kvm_apic_map * map , struct kvm_lapic * * * dst ,
unsigned long * bitmap )
{
int i , lowest ;
2012-09-13 17:19:24 +03:00
2016-07-12 22:09:18 +02:00
if ( irq - > shorthand = = APIC_DEST_SELF & & src ) {
* dst = src ;
* bitmap = 1 ;
return true ;
} else if ( irq - > shorthand )
2012-09-13 17:19:24 +03:00
return false ;
2016-07-12 22:09:28 +02:00
if ( ! map | | kvm_apic_is_broadcast_dest ( kvm , src , irq , map ) )
2015-02-12 19:41:32 +01:00
return false ;
2016-07-12 22:09:18 +02:00
if ( irq - > dest_mode = = APIC_DEST_PHYSICAL ) {
2016-07-12 22:09:20 +02:00
if ( irq - > dest_id > map - > max_apic_id ) {
2016-07-12 22:09:18 +02:00
* bitmap = 0 ;
} else {
2019-04-11 11:16:47 +02:00
u32 dest_id = array_index_nospec ( irq - > dest_id , map - > max_apic_id + 1 ) ;
* dst = & map - > phys_map [ dest_id ] ;
2016-07-12 22:09:18 +02:00
* bitmap = 1 ;
}
2012-09-13 17:19:24 +03:00
return true ;
2015-04-13 15:40:02 +02:00
}
2014-11-27 20:03:14 +01:00
2016-07-12 22:09:19 +02:00
* bitmap = 0 ;
if ( ! kvm_apic_map_get_logical_dest ( map , irq - > dest_id , dst ,
( u16 * ) bitmap ) )
2012-09-13 17:19:24 +03:00
return false ;
2014-11-27 20:03:12 +01:00
2016-07-12 22:09:18 +02:00
if ( ! kvm_lowest_prio_delivery ( irq ) )
return true ;
2015-02-12 19:41:33 +01:00
2016-07-12 22:09:18 +02:00
if ( ! kvm_vector_hashing_enabled ( ) ) {
lowest = - 1 ;
for_each_set_bit ( i , bitmap , 16 ) {
if ( ! ( * dst ) [ i ] )
continue ;
if ( lowest < 0 )
lowest = i ;
else if ( kvm_apic_compare_prio ( ( * dst ) [ i ] - > vcpu ,
( * dst ) [ lowest ] - > vcpu ) < 0 )
lowest = i ;
2015-02-12 19:41:33 +01:00
}
2016-07-12 22:09:18 +02:00
} else {
if ( ! * bitmap )
return true ;
2015-02-12 19:41:33 +01:00
2016-07-12 22:09:18 +02:00
lowest = kvm_vector_to_index ( irq - > vector , hweight16 ( * bitmap ) ,
bitmap , 16 ) ;
2014-11-27 20:03:13 +01:00
2016-07-12 22:09:18 +02:00
if ( ! ( * dst ) [ lowest ] ) {
kvm_apic_disabled_lapic_found ( kvm ) ;
* bitmap = 0 ;
return true ;
}
}
2012-09-13 17:19:24 +03:00
2016-07-12 22:09:18 +02:00
* bitmap = ( lowest > = 0 ) ? 1 < < lowest : 0 ;
2012-09-13 17:19:24 +03:00
2016-07-12 22:09:18 +02:00
return true ;
}
2016-01-25 16:53:33 +08:00
2016-07-12 22:09:18 +02:00
bool kvm_irq_delivery_to_apic_fast ( struct kvm * kvm , struct kvm_lapic * src ,
struct kvm_lapic_irq * irq , int * r , struct dest_map * dest_map )
{
struct kvm_apic_map * map ;
unsigned long bitmap ;
struct kvm_lapic * * dst = NULL ;
int i ;
bool ret ;
2016-01-25 16:53:33 +08:00
2016-07-12 22:09:18 +02:00
* r = - 1 ;
2016-01-25 16:53:33 +08:00
2016-07-12 22:09:18 +02:00
if ( irq - > shorthand = = APIC_DEST_SELF ) {
* r = kvm_apic_set_irq ( src - > vcpu , irq , dest_map ) ;
return true ;
}
2016-01-25 16:53:33 +08:00
2016-07-12 22:09:18 +02:00
rcu_read_lock ( ) ;
map = rcu_dereference ( kvm - > arch . apic_map ) ;
2016-01-25 16:53:33 +08:00
2016-07-12 22:09:18 +02:00
ret = kvm_apic_map_get_dest_lapic ( kvm , & src , irq , map , & dst , & bitmap ) ;
2018-10-01 16:07:18 +02:00
if ( ret ) {
* r = 0 ;
2016-07-12 22:09:18 +02:00
for_each_set_bit ( i , & bitmap , 16 ) {
if ( ! dst [ i ] )
continue ;
* r + = kvm_apic_set_irq ( dst [ i ] - > vcpu , irq , dest_map ) ;
2012-09-13 17:19:24 +03:00
}
2018-10-01 16:07:18 +02:00
}
2012-09-13 17:19:24 +03:00
rcu_read_unlock ( ) ;
return ret ;
}
2016-01-25 16:53:34 +08:00
/*
2019-12-11 14:26:23 +08:00
* This routine tries to handle interrupts in posted mode. It deals with
* the different cases as follows:
* - For single-destination interrupts, handle it in posted mode.
* - Else if vector hashing is enabled and it is a lowest-priority
*   interrupt, handle it in posted mode and use the following mechanism
*   to find the destination vCPU:
*	1. For lowest-priority interrupts, store all the possible
*	   destination vCPUs in an array.
*	2. Use "guest vector % max number of destination vCPUs" to find
*	   the right destination vCPU in the array for the lowest-priority
*	   interrupt.
* - Otherwise, use remapped mode to inject the interrupt.
*/
2015-09-18 22:29:47 +08:00
bool kvm_intr_is_single_vcpu_fast ( struct kvm * kvm , struct kvm_lapic_irq * irq ,
struct kvm_vcpu * * dest_vcpu )
{
struct kvm_apic_map * map ;
2016-07-12 22:09:18 +02:00
unsigned long bitmap ;
struct kvm_lapic * * dst = NULL ;
2015-09-18 22:29:47 +08:00
bool ret = false ;
if ( irq - > shorthand )
return false ;
rcu_read_lock ( ) ;
map = rcu_dereference ( kvm - > arch . apic_map ) ;
2016-07-12 22:09:18 +02:00
if ( kvm_apic_map_get_dest_lapic ( kvm , NULL , irq , map , & dst , & bitmap ) & &
hweight16 ( bitmap ) = = 1 ) {
unsigned long i = find_first_bit ( & bitmap , 16 ) ;
2016-01-25 16:53:34 +08:00
2016-07-12 22:09:18 +02:00
if ( dst [ i ] ) {
* dest_vcpu = dst [ i ] - > vcpu ;
ret = true ;
2016-01-25 16:53:34 +08:00
}
2015-09-18 22:29:47 +08:00
}
rcu_read_unlock ( ) ;
return ret ;
}
2007-09-12 10:58:04 +03:00
/*
* Add a pending IRQ into the lapic.
* Return 1 if successfully added and 0 if discarded.
*/
static int __apic_accept_irq ( struct kvm_lapic * apic , int delivery_mode ,
2013-04-11 19:21:37 +08:00
int vector , int level , int trig_mode ,
2016-02-29 16:04:43 +01:00
struct dest_map * dest_map )
2007-09-12 10:58:04 +03:00
{
2009-03-05 16:34:44 +02:00
int result = 0 ;
2007-09-03 17:07:41 +03:00
struct kvm_vcpu * vcpu = apic - > vcpu ;
2007-09-12 10:58:04 +03:00
2014-09-11 11:51:02 +02:00
trace_kvm_apic_accept_irq ( vcpu - > vcpu_id , delivery_mode ,
trig_mode , vector ) ;
2007-09-12 10:58:04 +03:00
switch ( delivery_mode ) {
case APIC_DM_LOWEST :
2009-03-05 16:34:59 +02:00
vcpu - > arch . apic_arb_prio + + ;
2020-08-23 17:36:59 -05:00
fallthrough ;
2009-03-05 16:34:59 +02:00
case APIC_DM_FIXED :
2015-07-29 15:03:06 +02:00
if ( unlikely ( trig_mode & & ! level ) )
break ;
2007-09-12 10:58:04 +03:00
/* FIXME add logic for vcpu on reset */
if ( unlikely ( ! apic_enabled ( apic ) ) )
break ;
2013-07-25 09:58:45 +02:00
result = 1 ;
2016-02-29 16:04:44 +01:00
if ( dest_map ) {
2016-02-29 16:04:43 +01:00
__set_bit ( vcpu - > vcpu_id , dest_map - > map ) ;
2016-02-29 16:04:44 +01:00
dest_map - > vectors [ vcpu - > vcpu_id ] = vector ;
}
2009-12-29 12:42:16 +02:00
2015-07-29 15:03:06 +02:00
if ( apic_test_vector ( vector , apic - > regs + APIC_TMR ) ! = ! ! trig_mode ) {
if ( trig_mode )
2019-03-31 19:17:22 -07:00
kvm_lapic_set_vector ( vector ,
apic - > regs + APIC_TMR ) ;
2015-07-29 15:03:06 +02:00
else
2019-03-31 19:17:22 -07:00
kvm_lapic_clear_vector ( vector ,
apic - > regs + APIC_TMR ) ;
2015-07-29 15:03:06 +02:00
}
2021-01-14 22:27:56 -05:00
if ( static_call ( kvm_x86_deliver_posted_interrupt ) ( vcpu , vector ) ) {
2016-05-04 14:09:40 -05:00
kvm_lapic_set_irr ( vector , apic ) ;
2013-04-11 19:25:16 +08:00
kvm_make_request ( KVM_REQ_EVENT , vcpu ) ;
kvm_vcpu_kick ( vcpu ) ;
}
2007-09-12 10:58:04 +03:00
break ;
case APIC_DM_REMRD :
2013-08-26 14:18:35 +05:30
result = 1 ;
vcpu - > arch . pv . pv_unhalted = 1 ;
kvm_make_request ( KVM_REQ_EVENT , vcpu ) ;
kvm_vcpu_kick ( vcpu ) ;
2007-09-12 10:58:04 +03:00
break ;
case APIC_DM_SMI :
2015-05-07 11:36:11 +02:00
result = 1 ;
kvm_make_request ( KVM_REQ_SMI , vcpu ) ;
kvm_vcpu_kick ( vcpu ) ;
2007-09-12 10:58:04 +03:00
break ;
2008-05-15 09:52:48 +08:00
2007-09-12 10:58:04 +03:00
case APIC_DM_NMI :
2009-03-05 16:34:44 +02:00
result = 1 ;
2008-05-15 09:52:48 +08:00
kvm_inject_nmi ( vcpu ) ;
2008-09-26 09:30:54 +02:00
kvm_vcpu_kick ( vcpu ) ;
2007-09-12 10:58:04 +03:00
break ;
case APIC_DM_INIT :
2012-01-16 14:02:20 +01:00
if ( ! trig_mode | | level ) {
2009-03-05 16:34:44 +02:00
result = 1 ;
2013-03-13 12:42:34 +01:00
/* assumes that there are only KVM_APIC_INIT/SIPI */
apic - > pending_events = ( 1UL < < KVM_APIC_INIT ) ;
2010-07-27 12:30:24 +03:00
kvm_make_request ( KVM_REQ_EVENT , vcpu ) ;
2007-09-03 17:07:41 +03:00
kvm_vcpu_kick ( vcpu ) ;
}
2007-09-12 10:58:04 +03:00
break ;
case APIC_DM_STARTUP :
2013-03-13 12:42:34 +01:00
result = 1 ;
apic - > sipi_vector = vector ;
/* make sure sipi_vector is visible for the receiver */
smp_wmb ( ) ;
set_bit ( KVM_APIC_SIPI , & apic - > pending_events ) ;
kvm_make_request ( KVM_REQ_EVENT , vcpu ) ;
kvm_vcpu_kick ( vcpu ) ;
2007-09-12 10:58:04 +03:00
break ;
2008-09-26 09:30:52 +02:00
case APIC_DM_EXTINT :
/*
* Should only be called by kvm_apic_local_deliver() with LVT0,
* before NMI watchdog was enabled. Already handled by
* kvm_apic_accept_pic_intr().
*/
break ;
2007-09-12 10:58:04 +03:00
default :
printk ( KERN_ERR " TODO: unsupported delivery mode %x \n " ,
delivery_mode ) ;
break ;
}
return result ;
}
2019-11-07 07:53:43 -05:00
/*
* This routine builds the mask of destination vCPUs that should receive
* an IOAPIC interrupt. When kvm_apic_map_get_dest_lapic() can resolve the
* destination, it uses the returned array and bitmap; otherwise it walks
* every present vCPU and checks each one with kvm_apic_match_dest().
*/
void kvm_bitmap_or_dest_vcpus ( struct kvm * kvm , struct kvm_lapic_irq * irq ,
unsigned long * vcpu_bitmap )
{
struct kvm_lapic * * dest_vcpu = NULL ;
struct kvm_lapic * src = NULL ;
struct kvm_apic_map * map ;
struct kvm_vcpu * vcpu ;
unsigned long bitmap ;
int i , vcpu_idx ;
bool ret ;
rcu_read_lock ( ) ;
map = rcu_dereference ( kvm - > arch . apic_map ) ;
ret = kvm_apic_map_get_dest_lapic ( kvm , & src , irq , map , & dest_vcpu ,
& bitmap ) ;
if ( ret ) {
for_each_set_bit ( i , & bitmap , 16 ) {
if ( ! dest_vcpu [ i ] )
continue ;
vcpu_idx = dest_vcpu [ i ] - > vcpu - > vcpu_idx ;
__set_bit ( vcpu_idx , vcpu_bitmap ) ;
}
} else {
kvm_for_each_vcpu ( i , vcpu , kvm ) {
if ( ! kvm_apic_present ( vcpu ) )
continue ;
if ( ! kvm_apic_match_dest ( vcpu , NULL ,
2019-12-04 20:07:16 +01:00
irq - > shorthand ,
2019-11-07 07:53:43 -05:00
irq - > dest_id ,
irq - > dest_mode ) )
continue ;
__set_bit ( i , vcpu_bitmap ) ;
}
}
rcu_read_unlock ( ) ;
}
2009-03-05 16:34:59 +02:00
int kvm_apic_compare_prio ( struct kvm_vcpu * vcpu1 , struct kvm_vcpu * vcpu2 )
2007-12-02 22:35:57 +08:00
{
2009-03-05 16:34:59 +02:00
return vcpu1 - > arch . apic_arb_prio - vcpu2 - > arch . apic_arb_prio ;
2007-12-02 22:35:57 +08:00
}
2015-07-29 10:43:18 +02:00
static bool kvm_ioapic_handles_vector ( struct kvm_lapic * apic , int vector )
{
2015-11-10 15:36:32 +03:00
return test_bit ( vector , apic - > vcpu - > arch . ioapic_handled_vectors ) ;
2015-07-29 10:43:18 +02:00
}
2013-01-25 10:18:51 +08:00
static void kvm_ioapic_send_eoi ( struct kvm_lapic * apic , int vector )
{
2015-07-29 23:21:41 -07:00
int trigger_mode ;
/* EOI the IOAPIC only if the IOAPIC handles the vector. */
if ( ! kvm_ioapic_handles_vector ( apic , vector ) )
return ;
2015-07-29 10:43:18 +02:00
2015-07-29 23:21:41 -07:00
/* Request a KVM exit to inform the userspace IOAPIC. */
if ( irqchip_split ( apic - > vcpu - > kvm ) ) {
apic - > vcpu - > arch . pending_ioapic_eoi = vector ;
kvm_make_request ( KVM_REQ_IOAPIC_EOI_EXIT , apic - > vcpu ) ;
return ;
2013-01-25 10:18:51 +08:00
}
2015-07-29 23:21:41 -07:00
if ( apic_test_vector ( vector , apic - > regs + APIC_TMR ) )
trigger_mode = IOAPIC_LEVEL_TRIG ;
else
trigger_mode = IOAPIC_EDGE_TRIG ;
kvm_ioapic_update_eoi ( apic - > vcpu , vector , trigger_mode ) ;
2013-01-25 10:18:51 +08:00
}
2012-06-24 19:25:07 +03:00
static int apic_set_eoi ( struct kvm_lapic * apic )
2007-09-12 10:58:04 +03:00
{
int vector = apic_find_highest_isr ( apic ) ;
2012-06-24 19:25:07 +03:00
trace_kvm_eoi ( apic , vector ) ;
2007-09-12 10:58:04 +03:00
/*
* Not every EOI write has a corresponding ISR bit; one example is
* when the kernel checks the timer in setup_IO_APIC.
*/
if ( vector = = - 1 )
2012-06-24 19:25:07 +03:00
return vector ;
2007-09-12 10:58:04 +03:00
2012-06-24 19:24:26 +03:00
apic_clear_isr ( vector , apic ) ;
2007-09-12 10:58:04 +03:00
apic_update_ppr ( apic ) ;
2021-01-26 14:48:12 +01:00
if ( to_hv_vcpu ( apic - > vcpu ) & &
test_bit ( vector , to_hv_synic ( apic - > vcpu ) - > vec_bitmap ) )
2015-11-10 15:36:34 +03:00
kvm_hv_synic_send_eoi ( apic - > vcpu , vector ) ;
2013-01-25 10:18:51 +08:00
kvm_ioapic_send_eoi ( apic , vector ) ;
2010-07-27 12:30:24 +03:00
kvm_make_request ( KVM_REQ_EVENT , apic - > vcpu ) ;
2012-06-24 19:25:07 +03:00
return vector ;
2007-09-12 10:58:04 +03:00
}
2013-01-25 10:18:51 +08:00
/*
* This interface assumes a trap-like exit, which has already completed the
* desired side effects, including the vISR and vPPR updates.
*/
void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
{
	struct kvm_lapic *apic = vcpu->arch.apic;

	trace_kvm_eoi(apic, vector);

	kvm_ioapic_send_eoi(apic, vector);
	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
}
EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
2020-03-26 10:20:02 +08:00
/* Decode the ICR value into a kvm_lapic_irq and deliver it as an IPI. */
void kvm_apic_send_ipi ( struct kvm_lapic * apic , u32 icr_low , u32 icr_high )
2007-09-12 10:58:04 +03:00
{
2009-03-05 16:35:04 +02:00
struct kvm_lapic_irq irq ;
2007-09-12 10:58:04 +03:00
2009-03-05 16:35:04 +02:00
irq . vector = icr_low & APIC_VECTOR_MASK ;
irq . delivery_mode = icr_low & APIC_MODE_MASK ;
irq . dest_mode = icr_low & APIC_DEST_MASK ;
2015-04-21 14:57:05 +02:00
irq . level = ( icr_low & APIC_INT_ASSERT ) ! = 0 ;
2009-03-05 16:35:04 +02:00
irq . trig_mode = icr_low & APIC_INT_LEVELTRIG ;
irq . shorthand = icr_low & APIC_SHORT_MASK ;
2015-03-18 19:26:03 -06:00
irq . msi_redir_hint = false ;
2009-07-05 17:39:36 +03:00
if ( apic_x2apic_mode ( apic ) )
irq . dest_id = icr_high ;
else
irq . dest_id = GET_APIC_DEST_FIELD ( icr_high ) ;
2007-09-12 10:58:04 +03:00
2009-07-07 16:00:57 +03:00
trace_kvm_apic_ipi ( icr_low , irq . dest_id ) ;
2013-04-11 19:21:37 +08:00
kvm_irq_delivery_to_apic ( apic - > vcpu - > kvm , apic , & irq , NULL ) ;
2007-09-12 10:58:04 +03:00
}
/*
 * Return the timer current count (TMCCT), derived from the time remaining
 * until target_expiration.
 */
static u32 apic_get_tmcct ( struct kvm_lapic * apic )
{
2016-10-24 18:23:13 +08:00
ktime_t remaining , now ;
2009-02-10 20:41:41 -02:00
s64 ns ;
2007-10-21 08:55:50 +02:00
u32 tmcct ;
2007-09-12 10:58:04 +03:00
ASSERT ( apic ! = NULL ) ;
2007-10-21 08:55:50 +02:00
/* if initial count is 0, current count should also be 0 */
2016-05-04 14:09:41 -05:00
if ( kvm_lapic_get_reg ( apic , APIC_TMICT ) = = 0 | |
2013-11-19 14:12:18 -08:00
apic - > lapic_timer . period = = 0 )
2007-10-21 08:55:50 +02:00
return 0 ;
2016-10-25 15:23:49 +02:00
now = ktime_get ( ) ;
2016-10-24 18:23:13 +08:00
remaining = ktime_sub ( apic - > lapic_timer . target_expiration , now ) ;
2009-02-10 20:41:41 -02:00
if ( ktime_to_ns ( remaining ) < 0 )
2016-12-25 12:30:41 +01:00
remaining = 0 ;
2009-02-10 20:41:41 -02:00
2009-02-23 10:57:41 -03:00
ns = mod_64 ( ktime_to_ns ( remaining ) , apic - > lapic_timer . period ) ;
tmcct = div64_u64 ( ns ,
( APIC_BUS_CYCLE_NS * apic - > divide_count ) ) ;
2007-09-12 10:58:04 +03:00
return tmcct ;
}
2007-10-22 16:50:39 +02:00
static void __report_tpr_access ( struct kvm_lapic * apic , bool write )
{
struct kvm_vcpu * vcpu = apic - > vcpu ;
struct kvm_run * run = vcpu - > run ;
2010-05-10 12:34:53 +03:00
kvm_make_request ( KVM_REQ_REPORT_TPR_ACCESS , vcpu ) ;
2008-06-27 14:58:02 -03:00
run - > tpr_access . rip = kvm_rip_read ( vcpu ) ;
2007-10-22 16:50:39 +02:00
run - > tpr_access . is_write = write ;
}
static inline void report_tpr_access ( struct kvm_lapic * apic , bool write )
{
if ( apic - > vcpu - > arch . tpr_access_reporting )
__report_tpr_access ( apic , write ) ;
}
2007-09-12 10:58:04 +03:00
static u32 __apic_read ( struct kvm_lapic * apic , unsigned int offset )
{
u32 val = 0 ;
if ( offset > = LAPIC_MMIO_LENGTH )
return 0 ;
switch ( offset ) {
case APIC_ARBPRI :
break ;
case APIC_TMCCT : /* Timer CCR */
2011-09-22 16:55:52 +08:00
if ( apic_lvtt_tscdeadline ( apic ) )
return 0 ;
2007-09-12 10:58:04 +03:00
val = apic_get_tmcct ( apic ) ;
break ;
2012-07-22 17:41:00 +03:00
case APIC_PROCPRI :
apic_update_ppr ( apic ) ;
2016-05-04 14:09:41 -05:00
val = kvm_lapic_get_reg ( apic , offset ) ;
2012-07-22 17:41:00 +03:00
break ;
2007-10-22 16:50:39 +02:00
case APIC_TASKPRI :
report_tpr_access ( apic , false ) ;
2020-08-23 17:36:59 -05:00
fallthrough ;
2007-09-12 10:58:04 +03:00
default :
2016-05-04 14:09:41 -05:00
val = kvm_lapic_get_reg ( apic , offset ) ;
2007-09-12 10:58:04 +03:00
break ;
}
return val ;
}
2009-06-01 12:54:50 -04:00
static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
{
	return container_of(dev, struct kvm_lapic, dev);
}
2019-07-05 14:57:58 +02:00
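/*
 * Each APIC register occupies a 16-byte slot, so (reg >> 4) is the slot
 * number and APIC_REG_MASK() maps it to one bit of a 64-bit mask.
 * Worked example, assuming the usual apicdef.h values APIC_ID = 0x20,
 * APIC_ISR = 0x100 and APIC_ISR_NR = 8:
 *   APIC_REG_MASK(APIC_ID)                = 1ull << 2
 *   APIC_REGS_MASK(APIC_ISR, APIC_ISR_NR) = (1ull << 16) * 0xff, bits 16-23
 */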
#define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
#define APIC_REGS_MASK(first, count) \
	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
2016-05-04 14:09:40 -05:00
int kvm_lapic_reg_read ( struct kvm_lapic * apic , u32 offset , int len ,
2009-07-05 17:39:36 +03:00
void * data )
2007-09-12 10:58:04 +03:00
{
unsigned char alignment = offset & 0xf ;
u32 result ;
2012-06-28 15:22:57 +08:00
/* this bitmask has a bit cleared for each reserved register */
2019-07-05 14:57:58 +02:00
u64 valid_reg_mask =
APIC_REG_MASK ( APIC_ID ) |
APIC_REG_MASK ( APIC_LVR ) |
APIC_REG_MASK ( APIC_TASKPRI ) |
APIC_REG_MASK ( APIC_PROCPRI ) |
APIC_REG_MASK ( APIC_LDR ) |
APIC_REG_MASK ( APIC_DFR ) |
APIC_REG_MASK ( APIC_SPIV ) |
APIC_REGS_MASK ( APIC_ISR , APIC_ISR_NR ) |
APIC_REGS_MASK ( APIC_TMR , APIC_ISR_NR ) |
APIC_REGS_MASK ( APIC_IRR , APIC_ISR_NR ) |
APIC_REG_MASK ( APIC_ESR ) |
APIC_REG_MASK ( APIC_ICR ) |
APIC_REG_MASK ( APIC_ICR2 ) |
APIC_REG_MASK ( APIC_LVTT ) |
APIC_REG_MASK ( APIC_LVTTHMR ) |
APIC_REG_MASK ( APIC_LVTPC ) |
APIC_REG_MASK ( APIC_LVT0 ) |
APIC_REG_MASK ( APIC_LVT1 ) |
APIC_REG_MASK ( APIC_LVTERR ) |
APIC_REG_MASK ( APIC_TMICT ) |
APIC_REG_MASK ( APIC_TMCCT ) |
APIC_REG_MASK ( APIC_TDCR ) ;
/* ARBPRI is not valid on x2APIC */
if ( ! apic_x2apic_mode ( apic ) )
valid_reg_mask | = APIC_REG_MASK ( APIC_ARBPRI ) ;
2009-07-05 17:39:36 +03:00
2019-07-06 01:08:48 +08:00
if ( offset > 0x3f0 | | ! ( valid_reg_mask & APIC_REG_MASK ( offset ) ) )
2009-07-05 17:39:36 +03:00
return 1 ;
2007-09-12 10:58:04 +03:00
result = __apic_read ( apic , offset & ~ 0xf ) ;
2009-06-17 09:22:14 -03:00
trace_kvm_apic_read ( offset , result ) ;
2007-09-12 10:58:04 +03:00
switch ( len ) {
case 1 :
case 2 :
case 4 :
memcpy ( data , ( char * ) & result + alignment , len ) ;
break ;
default :
printk ( KERN_ERR " Local APIC read with len = %x, "
" should be 1,2, or 4 instead \n " , len ) ;
break ;
}
2009-06-29 22:24:32 +03:00
return 0 ;
2007-09-12 10:58:04 +03:00
}
2016-05-04 14:09:40 -05:00
EXPORT_SYMBOL_GPL ( kvm_lapic_reg_read ) ;
2007-09-12 10:58:04 +03:00
2009-07-05 17:39:36 +03:00
static int apic_mmio_in_range ( struct kvm_lapic * apic , gpa_t addr )
{
2018-08-02 17:08:16 +02:00
return addr > = apic - > base_address & &
addr < apic - > base_address + LAPIC_MMIO_LENGTH ;
2009-07-05 17:39:36 +03:00
}
2015-03-26 14:39:28 +00:00
static int apic_mmio_read ( struct kvm_vcpu * vcpu , struct kvm_io_device * this ,
2009-07-05 17:39:36 +03:00
gpa_t address , int len , void * data )
{
struct kvm_lapic * apic = to_lapic ( this ) ;
u32 offset = address - apic - > base_address ;
if ( ! apic_mmio_in_range ( apic , address ) )
return - EOPNOTSUPP ;
2018-08-02 17:08:16 +02:00
if ( ! kvm_apic_hw_enabled ( apic ) | | apic_x2apic_mode ( apic ) ) {
if ( ! kvm_check_has_quirk ( vcpu - > kvm ,
KVM_X86_QUIRK_LAPIC_MMIO_HOLE ) )
return - EOPNOTSUPP ;
memset ( data , 0xff , len ) ;
return 0 ;
}
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_read ( apic , offset , len , data ) ;
2009-07-05 17:39:36 +03:00
return 0 ;
}
2007-09-12 10:58:04 +03:00
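/*
 * The divide configuration lives in TDCR bits 0, 1 and 3. Worked examples
 * of the decoding in update_divide_count() below (values chosen for
 * illustration):
 *   TDCR = 0x0: tmp2 = 0 + 1 = 1, divide_count = 1 << 1 = 2
 *   TDCR = 0x3: tmp2 = 3 + 1 = 4, divide_count = 1 << 4 = 16
 *   TDCR = 0xb: tmp2 = 7 + 1 = 8, (8 & 0x7) = 0, divide_count = 1
 */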
static void update_divide_count ( struct kvm_lapic * apic )
{
u32 tmp1 , tmp2 , tdcr ;
2016-05-04 14:09:41 -05:00
tdcr = kvm_lapic_get_reg ( apic , APIC_TDCR ) ;
2007-09-12 10:58:04 +03:00
tmp1 = tdcr & 0xf ;
tmp2 = ( ( tmp1 & 0x3 ) | ( ( tmp1 & 0x8 ) > > 1 ) ) + 1 ;
2009-02-23 10:57:41 -03:00
apic - > divide_count = 0x1 < < ( tmp2 & 0x7 ) ;
2007-09-12 10:58:04 +03:00
}
2017-10-05 18:54:24 -07:00
static void limit_periodic_timer_frequency ( struct kvm_lapic * apic )
{
/*
* Do not allow the guest to program periodic timers with a small
* interval, since the hrtimers are not throttled by the host
* scheduler.
*/
2017-10-05 18:54:25 -07:00
if ( apic_lvtt_period ( apic ) & & apic - > lapic_timer . period ) {
2017-10-05 18:54:24 -07:00
s64 min_period = min_timer_period_us * 1000LL ;
if ( apic - > lapic_timer . period < min_period ) {
pr_info_ratelimited (
" kvm: vcpu %i: requested %lld ns "
" lapic timer period limited to %lld ns \n " ,
apic - > vcpu - > vcpu_id ,
apic - > lapic_timer . period , min_period ) ;
apic - > lapic_timer . period = min_period ;
}
}
}
2020-03-24 14:32:10 +08:00
static void cancel_hv_timer ( struct kvm_lapic * apic ) ;
2015-06-05 20:57:41 +02:00
static void apic_update_lvtt ( struct kvm_lapic * apic )
{
2016-05-04 14:09:41 -05:00
u32 timer_mode = kvm_lapic_get_reg ( apic , APIC_LVTT ) &
2015-06-05 20:57:41 +02:00
apic - > lapic_timer . timer_mode_mask ;
if ( apic - > lapic_timer . timer_mode ! = timer_mode ) {
2017-10-05 03:53:51 -07:00
if ( apic_lvtt_tscdeadline ( apic ) ! = ( timer_mode = =
2017-10-05 18:54:25 -07:00
APIC_LVT_TIMER_TSCDEADLINE ) ) {
hrtimer_cancel ( & apic - > lapic_timer . timer ) ;
2020-03-24 14:32:10 +08:00
preempt_disable ( ) ;
if ( apic - > lapic_timer . hv_timer_in_use )
cancel_hv_timer ( apic ) ;
preempt_enable ( ) ;
2017-10-06 19:25:55 +02:00
kvm_lapic_set_reg ( apic , APIC_TMICT , 0 ) ;
apic - > lapic_timer . period = 0 ;
apic - > lapic_timer . tscdeadline = 0 ;
2017-10-05 18:54:25 -07:00
}
2015-06-05 20:57:41 +02:00
apic - > lapic_timer . timer_mode = timer_mode ;
2017-10-05 18:54:25 -07:00
limit_periodic_timer_frequency ( apic ) ;
2015-06-05 20:57:41 +02:00
}
}
2014-12-16 09:08:15 -05:00
/*
* On APICv, this test will cause a busy wait
* during a higher-priority task.
*/
static bool lapic_timer_int_injected ( struct kvm_vcpu * vcpu )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2016-05-04 14:09:41 -05:00
u32 reg = kvm_lapic_get_reg ( apic , APIC_LVTT ) ;
2014-12-16 09:08:15 -05:00
if ( kvm_apic_hw_enabled ( apic ) ) {
int vec = reg & APIC_VECTOR_MASK ;
2015-02-02 15:26:08 -02:00
void * bitmap = apic - > regs + APIC_ISR ;
2014-12-16 09:08:15 -05:00
2015-11-10 15:36:33 +03:00
if ( vcpu - > arch . apicv_active )
2015-02-02 15:26:08 -02:00
bitmap = apic - > regs + APIC_IRR ;
if ( apic_test_vector ( vec , bitmap ) )
return true ;
2014-12-16 09:08:15 -05:00
}
return false ;
}
KVM: lapic: Convert guest TSC to host time domain if necessary
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event. In
the event that the adjustments are too aggressive, i.e. the timer fires
earlier than the guest expects, KVM busy waits immediately prior to
entering the guest.
Currently, KVM manually converts the delay from nanoseconds to clock
cycles. But, the conversion is done in the guest's time domain, while
the delay occurs in the host's time domain. This is perfectly ok when
the guest and host are using the same TSC ratio, but if the guest is
using a different ratio then the delay may not be accurate and could
wait too little or too long.
When the guest is not using the host's ratio, convert the delay from
guest clock cycles to host nanoseconds and use ndelay() instead of
__delay() to provide more accurate timing. Because converting to
nanoseconds is relatively expensive, e.g. requires division and more
multiplication ops, continue using __delay() directly when guest and
host TSCs are running at the same ratio.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-17 10:15:34 -07:00
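/*
 * Worked example for the scaled-TSC branch of __wait_lapic_expire() below
 * (illustrative numbers): with virtual_tsc_khz = 3000000 (a 3 GHz guest
 * TSC) and guest_cycles = 3000, delay_ns = 3000 * 1000000 / 3000000 =
 * 1000 ns, which is then capped at timer_advance_ns before being passed
 * to ndelay().
 */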
static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles)
{
	u64 timer_advance_ns = vcpu->arch.apic->lapic_timer.timer_advance_ns;

	/*
	 * If the guest TSC is running at a different ratio than the host, then
	 * convert the delay to nanoseconds to achieve an accurate delay.  Note
	 * that __delay() uses delay_tsc whenever the hardware has TSC, thus
	 * always for VMX enabled hardware.
	 */
	if (vcpu->arch.tsc_scaling_ratio == kvm_default_tsc_scaling_ratio) {
		__delay(min(guest_cycles,
			nsec_to_cycles(vcpu, timer_advance_ns)));
	} else {
		u64 delay_ns = guest_cycles * 1000000ULL;
		do_div(delay_ns, vcpu->arch.virtual_tsc_khz);
		ndelay(min_t(u32, delay_ns, timer_advance_ns));
	}
}
2019-05-20 16:18:05 +08:00
static inline void adjust_lapic_timer_advance ( struct kvm_vcpu * vcpu ,
2019-05-20 16:18:08 +08:00
s64 advance_expire_delta )
2014-12-16 09:08:15 -05:00
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2019-04-17 10:15:32 -07:00
u32 timer_advance_ns = apic - > lapic_timer . timer_advance_ns ;
2019-05-20 16:18:05 +08:00
u64 ns ;
2019-09-17 16:16:26 +08:00
/* Do not adjust for tiny fluctuations or large random spikes. */
if ( abs ( advance_expire_delta ) > LAPIC_TIMER_ADVANCE_ADJUST_MAX | |
abs ( advance_expire_delta ) < LAPIC_TIMER_ADVANCE_ADJUST_MIN )
return ;
2019-05-20 16:18:05 +08:00
/* too early */
2019-05-20 16:18:08 +08:00
if ( advance_expire_delta < 0 ) {
ns = - advance_expire_delta * 1000000ULL ;
2019-05-20 16:18:05 +08:00
do_div ( ns , vcpu - > arch . virtual_tsc_khz ) ;
2019-09-17 16:16:26 +08:00
timer_advance_ns - = ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP ;
2019-05-20 16:18:05 +08:00
} else {
/* too late */
2019-05-20 16:18:08 +08:00
ns = advance_expire_delta * 1000000ULL ;
2019-05-20 16:18:05 +08:00
do_div ( ns , vcpu - > arch . virtual_tsc_khz ) ;
2019-09-17 16:16:26 +08:00
timer_advance_ns + = ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP ;
2019-05-20 16:18:05 +08:00
}
2019-09-26 08:54:03 +08:00
if ( unlikely ( timer_advance_ns > LAPIC_TIMER_ADVANCE_NS_MAX ) )
timer_advance_ns = LAPIC_TIMER_ADVANCE_NS_INIT ;
2019-05-20 16:18:05 +08:00
apic - > lapic_timer . timer_advance_ns = timer_advance_ns ;
}
2019-07-06 09:26:51 +08:00
static void __kvm_wait_lapic_expire ( struct kvm_vcpu * vcpu )
2019-05-20 16:18:05 +08:00
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
u64 guest_tsc , tsc_deadline ;
2014-12-16 09:08:15 -05:00
tsc_deadline = apic - > lapic_timer . expired_tscdeadline ;
apic - > lapic_timer . expired_tscdeadline = 0 ;
2015-10-20 15:39:07 +08:00
guest_tsc = kvm_read_l1_tsc ( vcpu , rdtsc ( ) ) ;
2019-05-20 16:18:08 +08:00
apic - > lapic_timer . advance_expire_delta = guest_tsc - tsc_deadline ;
2014-12-16 09:08:15 -05:00
if ( guest_tsc < tsc_deadline )
2019-04-17 10:15:34 -07:00
__wait_lapic_expire ( vcpu , tsc_deadline - guest_tsc ) ;
KVM: LAPIC: Tune lapic_timer_advance_ns automatically
In cloud environments, lapic_timer_advance_ns needs to be tuned for every CPU
generation and every host kernel version (the kvm-unit-tests/tscdeadline_latency.flat
result is 5700 cycles for the upstream kernel and 9600 cycles for our 3.10 product
kernel, both with preemption_timer=N on a Skylake server).
This patch adds the capability to tune lapic_timer_advance_ns automatically,
step by step. The initial value is 1000ns, as recommended by commit d0659d946be0
("KVM: x86: add option to advance tscdeadline hrtimer expiration"); it is
reduced when the timer fires too early and increased when it fires too late. The
guest_tsc and tsc_deadline are unlikely to be exactly equal, so we assume we are
done when the delta is within a small range, e.g. 100 cycles. This patch reduces
latency (kvm-unit-tests/tscdeadline_latency, busy waits, preemption_timer enabled)
from ~2600 cycles to ~1200 cycles on our Skylake server.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
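The step-by-step tuning described in the commit message above can be sketched as follows. This is an illustrative user-space sketch, not the kernel's adjust_lapic_timer_advance(): the function name `tune_advance_ns` and the damping factor `ADJUST_STEP` are assumptions made for the example; only the 1000 ns starting value and the 100-cycle "done" threshold come from the message.

```c
#include <assert.h>
#include <stdint.h>

#define ADVANCE_NS_INIT   1000	/* initial value named in the commit message */
#define ADJUST_DONE_CYC    100	/* "small range" threshold from the message */
#define ADJUST_STEP          8	/* assumed damping factor, illustrative only */

/*
 * Hypothetical tuner. delta_cycles is guest_tsc - tsc_deadline observed when
 * the timer interrupt is delivered: negative means the timer fired early,
 * positive means it fired late. Returns the new advance value in ns.
 */
static uint64_t tune_advance_ns(uint64_t advance_ns, int64_t delta_cycles,
				uint64_t guest_tsc_khz)
{
	uint64_t abs_cycles, ns, step;

	assert(guest_tsc_khz != 0);

	/* Magnitude of the error, computed without signed overflow. */
	abs_cycles = delta_cycles < 0 ? (uint64_t)0 - (uint64_t)delta_cycles
				      : (uint64_t)delta_cycles;
	if (abs_cycles < ADJUST_DONE_CYC)
		return advance_ns;		/* close enough, stop tuning */

	/* Convert the error to nanoseconds and apply only a fraction of it. */
	ns = abs_cycles * 1000000ULL / guest_tsc_khz;
	step = ns / ADJUST_STEP;

	if (delta_cycles < 0)			/* too early: advance less */
		return step >= advance_ns ? 0 : advance_ns - step;
	return advance_ns + step;		/* too late: advance more */
}
```

Applying only a fraction of the measured error per step is one way to make the value converge gradually instead of oscillating; the real code may use a different step size and additional bounds.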
2018-10-09 09:02:08 +08:00
2019-09-17 16:16:26 +08:00
if ( lapic_timer_advance_dynamic )
2019-05-20 16:18:08 +08:00
adjust_lapic_timer_advance ( vcpu , apic - > lapic_timer . advance_expire_delta ) ;
2014-10-10 19:15:08 +02:00
}
2019-07-06 09:26:51 +08:00
void kvm_wait_lapic_expire ( struct kvm_vcpu * vcpu )
{
2020-09-10 17:50:41 +08:00
if ( lapic_in_kernel ( vcpu ) & &
vcpu - > arch . apic - > lapic_timer . expired_tscdeadline & &
vcpu - > arch . apic - > lapic_timer . timer_advance_ns & &
lapic_timer_int_injected ( vcpu ) )
2019-07-06 09:26:51 +08:00
__kvm_wait_lapic_expire ( vcpu ) ;
}
2019-05-20 16:18:09 +08:00
EXPORT_SYMBOL_GPL ( kvm_wait_lapic_expire ) ;
2014-10-10 19:15:08 +02:00
2019-07-06 09:26:51 +08:00
static void kvm_apic_inject_pending_timer_irqs ( struct kvm_lapic * apic )
{
struct kvm_timer * ktimer = & apic - > lapic_timer ;
kvm_apic_local_deliver ( apic , APIC_LVTT ) ;
2020-01-16 16:50:21 +08:00
if ( apic_lvtt_tscdeadline ( apic ) ) {
2019-07-06 09:26:51 +08:00
ktimer - > tscdeadline = 0 ;
2020-01-16 16:50:21 +08:00
} else if ( apic_lvtt_oneshot ( apic ) ) {
2019-07-06 09:26:51 +08:00
ktimer - > tscdeadline = 0 ;
ktimer - > target_expiration = 0 ;
}
}
2020-04-28 14:23:28 +08:00
static void apic_timer_expired ( struct kvm_lapic * apic , bool from_timer_fn )
2019-07-06 09:26:51 +08:00
{
struct kvm_vcpu * vcpu = apic - > vcpu ;
struct kvm_timer * ktimer = & apic - > lapic_timer ;
if ( atomic_read ( & apic - > lapic_timer . pending ) )
return ;
if ( apic_lvtt_tscdeadline ( apic ) | | ktimer - > hv_timer_in_use )
ktimer - > expired_tscdeadline = ktimer - > tscdeadline ;
2020-04-28 14:23:28 +08:00
if ( ! from_timer_fn & & vcpu - > arch . apicv_active ) {
WARN_ON ( kvm_get_running_vcpu ( ) ! = vcpu ) ;
kvm_apic_inject_pending_timer_irqs ( apic ) ;
return ;
}
2019-07-06 09:26:51 +08:00
if ( kvm_use_posted_timer_interrupt ( apic - > vcpu ) ) {
2021-03-04 18:18:08 -08:00
/*
* Ensure the guest ' s timer has truly expired before posting an
* interrupt . Open code the relevant checks to avoid querying
* lapic_timer_int_injected ( ) , which will be false since the
* interrupt isn ' t yet injected . Waiting until after injecting
* is not an option since that won ' t help a posted interrupt .
*/
if ( vcpu - > arch . apic - > lapic_timer . expired_tscdeadline & &
vcpu - > arch . apic - > lapic_timer . timer_advance_ns )
__kvm_wait_lapic_expire ( vcpu ) ;
2019-07-06 09:26:51 +08:00
kvm_apic_inject_pending_timer_irqs ( apic ) ;
return ;
}
atomic_inc ( & apic - > lapic_timer . pending ) ;
2020-09-10 17:50:40 +08:00
kvm_make_request ( KVM_REQ_PENDING_TIMER , vcpu ) ;
if ( from_timer_fn )
kvm_vcpu_kick ( vcpu ) ;
2019-07-06 09:26:51 +08:00
}
2016-06-13 14:20:00 -07:00
static void start_sw_tscdeadline ( struct kvm_lapic * apic )
{
2019-04-17 10:15:32 -07:00
struct kvm_timer * ktimer = & apic - > lapic_timer ;
u64 guest_tsc , tscdeadline = ktimer - > tscdeadline ;
2016-06-13 14:20:00 -07:00
u64 ns = 0 ;
ktime_t expire ;
struct kvm_vcpu * vcpu = apic - > vcpu ;
unsigned long this_tsc_khz = vcpu - > arch . virtual_tsc_khz ;
unsigned long flags ;
ktime_t now ;
if ( unlikely ( ! tscdeadline | | ! this_tsc_khz ) )
return ;
local_irq_save ( flags ) ;
2016-10-25 15:23:49 +02:00
now = ktime_get ( ) ;
2016-06-13 14:20:00 -07:00
guest_tsc = kvm_read_l1_tsc ( vcpu , rdtsc ( ) ) ;
2019-04-16 20:36:34 +03:00
ns = ( tscdeadline - guest_tsc ) * 1000000ULL ;
do_div ( ns , this_tsc_khz ) ;
if ( likely ( tscdeadline > guest_tsc ) & &
2019-04-17 10:15:32 -07:00
likely ( ns > apic - > lapic_timer . timer_advance_ns ) ) {
2016-06-13 14:20:00 -07:00
expire = ktime_add_ns ( now , ns ) ;
2019-04-17 10:15:32 -07:00
expire = ktime_sub_ns ( expire , ktimer - > timer_advance_ns ) ;
2019-07-26 20:30:55 +02:00
hrtimer_start ( & ktimer - > timer , expire , HRTIMER_MODE_ABS_HARD ) ;
2016-06-13 14:20:00 -07:00
} else
2020-04-28 14:23:28 +08:00
apic_timer_expired ( apic , false ) ;
2016-06-13 14:20:00 -07:00
local_irq_restore ( flags ) ;
}
2018-10-10 15:56:53 -07:00
static inline u64 tmict_to_ns ( struct kvm_lapic * apic , u32 tmict )
{
return ( u64 ) tmict * APIC_BUS_CYCLE_NS * ( u64 ) apic - > divide_count ;
}
2017-10-06 07:38:32 -07:00
static void update_target_expiration ( struct kvm_lapic * apic , uint32_t old_divisor )
{
ktime_t now , remaining ;
u64 ns_remaining_old , ns_remaining_new ;
2018-10-10 15:56:53 -07:00
apic - > lapic_timer . period =
tmict_to_ns ( apic , kvm_lapic_get_reg ( apic , APIC_TMICT ) ) ;
2017-10-06 07:38:32 -07:00
limit_periodic_timer_frequency ( apic ) ;
now = ktime_get ( ) ;
remaining = ktime_sub ( apic - > lapic_timer . target_expiration , now ) ;
if ( ktime_to_ns ( remaining ) < 0 )
remaining = 0 ;
ns_remaining_old = ktime_to_ns ( remaining ) ;
ns_remaining_new = mul_u64_u32_div ( ns_remaining_old ,
apic - > divide_count , old_divisor ) ;
apic - > lapic_timer . tscdeadline + =
nsec_to_cycles ( apic - > vcpu , ns_remaining_new ) -
nsec_to_cycles ( apic - > vcpu , ns_remaining_old ) ;
apic - > lapic_timer . target_expiration = ktime_add_ns ( now , ns_remaining_new ) ;
}
2018-10-10 15:56:53 -07:00
static bool set_target_expiration ( struct kvm_lapic * apic , u32 count_reg )
2016-10-24 18:23:09 +08:00
{
ktime_t now ;
2016-10-24 18:23:13 +08:00
u64 tscl = rdtsc ( ) ;
2018-10-10 15:56:53 -07:00
s64 deadline ;
2016-10-24 18:23:09 +08:00
2016-10-25 15:23:49 +02:00
now = ktime_get ( ) ;
2018-10-10 15:56:53 -07:00
apic - > lapic_timer . period =
tmict_to_ns ( apic , kvm_lapic_get_reg ( apic , APIC_TMICT ) ) ;
2016-10-24 18:23:09 +08:00
2017-10-06 19:25:54 +02:00
if ( ! apic - > lapic_timer . period ) {
apic - > lapic_timer . tscdeadline = 0 ;
2016-10-24 18:23:13 +08:00
return false ;
2016-10-24 18:23:09 +08:00
}
2017-10-05 18:54:24 -07:00
limit_periodic_timer_frequency ( apic ) ;
2018-10-10 15:56:53 -07:00
deadline = apic - > lapic_timer . period ;
if ( apic_lvtt_period ( apic ) | | apic_lvtt_oneshot ( apic ) ) {
if ( unlikely ( count_reg ! = APIC_TMICT ) ) {
deadline = tmict_to_ns ( apic ,
kvm_lapic_get_reg ( apic , count_reg ) ) ;
if ( unlikely ( deadline < = 0 ) )
deadline = apic - > lapic_timer . period ;
else if ( unlikely ( deadline > apic - > lapic_timer . period ) ) {
pr_info_ratelimited (
" kvm: vcpu %i: requested lapic timer restore with "
" starting count register %#x=%u (%lld ns) > initial count (%lld ns). "
" Using initial count to start timer. \n " ,
apic - > vcpu - > vcpu_id ,
count_reg ,
kvm_lapic_get_reg ( apic , count_reg ) ,
deadline , apic - > lapic_timer . period ) ;
kvm_lapic_set_reg ( apic , count_reg , 0 ) ;
deadline = apic - > lapic_timer . period ;
}
}
}
2016-10-24 18:23:09 +08:00
2016-10-24 18:23:13 +08:00
apic - > lapic_timer . tscdeadline = kvm_read_l1_tsc ( apic - > vcpu , tscl ) +
2018-10-10 15:56:53 -07:00
nsec_to_cycles ( apic - > vcpu , deadline ) ;
apic - > lapic_timer . target_expiration = ktime_add_ns ( now , deadline ) ;
2016-10-24 18:23:13 +08:00
return true ;
}
static void advance_periodic_target_expiration ( struct kvm_lapic * apic )
{
2018-05-18 16:55:46 +01:00
ktime_t now = ktime_get ( ) ;
u64 tscl = rdtsc ( ) ;
ktime_t delta ;
/*
* Synchronize both deadlines to the same time source or
* differences in the periods ( caused by differences in the
* underlying clocks or numerical approximation errors ) will
* cause the two to drift apart over time as the errors
* accumulate .
*/
2016-10-24 18:23:13 +08:00
apic - > lapic_timer . target_expiration =
ktime_add_ns ( apic - > lapic_timer . target_expiration ,
apic - > lapic_timer . period ) ;
2018-05-18 16:55:46 +01:00
delta = ktime_sub ( apic - > lapic_timer . target_expiration , now ) ;
apic - > lapic_timer . tscdeadline = kvm_read_l1_tsc ( apic - > vcpu , tscl ) +
nsec_to_cycles ( apic - > vcpu , delta ) ;
2016-10-24 18:23:09 +08:00
}
2018-04-29 22:05:58 +00:00
static void start_sw_period ( struct kvm_lapic * apic )
{
if ( ! apic - > lapic_timer . period )
return ;
if ( ktime_after ( ktime_get ( ) ,
apic - > lapic_timer . target_expiration ) ) {
2020-04-28 14:23:28 +08:00
apic_timer_expired ( apic , false ) ;
2018-04-29 22:05:58 +00:00
if ( apic_lvtt_oneshot ( apic ) )
return ;
advance_periodic_target_expiration ( apic ) ;
}
hrtimer_start ( & apic - > lapic_timer . timer ,
apic - > lapic_timer . target_expiration ,
2020-03-20 15:06:07 +08:00
HRTIMER_MODE_ABS_HARD ) ;
2018-04-29 22:05:58 +00:00
}
2016-06-13 14:20:01 -07:00
bool kvm_lapic_hv_timer_in_use ( struct kvm_vcpu * vcpu )
{
2016-08-03 12:04:12 +08:00
if ( ! lapic_in_kernel ( vcpu ) )
return false ;
2016-06-13 14:20:01 -07:00
return vcpu - > arch . apic - > lapic_timer . hv_timer_in_use ;
}
EXPORT_SYMBOL_GPL ( kvm_lapic_hv_timer_in_use ) ;
2016-10-24 18:23:12 +08:00
static void cancel_hv_timer ( struct kvm_lapic * apic )
2016-06-30 08:52:49 +08:00
{
2017-07-25 00:43:15 -07:00
WARN_ON ( preemptible ( ) ) ;
2017-06-29 17:14:50 +02:00
WARN_ON ( ! apic - > lapic_timer . hv_timer_in_use ) ;
2021-01-14 22:27:56 -05:00
static_call ( kvm_x86_cancel_hv_timer ) ( apic - > vcpu ) ;
2016-06-30 08:52:49 +08:00
apic - > lapic_timer . hv_timer_in_use = false ;
}
2017-06-29 17:14:50 +02:00
static bool start_hv_timer ( struct kvm_lapic * apic )
2016-06-28 14:54:19 +08:00
{
2017-06-29 17:14:50 +02:00
struct kvm_timer * ktimer = & apic - > lapic_timer ;
2019-04-16 13:32:46 -07:00
struct kvm_vcpu * vcpu = apic - > vcpu ;
bool expired ;
2016-06-28 14:54:19 +08:00
2017-07-25 00:43:15 -07:00
WARN_ON ( preemptible ( ) ) ;
2020-05-05 06:45:35 -04:00
if ( ! kvm_can_use_hv_timer ( vcpu ) )
2017-06-29 17:14:50 +02:00
return false ;
2017-10-06 19:25:53 +02:00
if ( ! ktimer - > tscdeadline )
return false ;
2021-01-14 22:27:56 -05:00
if ( static_call ( kvm_x86_set_hv_timer ) ( vcpu , ktimer - > tscdeadline , & expired ) )
2017-06-29 17:14:50 +02:00
return false ;
ktimer - > hv_timer_in_use = true ;
hrtimer_cancel ( & ktimer - > timer ) ;
2016-06-28 14:54:19 +08:00
2017-06-29 17:14:50 +02:00
/*
2019-04-16 13:32:45 -07:00
* To simplify handling the periodic timer , leave the hv timer running
* even if the deadline timer has expired , i . e . rely on the resulting
* VM - Exit to recompute the periodic timer ' s target expiration .
2017-06-29 17:14:50 +02:00
*/
2019-04-16 13:32:45 -07:00
if ( ! apic_lvtt_period ( apic ) ) {
/*
* Cancel the hv timer if the sw timer fired while the hv timer
* was being programmed , or if the hv timer itself expired .
*/
if ( atomic_read ( & ktimer - > pending ) ) {
cancel_hv_timer ( apic ) ;
2019-04-16 13:32:46 -07:00
} else if ( expired ) {
2020-04-28 14:23:28 +08:00
apic_timer_expired ( apic , false ) ;
2019-04-16 13:32:45 -07:00
cancel_hv_timer ( apic ) ;
}
2017-06-29 06:28:09 -07:00
}
2017-06-29 17:14:50 +02:00
2019-04-16 13:32:46 -07:00
trace_kvm_hv_timer_state ( vcpu - > vcpu_id , ktimer - > hv_timer_in_use ) ;
2019-04-16 13:32:45 -07:00
2017-06-29 17:14:50 +02:00
return true ;
}
2017-06-29 17:14:50 +02:00
static void start_sw_timer ( struct kvm_lapic * apic )
2017-06-29 17:14:50 +02:00
{
2017-06-29 17:14:50 +02:00
struct kvm_timer * ktimer = & apic - > lapic_timer ;
2017-07-25 00:43:15 -07:00
WARN_ON ( preemptible ( ) ) ;
2017-06-29 17:14:50 +02:00
if ( apic - > lapic_timer . hv_timer_in_use )
cancel_hv_timer ( apic ) ;
if ( ! apic_lvtt_period ( apic ) & & atomic_read ( & ktimer - > pending ) )
return ;
if ( apic_lvtt_period ( apic ) | | apic_lvtt_oneshot ( apic ) )
start_sw_period ( apic ) ;
else if ( apic_lvtt_tscdeadline ( apic ) )
start_sw_tscdeadline ( apic ) ;
trace_kvm_hv_timer_state ( apic - > vcpu - > vcpu_id , false ) ;
}
2017-06-29 17:14:50 +02:00
2017-06-29 17:14:50 +02:00
static void restart_apic_timer ( struct kvm_lapic * apic )
{
2017-07-25 00:43:15 -07:00
preempt_disable ( ) ;
2019-04-16 13:32:47 -07:00
if ( ! apic_lvtt_period ( apic ) & & atomic_read ( & apic - > lapic_timer . pending ) )
goto out ;
2017-06-29 17:14:50 +02:00
if ( ! start_hv_timer ( apic ) )
start_sw_timer ( apic ) ;
2019-04-16 13:32:47 -07:00
out :
2017-07-25 00:43:15 -07:00
preempt_enable ( ) ;
2016-06-28 14:54:19 +08:00
}
2016-10-24 18:23:13 +08:00
void kvm_lapic_expired_hv_timer ( struct kvm_vcpu * vcpu )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2017-07-25 00:43:15 -07:00
preempt_disable ( ) ;
/* If the preempt notifier has already run, it also called apic_timer_expired */
if ( ! apic - > lapic_timer . hv_timer_in_use )
goto out ;
2020-04-23 22:48:37 -07:00
WARN_ON ( rcuwait_active ( & vcpu - > wait ) ) ;
2016-10-24 18:23:13 +08:00
cancel_hv_timer ( apic ) ;
2020-04-28 14:23:28 +08:00
apic_timer_expired ( apic , false ) ;
2016-10-24 18:23:13 +08:00
if ( apic_lvtt_period ( apic ) & & apic - > lapic_timer . period ) {
advance_periodic_target_expiration ( apic ) ;
2017-06-29 17:14:50 +02:00
restart_apic_timer ( apic ) ;
2016-10-24 18:23:13 +08:00
}
2017-07-25 00:43:15 -07:00
out :
preempt_enable ( ) ;
2016-10-24 18:23:13 +08:00
}
EXPORT_SYMBOL_GPL ( kvm_lapic_expired_hv_timer ) ;
2016-06-13 14:20:01 -07:00
void kvm_lapic_switch_to_hv_timer ( struct kvm_vcpu * vcpu )
{
2017-06-29 17:14:50 +02:00
restart_apic_timer ( vcpu - > arch . apic ) ;
2016-06-13 14:20:01 -07:00
}
EXPORT_SYMBOL_GPL ( kvm_lapic_switch_to_hv_timer ) ;
void kvm_lapic_switch_to_sw_timer ( struct kvm_vcpu * vcpu )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2017-07-25 00:43:15 -07:00
preempt_disable ( ) ;
2016-06-13 14:20:01 -07:00
/* Possibly the TSC deadline timer is not enabled yet */
2017-06-29 17:14:50 +02:00
if ( apic - > lapic_timer . hv_timer_in_use )
start_sw_timer ( apic ) ;
2017-07-25 00:43:15 -07:00
preempt_enable ( ) ;
2017-06-29 17:14:50 +02:00
}
EXPORT_SYMBOL_GPL ( kvm_lapic_switch_to_sw_timer ) ;
2016-06-13 14:20:01 -07:00
2017-06-29 17:14:50 +02:00
void kvm_lapic_restart_hv_timer ( struct kvm_vcpu * vcpu )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2016-06-13 14:20:01 -07:00
2017-06-29 17:14:50 +02:00
WARN_ON ( ! apic - > lapic_timer . hv_timer_in_use ) ;
restart_apic_timer ( apic ) ;
2016-06-13 14:20:01 -07:00
}
2018-10-10 15:56:53 -07:00
static void __start_apic_timer ( struct kvm_lapic * apic , u32 count_reg )
2007-09-12 10:58:04 +03:00
{
2009-02-23 10:57:41 -03:00
atomic_set ( & apic - > lapic_timer . pending , 0 ) ;
2008-02-24 14:37:50 +02:00
2017-06-29 17:14:50 +02:00
if ( ( apic_lvtt_period ( apic ) | | apic_lvtt_oneshot ( apic ) )
2018-10-10 15:56:53 -07:00
& & ! set_target_expiration ( apic , count_reg ) )
2017-06-29 17:14:50 +02:00
return ;
restart_apic_timer ( apic ) ;
2007-09-12 10:58:04 +03:00
}
2018-10-10 15:56:53 -07:00
static void start_apic_timer ( struct kvm_lapic * apic )
{
__start_apic_timer ( apic , APIC_TMICT ) ;
}
2008-10-20 10:20:03 +02:00
static void apic_manage_nmi_watchdog ( struct kvm_lapic * apic , u32 lvt0_val )
{
2015-06-30 22:19:16 +02:00
bool lvt0_in_nmi_mode = apic_lvt_nmi_mode ( lvt0_val ) ;
2008-10-20 10:20:03 +02:00
2015-06-30 22:19:16 +02:00
if ( apic - > lvt0_in_nmi_mode ! = lvt0_in_nmi_mode ) {
apic - > lvt0_in_nmi_mode = lvt0_in_nmi_mode ;
if ( lvt0_in_nmi_mode ) {
2015-07-01 15:31:49 +02:00
atomic_inc ( & apic - > vcpu - > kvm - > arch . vapics_in_nmi_mode ) ;
2015-06-30 22:19:16 +02:00
} else
atomic_dec ( & apic - > vcpu - > kvm - > arch . vapics_in_nmi_mode ) ;
}
2008-10-20 10:20:03 +02:00
}
2016-05-04 14:09:40 -05:00
int kvm_lapic_reg_write ( struct kvm_lapic * apic , u32 reg , u32 val )
2007-09-12 10:58:04 +03:00
{
2009-07-05 17:39:36 +03:00
int ret = 0 ;
2007-09-12 10:58:04 +03:00
2009-07-05 17:39:36 +03:00
trace_kvm_apic_write ( reg , val ) ;
2007-09-12 10:58:04 +03:00
2009-07-05 17:39:36 +03:00
switch ( reg ) {
2007-09-12 10:58:04 +03:00
case APIC_ID : /* Local APIC ID */
2009-07-05 17:39:36 +03:00
if ( ! apic_x2apic_mode ( apic ) )
2016-07-12 22:09:22 +02:00
kvm_apic_set_xapic_id ( apic , val > > 24 ) ;
2009-07-05 17:39:36 +03:00
else
ret = 1 ;
2007-09-12 10:58:04 +03:00
break ;
case APIC_TASKPRI :
2007-10-22 16:50:39 +02:00
report_tpr_access ( apic , true ) ;
2007-09-12 10:58:04 +03:00
apic_set_tpr ( apic , val & 0xff ) ;
break ;
case APIC_EOI :
apic_set_eoi ( apic ) ;
break ;
case APIC_LDR :
2009-07-05 17:39:36 +03:00
if ( ! apic_x2apic_mode ( apic ) )
2012-09-13 17:19:24 +03:00
kvm_apic_set_ldr ( apic , val & APIC_LDR_MASK ) ;
2009-07-05 17:39:36 +03:00
else
ret = 1 ;
2007-09-12 10:58:04 +03:00
break ;
case APIC_DFR :
2020-08-19 16:55:26 +08:00
if ( ! apic_x2apic_mode ( apic ) )
kvm_apic_set_dfr ( apic , val | 0x0FFFFFFF ) ;
else
2009-07-05 17:39:36 +03:00
ret = 1 ;
2007-09-12 10:58:04 +03:00
break ;
2009-07-05 17:39:35 +03:00
case APIC_SPIV : {
u32 mask = 0x3ff ;
2016-05-04 14:09:41 -05:00
if ( kvm_lapic_get_reg ( apic , APIC_LVR ) & APIC_LVR_DIRECTED_EOI )
2009-07-05 17:39:35 +03:00
mask | = APIC_SPIV_DIRECTED_EOI ;
2012-08-05 15:58:31 +03:00
apic_set_spiv ( apic , val & mask ) ;
2007-09-12 10:58:04 +03:00
if ( ! ( val & APIC_SPIV_APIC_ENABLED ) ) {
int i ;
u32 lvt_val ;
2016-05-04 14:09:40 -05:00
for ( i = 0 ; i < KVM_APIC_LVT_NUM ; i + + ) {
2016-05-04 14:09:41 -05:00
lvt_val = kvm_lapic_get_reg ( apic ,
2007-09-12 10:58:04 +03:00
APIC_LVTT + 0x10 * i ) ;
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_LVTT + 0x10 * i ,
2007-09-12 10:58:04 +03:00
lvt_val | APIC_LVT_MASKED ) ;
}
2015-06-05 20:57:41 +02:00
apic_update_lvtt ( apic ) ;
2009-02-23 10:57:41 -03:00
atomic_set ( & apic - > lapic_timer . pending , 0 ) ;
2007-09-12 10:58:04 +03:00
}
break ;
2009-07-05 17:39:35 +03:00
}
2007-09-12 10:58:04 +03:00
case APIC_ICR :
/* No delay here, so we always clear the pending bit */
2019-09-05 14:26:27 +08:00
val & = ~ ( 1 < < 12 ) ;
2020-03-26 10:20:02 +08:00
kvm_apic_send_ipi ( apic , val , kvm_lapic_get_reg ( apic , APIC_ICR2 ) ) ;
2019-09-05 14:26:27 +08:00
kvm_lapic_set_reg ( apic , APIC_ICR , val ) ;
2007-09-12 10:58:04 +03:00
break ;
case APIC_ICR2 :
2009-07-05 17:39:36 +03:00
if ( ! apic_x2apic_mode ( apic ) )
val & = 0xff000000 ;
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_ICR2 , val ) ;
2007-09-12 10:58:04 +03:00
break ;
2008-09-26 09:30:52 +02:00
case APIC_LVT0 :
2008-10-20 10:20:03 +02:00
apic_manage_nmi_watchdog ( apic , val ) ;
2020-08-23 17:36:59 -05:00
fallthrough ;
2007-09-12 10:58:04 +03:00
case APIC_LVTTHMR :
case APIC_LVTPC :
case APIC_LVT1 :
2019-12-11 12:47:46 -08:00
case APIC_LVTERR : {
2007-09-12 10:58:04 +03:00
/* TODO: Check vector */
2019-12-11 12:47:46 -08:00
size_t size ;
u32 index ;
2012-08-05 15:58:33 +03:00
if ( ! kvm_apic_sw_enabled ( apic ) )
2007-09-12 10:58:04 +03:00
val | = APIC_LVT_MASKED ;
2019-12-11 12:47:46 -08:00
size = ARRAY_SIZE ( apic_lvt_mask ) ;
index = array_index_nospec (
( reg - APIC_LVTT ) > > 4 , size ) ;
val & = apic_lvt_mask [ index ] ;
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , reg , val ) ;
2007-09-12 10:58:04 +03:00
break ;
2019-12-11 12:47:46 -08:00
}
2007-09-12 10:58:04 +03:00
2015-06-05 20:57:41 +02:00
case APIC_LVTT :
2012-08-05 15:58:33 +03:00
if ( ! kvm_apic_sw_enabled ( apic ) )
2011-09-22 16:55:52 +08:00
val | = APIC_LVT_MASKED ;
val & = ( apic_lvt_mask [ 0 ] | apic - > lapic_timer . timer_mode_mask ) ;
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_LVTT , val ) ;
2015-06-05 20:57:41 +02:00
apic_update_lvtt ( apic ) ;
2011-09-22 16:55:52 +08:00
break ;
2007-09-12 10:58:04 +03:00
case APIC_TMICT :
2011-09-22 16:55:52 +08:00
if ( apic_lvtt_tscdeadline ( apic ) )
break ;
2009-02-23 10:57:41 -03:00
hrtimer_cancel ( & apic - > lapic_timer . timer ) ;
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_TMICT , val ) ;
2007-09-12 10:58:04 +03:00
start_apic_timer ( apic ) ;
2009-07-05 17:39:36 +03:00
break ;
2007-09-12 10:58:04 +03:00
2017-10-06 07:38:32 -07:00
case APIC_TDCR : {
uint32_t old_divisor = apic - > divide_count ;
KVM: LAPIC: Set the TDCR settable bits
Intel and AMD differ slightly here: bit 2 is 0 on Intel and reserved on AMD.
On bare metal, Intel refuses to set APIC_TDCR if any bits other than
0, 1 and 3 are set, whereas AMD accepts bits 0, 1 and 3 and ignores
the other bits, which is what this patch does.
Before this patch, any value written to APIC_TDCR could be read back
unchanged; this patch restricts the stored value to the settable bits.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1596165141-28874-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
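The masking described in the commit message above, together with the way the three settable bits select a divisor, can be sketched as follows. This is a minimal sketch with hypothetical helper names (`tdcr_sanitize`, `tdcr_divisor`); the decode follows the divide configuration encoding in the Intel SDM and is shown only to illustrate why bits 0, 1 and 3 are the meaningful ones.

```c
#include <assert.h>
#include <stdint.h>

#define TDCR_SETTABLE_BITS 0xbu	/* bits 0, 1 and 3, matching `val & 0xb` */

/* Hypothetical helper: keep only the bits a guest is allowed to set. */
static uint32_t tdcr_sanitize(uint32_t val)
{
	return val & TDCR_SETTABLE_BITS;
}

/*
 * Hypothetical helper: decode a sanitized TDCR value into the timer divisor.
 * Bits 0, 1 and 3 form a 3-bit code: 0b000 means divide by 2, each increment
 * doubles the divisor, and 0b111 wraps around to divide by 1.
 */
static uint32_t tdcr_divisor(uint32_t tdcr)
{
	uint32_t code;

	assert((tdcr & ~TDCR_SETTABLE_BITS) == 0);
	code = (tdcr & 0x3u) | ((tdcr & 0x8u) >> 1);
	return 1u << ((code + 1u) & 0x7u);
}
```

With this sketch, a guest write of 0xffffffff is stored as 0xb, which decodes to a divisor of 1, while a stored value of 0 decodes to a divisor of 2.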
2020-07-31 11:12:20 +08:00
kvm_lapic_set_reg ( apic , APIC_TDCR , val & 0xb ) ;
2007-09-12 10:58:04 +03:00
update_divide_count ( apic ) ;
2017-10-06 07:38:32 -07:00
if ( apic - > divide_count ! = old_divisor & &
apic - > lapic_timer . period ) {
hrtimer_cancel ( & apic - > lapic_timer . timer ) ;
update_target_expiration ( apic , old_divisor ) ;
restart_apic_timer ( apic ) ;
}
2007-09-12 10:58:04 +03:00
break ;
2017-10-06 07:38:32 -07:00
}
2009-07-05 17:39:36 +03:00
case APIC_ESR :
2019-07-06 01:08:48 +08:00
if ( apic_x2apic_mode ( apic ) & & val ! = 0 )
2009-07-05 17:39:36 +03:00
ret = 1 ;
break ;
case APIC_SELF_IPI :
if ( apic_x2apic_mode ( apic ) ) {
2020-07-21 16:23:54 +08:00
kvm_lapic_reg_write ( apic , APIC_ICR ,
APIC_DEST_SELF | ( val & APIC_VECTOR_MASK ) ) ;
2009-07-05 17:39:36 +03:00
} else
ret = 1 ;
break ;
2007-09-12 10:58:04 +03:00
default :
2009-07-05 17:39:36 +03:00
ret = 1 ;
2007-09-12 10:58:04 +03:00
break ;
}
2019-07-06 01:08:48 +08:00
2020-02-26 10:41:02 +08:00
kvm_recalculate_apic_map ( apic - > vcpu - > kvm ) ;
2009-07-05 17:39:36 +03:00
return ret ;
}
2016-05-04 14:09:40 -05:00
EXPORT_SYMBOL_GPL ( kvm_lapic_reg_write ) ;
2009-07-05 17:39:36 +03:00
2015-03-26 14:39:28 +00:00
static int apic_mmio_write ( struct kvm_vcpu * vcpu , struct kvm_io_device * this ,
2009-07-05 17:39:36 +03:00
gpa_t address , int len , const void * data )
{
struct kvm_lapic * apic = to_lapic ( this ) ;
unsigned int offset = address - apic - > base_address ;
u32 val ;
if ( ! apic_mmio_in_range ( apic , address ) )
return - EOPNOTSUPP ;
2018-08-02 17:08:16 +02:00
if ( ! kvm_apic_hw_enabled ( apic ) | | apic_x2apic_mode ( apic ) ) {
if ( ! kvm_check_has_quirk ( vcpu - > kvm ,
KVM_X86_QUIRK_LAPIC_MMIO_HOLE ) )
return - EOPNOTSUPP ;
return 0 ;
}
2009-07-05 17:39:36 +03:00
/*
* APIC registers must be aligned on a 128 - bit boundary .
* 32 / 64 / 128 - bit registers must be accessed through 32 - bit accesses .
* Refer to SDM 8.4.1
*/
2019-07-06 01:08:48 +08:00
if ( len ! = 4 | | ( offset & 0xf ) )
2009-07-06 11:05:39 +08:00
return 0 ;
2009-07-05 17:39:36 +03:00
val = * ( u32 * ) data ;
2019-07-06 01:08:48 +08:00
kvm_lapic_reg_write ( apic , offset & 0xff0 , val ) ;
2009-07-05 17:39:36 +03:00
2009-06-29 22:24:32 +03:00
return 0 ;
2007-09-12 10:58:04 +03:00
}
2011-08-30 13:56:17 +03:00
void kvm_lapic_set_eoi ( struct kvm_vcpu * vcpu )
{
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_write ( vcpu - > arch . apic , APIC_EOI , 0 ) ;
2011-08-30 13:56:17 +03:00
}
EXPORT_SYMBOL_GPL ( kvm_lapic_set_eoi ) ;
2013-01-25 10:18:49 +08:00
/* emulate APIC access in a trap manner */
void kvm_apic_write_nodecode ( struct kvm_vcpu * vcpu , u32 offset )
{
u32 val = 0 ;
/* hw has done the conditional check and inst decode */
offset & = 0xff0 ;
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_read ( vcpu - > arch . apic , offset , 4 , & val ) ;
2013-01-25 10:18:49 +08:00
/* TODO: optimize to just emulate side effect w/o one more write */
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_write ( vcpu - > arch . apic , offset , val ) ;
2013-01-25 10:18:49 +08:00
}
EXPORT_SYMBOL_GPL ( kvm_apic_write_nodecode ) ;
2007-10-08 10:48:30 +10:00
void kvm_free_lapic ( struct kvm_vcpu * vcpu )
2007-09-12 10:58:04 +03:00
{
2012-08-05 15:58:31 +03:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2007-12-13 23:50:52 +08:00
if ( ! vcpu - > arch . apic )
2007-09-12 10:58:04 +03:00
return ;
2012-08-05 15:58:31 +03:00
hrtimer_cancel ( & apic - > lapic_timer . timer ) ;
2007-09-12 10:58:04 +03:00
2012-08-05 15:58:30 +03:00
if ( ! ( vcpu - > arch . apic_base & MSR_IA32_APICBASE_ENABLE ) )
2021-01-11 23:24:35 +08:00
static_branch_slow_dec_deferred ( & apic_hw_disabled ) ;
2012-08-05 15:58:30 +03:00
2014-10-30 15:06:45 +01:00
if ( ! apic - > sw_enabled )
2021-01-11 23:24:35 +08:00
static_branch_slow_dec_deferred ( & apic_sw_disabled ) ;
2007-09-12 10:58:04 +03:00
2012-08-05 15:58:31 +03:00
if ( apic - > regs )
free_page ( ( unsigned long ) apic - > regs ) ;
kfree ( apic ) ;
2007-09-12 10:58:04 +03:00
}
/*
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
* LAPIC interface
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
*/
2011-09-22 16:55:52 +08:00
u64 kvm_get_lapic_tscdeadline_msr ( struct kvm_vcpu * vcpu )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2020-09-10 17:50:36 +08:00
if ( ! kvm_apic_present ( vcpu ) | | ! apic_lvtt_tscdeadline ( apic ) )
2011-09-22 16:55:52 +08:00
return 0 ;
return apic - > lapic_timer . tscdeadline ;
}
void kvm_set_lapic_tscdeadline_msr ( struct kvm_vcpu * vcpu , u64 data )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2020-09-10 17:50:37 +08:00
if ( ! kvm_apic_present ( vcpu ) | | ! apic_lvtt_tscdeadline ( apic ) )
2011-09-22 16:55:52 +08:00
return ;
hrtimer_cancel ( & apic - > lapic_timer . timer ) ;
apic - > lapic_timer . tscdeadline = data ;
start_apic_timer ( apic ) ;
}
2007-09-12 10:58:04 +03:00
void kvm_lapic_set_tpr ( struct kvm_vcpu * vcpu , unsigned long cr8 )
{
2007-12-13 23:50:52 +08:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2007-09-12 10:58:04 +03:00
2007-10-25 16:52:32 +02:00
apic_set_tpr ( apic , ( ( cr8 & 0x0f ) < < 4 )
2016-05-04 14:09:41 -05:00
| ( kvm_lapic_get_reg ( apic , APIC_TASKPRI ) & 4 ) ) ;
2007-09-12 10:58:04 +03:00
}
u64 kvm_lapic_get_cr8 ( struct kvm_vcpu * vcpu )
{
u64 tpr ;
2016-05-04 14:09:41 -05:00
tpr = ( u64 ) kvm_lapic_get_reg ( vcpu - > arch . apic , APIC_TASKPRI ) ;
2007-09-12 10:58:04 +03:00
return ( tpr & 0xf0 ) > > 4 ;
}
void kvm_lapic_set_base ( struct kvm_vcpu * vcpu , u64 value )
{
2013-01-25 10:18:50 +08:00
u64 old_value = vcpu - > arch . apic_base ;
2007-12-13 23:50:52 +08:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2007-09-12 10:58:04 +03:00
2016-11-09 09:50:11 -08:00
if ( ! apic )
2007-09-12 10:58:04 +03:00
value | = MSR_IA32_APICBASE_BSP ;
2009-06-09 15:56:26 +03:00
2013-12-29 02:29:30 +01:00
vcpu - > arch . apic_base = value ;
2016-11-09 09:50:11 -08:00
if ( ( old_value ^ value ) & MSR_IA32_APICBASE_ENABLE )
2020-07-09 12:34:23 +08:00
kvm_update_cpuid_runtime ( vcpu ) ;
2016-11-09 09:50:11 -08:00
if ( ! apic )
return ;
2012-08-05 15:58:30 +03:00
/* update jump label if enable bit changes */
2014-01-15 13:39:59 +01:00
if ( ( old_value ^ value ) & MSR_IA32_APICBASE_ENABLE ) {
2016-07-12 22:09:23 +02:00
if ( value & MSR_IA32_APICBASE_ENABLE ) {
kvm_apic_set_xapic_id ( apic , vcpu - > vcpu_id ) ;
2021-01-11 23:24:35 +08:00
static_branch_slow_dec_deferred ( & apic_hw_disabled ) ;
2016-08-03 12:04:13 +08:00
} else {
2021-01-11 23:24:35 +08:00
static_branch_inc ( & apic_hw_disabled . key ) ;
2020-06-22 16:37:42 +02:00
atomic_set_release ( & apic - > vcpu - > kvm - > arch . apic_map_dirty , DIRTY ) ;
2016-08-03 12:04:13 +08:00
}
2012-08-05 15:58:30 +03:00
}
2018-05-09 16:56:05 -04:00
if ( ( ( old_value ^ value ) & X2APIC_ENABLE ) & & ( value & X2APIC_ENABLE ) )
kvm_apic_set_x2apic_id ( apic , vcpu - > vcpu_id ) ;
if ( ( old_value ^ value ) & ( MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE ) )
2021-01-14 22:27:56 -05:00
static_call ( kvm_x86_set_virtual_apic_mode ) ( vcpu ) ;
2013-01-25 10:18:50 +08:00
2007-12-13 23:50:52 +08:00
apic - > base_address = apic - > vcpu - > arch . apic_base &
2007-09-12 10:58:04 +03:00
MSR_IA32_APICBASE_BASE ;
2014-11-02 11:54:59 +02:00
if ( ( value & MSR_IA32_APICBASE_ENABLE ) & &
apic - > base_address ! = APIC_DEFAULT_PHYS_BASE )
pr_warn_once ( " APIC base relocation is unsupported by KVM " ) ;
2007-09-12 10:58:04 +03:00
}
2019-11-14 14:15:04 -06:00
void kvm_apic_update_apicv ( struct kvm_vcpu * vcpu )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
if ( vcpu - > arch . apicv_active ) {
/* irr_pending is always true when apicv is activated. */
apic - > irr_pending = true ;
apic - > isr_count = 1 ;
} else {
apic - > irr_pending = ( apic_search_irr ( apic ) ! = - 1 ) ;
apic - > isr_count = count_vectors ( apic - > regs + APIC_ISR ) ;
}
}
EXPORT_SYMBOL_GPL ( kvm_apic_update_apicv ) ;
KVM: x86: INIT and reset sequences are different
x86 architecture defines differences between the reset and INIT sequences.
INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs, machine-check, APIC ID, APIC arbitration ID and BSP.
References (from Intel SDM):
"If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
to a specific processor or system wide) do not cause the MP protocol to be
repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions]
[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]
"If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
changed." [9.2: X87 FPU INITIALIZATION]
"The state of the local APIC following an INIT reset is the same as it is after
a power-up or hardware reset, except that the APIC ID and arbitration ID
registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset
("Wait-for-SIPI" State)]
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-13 14:34:08 +03:00
void kvm_lapic_reset ( struct kvm_vcpu * vcpu , bool init_event )
2007-09-12 10:58:04 +03:00
{
2018-03-01 15:24:25 +01:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2007-09-12 10:58:04 +03:00
int i ;
2018-03-01 15:24:25 +01:00
if ( ! apic )
return ;
2007-09-12 10:58:04 +03:00
/* Stop the timer in case it's a reset to an active apic */
2009-02-23 10:57:41 -03:00
hrtimer_cancel ( & apic - > lapic_timer . timer ) ;
2007-09-12 10:58:04 +03:00
2016-07-12 22:09:25 +02:00
if ( ! init_event ) {
kvm_lapic_set_base ( vcpu , APIC_DEFAULT_PHYS_BASE |
MSR_IA32_APICBASE_ENABLE ) ;
2016-07-12 22:09:22 +02:00
kvm_apic_set_xapic_id ( apic , vcpu - > vcpu_id ) ;
2016-07-12 22:09:25 +02:00
}
2009-07-05 17:39:35 +03:00
kvm_apic_set_version ( apic - > vcpu ) ;
2007-09-12 10:58:04 +03:00
2016-05-04 14:09:40 -05:00
for ( i = 0 ; i < KVM_APIC_LVT_NUM ; i + + )
kvm_lapic_set_reg ( apic , APIC_LVTT + 0x10 * i , APIC_LVT_MASKED ) ;
2015-06-05 20:57:41 +02:00
apic_update_lvtt ( apic ) ;
2017-05-20 13:24:32 +02:00
if ( kvm_vcpu_is_reset_bsp ( vcpu ) & &
kvm_check_has_quirk ( vcpu - > kvm , KVM_X86_QUIRK_LINT0_REENABLED ) )
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_LVT0 ,
2015-04-13 01:53:41 +03:00
SET_APIC_DELIVERY_MODE ( 0 , APIC_MODE_EXTINT ) ) ;
2016-05-04 14:09:41 -05:00
apic_manage_nmi_watchdog ( apic , kvm_lapic_get_reg ( apic , APIC_LVT0 ) ) ;
2007-09-12 10:58:04 +03:00
2020-08-19 16:55:26 +08:00
kvm_apic_set_dfr ( apic , 0xffffffffU ) ;
2012-08-05 15:58:31 +03:00
apic_set_spiv ( apic , 0xff ) ;
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_TASKPRI , 0 ) ;
2015-05-22 19:22:10 +02:00
if ( ! apic_x2apic_mode ( apic ) )
kvm_apic_set_ldr ( apic , 0 ) ;
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_ESR , 0 ) ;
kvm_lapic_set_reg ( apic , APIC_ICR , 0 ) ;
kvm_lapic_set_reg ( apic , APIC_ICR2 , 0 ) ;
kvm_lapic_set_reg ( apic , APIC_TDCR , 0 ) ;
kvm_lapic_set_reg ( apic , APIC_TMICT , 0 ) ;
2007-09-12 10:58:04 +03:00
for ( i = 0 ; i < 8 ; i + + ) {
2016-05-04 14:09:40 -05:00
kvm_lapic_set_reg ( apic , APIC_IRR + 0x10 * i , 0 ) ;
kvm_lapic_set_reg ( apic , APIC_ISR + 0x10 * i , 0 ) ;
kvm_lapic_set_reg ( apic , APIC_TMR + 0x10 * i , 0 ) ;
2007-09-12 10:58:04 +03:00
}
2019-11-14 14:15:04 -06:00
kvm_apic_update_apicv ( vcpu ) ;
2012-06-24 19:24:26 +03:00
apic - > highest_isr_cache = - 1 ;
2007-10-21 08:54:53 +02:00
update_divide_count ( apic ) ;
2009-02-23 10:57:41 -03:00
atomic_set ( & apic - > lapic_timer . pending , 0 ) ;
2009-06-09 15:56:26 +03:00
if ( kvm_vcpu_is_bsp ( vcpu ) )
2012-08-05 15:58:27 +03:00
kvm_lapic_set_base ( vcpu ,
vcpu - > arch . apic_base | MSR_IA32_APICBASE_BSP ) ;
2012-06-24 19:25:07 +03:00
vcpu - > arch . pv_eoi . msr_val = 0 ;
2007-09-12 10:58:04 +03:00
apic_update_ppr ( apic ) ;
2017-10-25 16:43:27 +02:00
if ( vcpu - > arch . apicv_active ) {
2021-01-14 22:27:56 -05:00
static_call ( kvm_x86_apicv_post_state_restore ) ( vcpu ) ;
static_call ( kvm_x86_hwapic_irr_update ) ( vcpu , - 1 ) ;
static_call ( kvm_x86_hwapic_isr_update ) ( vcpu , - 1 ) ;
2017-10-25 16:43:27 +02:00
}
2007-09-12 10:58:04 +03:00
2009-03-05 16:34:59 +02:00
vcpu - > arch . apic_arb_prio = 0 ;
2012-04-19 14:06:29 +03:00
vcpu - > arch . apic_attention = 0 ;
2020-02-26 10:41:02 +08:00
kvm_recalculate_apic_map ( vcpu - > kvm ) ;
2007-09-12 10:58:04 +03:00
}
/*
*----------------------------------------------------------------------
* timer interface
*----------------------------------------------------------------------
*/
2007-09-03 16:56:58 +03:00
2012-07-26 18:01:51 +03:00
static bool lapic_is_periodic ( struct kvm_lapic * apic )
2007-09-12 10:58:04 +03:00
{
2009-02-23 10:57:41 -03:00
return apic_lvtt_period ( apic ) ;
2007-09-12 10:58:04 +03:00
}
2008-04-11 14:53:26 -03:00
int apic_has_pending_timer ( struct kvm_vcpu * vcpu )
{
2012-08-05 15:58:32 +03:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2008-04-11 14:53:26 -03:00
2016-01-08 13:41:16 +01:00
if ( apic_enabled ( apic ) & & apic_lvt_enabled ( apic , APIC_LVTT ) )
2012-08-05 15:58:32 +03:00
return atomic_read ( & apic - > lapic_timer . pending ) ;
2008-04-11 14:53:26 -03:00
return 0 ;
}
2011-11-10 14:57:21 +02:00
int kvm_apic_local_deliver ( struct kvm_lapic * apic , int lvt_type )
2007-09-03 16:56:58 +03:00
{
2016-05-04 14:09:41 -05:00
u32 reg = kvm_lapic_get_reg ( apic , lvt_type ) ;
2008-09-26 09:30:52 +02:00
int vector , mode , trig_mode ;
2012-08-05 15:58:33 +03:00
if ( kvm_apic_hw_enabled ( apic ) & & ! ( reg & APIC_LVT_MASKED ) ) {
2008-09-26 09:30:52 +02:00
vector = reg & APIC_VECTOR_MASK ;
mode = reg & APIC_MODE_MASK ;
trig_mode = reg & APIC_LVT_LEVEL_TRIGGER ;
2013-04-11 19:21:37 +08:00
return __apic_accept_irq ( apic , mode , vector , 1 , trig_mode ,
NULL ) ;
2008-09-26 09:30:52 +02:00
}
return 0 ;
}
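/*
 * Illustrative sketch, not part of lapic.c: a standalone version of the LVT
 * decoding done by kvm_apic_local_deliver(). The struct and function names
 * are hypothetical; the mask values are assumed to match APIC_VECTOR_MASK,
 * APIC_MODE_MASK, APIC_LVT_LEVEL_TRIGGER and APIC_LVT_MASKED from the x86
 * APIC headers.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LVT_VECTOR_MASK   0xffu       /* assumed APIC_VECTOR_MASK */
#define LVT_MODE_MASK     0x700u      /* assumed APIC_MODE_MASK */
#define LVT_LEVEL_TRIGGER (1u << 15)  /* assumed APIC_LVT_LEVEL_TRIGGER */
#define LVT_MASKED        (1u << 16)  /* assumed APIC_LVT_MASKED */

struct lvt_fields {
	bool deliverable;   /* false when the APIC is off or the entry is masked */
	uint32_t vector;
	uint32_t mode;      /* delivery mode, still in register position */
	uint32_t trig_mode; /* non-zero means level triggered */
};

/* Split one LVT register value into the fields passed on for delivery. */
static struct lvt_fields lvt_decode(uint32_t reg, bool hw_enabled)
{
	struct lvt_fields f = { false, 0, 0, 0 };

	if (hw_enabled && !(reg & LVT_MASKED)) {
		f.deliverable = true;
		f.vector = reg & LVT_VECTOR_MASK;
		f.mode = reg & LVT_MODE_MASK;
		f.trig_mode = reg & LVT_LEVEL_TRIGGER;
	}
	return f;
}

/* Example: an unmasked entry with vector 0xec and delivery mode bits 0x400. */
static void lvt_decode_example(void)
{
	struct lvt_fields f = lvt_decode(0x4ecu, true);

	assert(f.deliverable && f.vector == 0xecu && f.mode == 0x400u);
	assert(!lvt_decode(LVT_MASKED | 0x4ecu, true).deliverable);
}
```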
2007-09-03 16:56:58 +03:00
2008-10-20 10:20:02 +02:00
void kvm_apic_nmi_wd_deliver ( struct kvm_vcpu * vcpu )
2008-09-26 09:30:52 +02:00
{
2008-10-20 10:20:02 +02:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
if ( apic )
kvm_apic_local_deliver ( apic , APIC_LVT0 ) ;
2007-09-03 16:56:58 +03:00
}
2009-06-01 12:54:50 -04:00
static const struct kvm_io_device_ops apic_mmio_ops = {
. read = apic_mmio_read ,
. write = apic_mmio_write ,
} ;
2012-07-26 18:01:50 +03:00
static enum hrtimer_restart apic_timer_fn ( struct hrtimer * data )
{
struct kvm_timer * ktimer = container_of ( data , struct kvm_timer , timer ) ;
2012-07-26 18:01:51 +03:00
struct kvm_lapic * apic = container_of ( ktimer , struct kvm_lapic , lapic_timer ) ;
2012-07-26 18:01:50 +03:00
2020-04-28 14:23:28 +08:00
apic_timer_expired ( apic , true ) ;
2012-07-26 18:01:50 +03:00
2012-07-26 18:01:51 +03:00
if ( lapic_is_periodic ( apic ) ) {
2016-10-24 18:23:13 +08:00
advance_periodic_target_expiration ( apic ) ;
2012-07-26 18:01:50 +03:00
hrtimer_add_expires_ns ( & ktimer - > timer , ktimer - > period ) ;
return HRTIMER_RESTART ;
} else
return HRTIMER_NORESTART ;
}
KVM: lapic: Allow user to disable adaptive tuning of timer advancement
The introduction of adaptive tuning of lapic timer advancement did not
allow for the scenario where userspace would want to disable adaptive
tuning but still employ timer advancement, e.g. for testing purposes or
to handle a use case where adaptive tuning is unable to settle on a
suitable time. This is especially pertinent now that KVM places a hard
threshold on the maximum advancement time.
Rework the timer semantics to accept signed values, with a value of '-1'
being interpreted as "use adaptive tuning with KVM's internal default",
and any other value being used as an explicit advancement time, e.g. a
time of '0' effectively disables advancement.
Note, this does not completely restore the original behavior of
lapic_timer_advance_ns. Prior to tracking the advancement per vCPU,
which is necessary to support autotuning, userspace could adjust
lapic_timer_advance_ns for *running* vCPU. With per-vCPU tracking, the
module params are snapshotted at vCPU creation, i.e. applying a new
advancement effectively requires restarting a VM.
Dynamically updating a running vCPU is possible, e.g. a helper could be
added to retrieve the desired delay, choosing between the global module
param and the per-VCPU value depending on whether or not auto-tuning is
(globally) enabled, but introduces a great deal of complexity. The
wrapper itself is not complex, but understanding and documenting the
effects of dynamically toggling auto-tuning and/or adjusting the timer
advancement is nigh impossible since the behavior would be dependent on
KVM's implementation as well as compiler optimizations. In other words,
providing stable behavior would require extremely careful consideration
now and in the future.
Given that the expected use of a manually-tuned timer advancement is to
"tune once, run many", use the vastly simpler approach of recognizing
changes to the module params only when creating a new vCPU.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-17 10:15:33 -07:00
int kvm_create_lapic ( struct kvm_vcpu * vcpu , int timer_advance_ns )
2007-09-12 10:58:04 +03:00
{
struct kvm_lapic * apic ;
ASSERT ( vcpu ! = NULL ) ;
2019-02-11 11:02:50 -08:00
apic = kzalloc ( sizeof ( * apic ) , GFP_KERNEL_ACCOUNT ) ;
2007-09-12 10:58:04 +03:00
if ( ! apic )
goto nomem ;
2007-12-13 23:50:52 +08:00
vcpu - > arch . apic = apic ;
2007-09-12 10:58:04 +03:00
2019-02-11 11:02:50 -08:00
apic - > regs = ( void * ) get_zeroed_page ( GFP_KERNEL_ACCOUNT ) ;
2011-03-05 12:40:20 +09:00
if ( ! apic - > regs ) {
2007-09-12 10:58:04 +03:00
printk ( KERN_ERR "malloc apic regs error for vcpu %x\n" ,
vcpu - > vcpu_id ) ;
2007-10-08 10:48:30 +10:00
goto nomem_free_apic ;
2007-09-12 10:58:04 +03:00
}
apic - > vcpu = vcpu ;
2009-02-23 10:57:41 -03:00
hrtimer_init ( & apic - > lapic_timer . timer , CLOCK_MONOTONIC ,
2019-07-26 20:30:55 +02:00
HRTIMER_MODE_ABS_HARD ) ;
2012-07-26 18:01:50 +03:00
apic - > lapic_timer . timer . function = apic_timer_fn ;
2019-04-17 10:15:33 -07:00
if ( timer_advance_ns = = - 1 ) {
2019-09-26 08:54:03 +08:00
apic - > lapic_timer . timer_advance_ns = LAPIC_TIMER_ADVANCE_NS_INIT ;
2019-09-17 16:16:26 +08:00
lapic_timer_advance_dynamic = true ;
2019-04-17 10:15:33 -07:00
} else {
apic - > lapic_timer . timer_advance_ns = timer_advance_ns ;
2019-09-17 16:16:26 +08:00
lapic_timer_advance_dynamic = false ;
2019-04-17 10:15:33 -07:00
}
2012-08-05 15:58:30 +03:00
/*
* APIC is created enabled. This will prevent kvm_lapic_set_base from
2019-03-31 19:17:22 -07:00
* thinking that APIC state has changed.
2012-08-05 15:58:30 +03:00
*/
vcpu - > arch . apic_base = MSR_IA32_APICBASE_ENABLE ;
2021-01-11 23:24:35 +08:00
static_branch_inc ( & apic_sw_disabled . key ) ; /* sw disabled at reset */
2009-06-01 12:54:50 -04:00
kvm_iodevice_init ( & apic - > dev , & apic_mmio_ops ) ;
2007-09-12 10:58:04 +03:00
return 0 ;
2007-10-08 10:48:30 +10:00
nomem_free_apic :
kfree ( apic ) ;
2019-05-06 11:29:16 +03:00
vcpu - > arch . apic = NULL ;
2007-09-12 10:58:04 +03:00
nomem :
return - ENOMEM ;
}
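/*
 * Illustrative sketch, not part of lapic.c: the timer_advance_ns selection
 * made in kvm_create_lapic(), reduced to a pure function. A value of -1
 * selects adaptive tuning starting from an internal default; any other
 * value is used as-is with tuning disabled. ADVANCE_NS_DEFAULT is a
 * hypothetical placeholder for LAPIC_TIMER_ADVANCE_NS_INIT, whose value is
 * not shown in this section, and struct advance_cfg is likewise an
 * assumption made for the example.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADVANCE_NS_DEFAULT 1000u /* placeholder, not taken from the source */

struct advance_cfg {
	uint32_t advance_ns; /* advancement applied to the timer deadline */
	bool dynamic;        /* true when adaptive tuning is in use */
};

/* Map the signed module-parameter value to a per-vCPU configuration. */
static struct advance_cfg pick_advance(int timer_advance_ns)
{
	struct advance_cfg cfg;

	if (timer_advance_ns == -1) {
		cfg.advance_ns = ADVANCE_NS_DEFAULT;
		cfg.dynamic = true;
	} else {
		cfg.advance_ns = (uint32_t)timer_advance_ns;
		cfg.dynamic = false;
	}
	return cfg;
}

/* Example: 0 keeps tuning off and effectively disables advancement. */
static void pick_advance_example(void)
{
	struct advance_cfg cfg = pick_advance(0);

	assert(cfg.advance_ns == 0u && !cfg.dynamic);
	assert(pick_advance(-1).dynamic);
}
```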
int kvm_apic_has_interrupt ( struct kvm_vcpu * vcpu )
{
2007-12-13 23:50:52 +08:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2016-12-18 21:47:54 +01:00
u32 ppr ;
2007-09-12 10:58:04 +03:00
2020-11-27 08:53:52 +01:00
if ( ! kvm_apic_present ( vcpu ) )
2007-09-12 10:58:04 +03:00
return - 1 ;
2016-12-18 21:47:54 +01:00
__apic_update_ppr ( apic , & ppr ) ;
return apic_has_interrupt_for_ppr ( apic , ppr ) ;
2007-09-12 10:58:04 +03:00
}
KVM: nVMX: Morph notification vector IRQ on nested VM-Enter to pending PI
On successful nested VM-Enter, check for pending interrupts and convert
the highest priority interrupt to a pending posted interrupt if it
matches L2's notification vector. If the vCPU receives a notification
interrupt before nested VM-Enter (assuming L1 disables IRQs before doing
VM-Enter), the pending interrupt (for L1) should be recognized and
processed as a posted interrupt when interrupts become unblocked after
VM-Enter to L2.
This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is
trying to inject an interrupt into L2 by setting the appropriate bit in
L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's
method of manually moving the vector from PIR->vIRR/RVI). KVM will
observe the IPI while the vCPU is in L1 context and so won't immediately
morph it to a posted interrupt for L2. The pending interrupt will be
seen by vmx_check_nested_events(), cause KVM to force an immediate exit
after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit.
After handling the VM-Exit, L1 will see that L2 has a pending interrupt
in PIR, send another IPI, and repeat until L2 is killed.
Note, posted interrupts require virtual interrupt delivery, and virtual
interrupt delivery requires exit-on-interrupt, ergo interrupts will be
unconditionally unmasked on VM-Enter if posted interrupts are enabled.
Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200812175129.12172-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-12 10:51:29 -07:00
EXPORT_SYMBOL_GPL ( kvm_apic_has_interrupt ) ;
2007-09-12 10:58:04 +03:00
2007-09-17 14:47:13 +08:00
int kvm_apic_accept_pic_intr ( struct kvm_vcpu * vcpu )
{
2016-05-04 14:09:41 -05:00
u32 lvt0 = kvm_lapic_get_reg ( vcpu - > arch . apic , APIC_LVT0 ) ;
2007-09-17 14:47:13 +08:00
2012-08-05 15:58:33 +03:00
if ( ! kvm_apic_hw_enabled ( vcpu - > arch . apic ) )
2020-01-18 10:50:37 +08:00
return 1 ;
2010-06-16 17:11:12 -04:00
if ( ( lvt0 & APIC_LVT_MASKED ) = = 0 & &
GET_APIC_DELIVERY_MODE ( lvt0 ) = = APIC_MODE_EXTINT )
2020-01-18 10:50:37 +08:00
return 1 ;
return 0 ;
2007-09-17 14:47:13 +08:00
}
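/*
 * Illustrative sketch, not part of lapic.c: the decision made by
 * kvm_apic_accept_pic_intr() as a pure function of the LVT0 value. The
 * constants are assumed to match APIC_LVT_MASKED, APIC_MODE_EXTINT and
 * GET_APIC_DELIVERY_MODE() from the x86 APIC headers; the function names
 * are hypothetical.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED  (1u << 16) /* assumed APIC_LVT_MASKED */
#define MODE_EXTINT 0x7u       /* assumed APIC_MODE_EXTINT */

/* Delivery mode lives in bits 10:8 of an LVT entry. */
static uint32_t lvt_delivery_mode(uint32_t lvt)
{
	return (lvt >> 8) & 0x7u;
}

/*
 * PIC interrupts are accepted when the local APIC is hardware-disabled, or
 * when LINT0 is unmasked and programmed for ExtINT delivery.
 */
static bool accepts_pic_intr(uint32_t lvt0, bool hw_enabled)
{
	if (!hw_enabled)
		return true;
	return !(lvt0 & LVT_MASKED) && lvt_delivery_mode(lvt0) == MODE_EXTINT;
}

/* Example: ExtINT unmasked is accepted, the same entry masked is not. */
static void accepts_pic_intr_example(void)
{
	assert(accepts_pic_intr(0x700u, true));
	assert(!accepts_pic_intr(LVT_MASKED | 0x700u, true));
}
```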
2007-09-03 16:56:58 +03:00
void kvm_inject_apic_timer_irqs ( struct kvm_vcpu * vcpu )
{
2007-12-13 23:50:52 +08:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2007-09-03 16:56:58 +03:00
2012-08-05 15:58:32 +03:00
if ( atomic_read ( & apic - > lapic_timer . pending ) > 0 ) {
2019-07-06 09:26:51 +08:00
kvm_apic_inject_pending_timer_irqs ( apic ) ;
2013-04-28 14:00:41 +02:00
atomic_set ( & apic - > lapic_timer . pending , 0 ) ;
2007-09-03 16:56:58 +03:00
}
}
2007-09-12 10:58:04 +03:00
int kvm_get_apic_interrupt ( struct kvm_vcpu * vcpu )
{
int vector = kvm_apic_has_interrupt ( vcpu ) ;
2007-12-13 23:50:52 +08:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2016-12-18 21:43:41 +01:00
u32 ppr ;
2007-09-12 10:58:04 +03:00
if ( vector = = - 1 )
return - 1 ;
KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info
if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
emulated. To do so, KVM will ask the APIC for the interrupt vector
during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv,
kvm_get_apic_interrupt would return -1 and give the following WARNING:
Call Trace:
[<ffffffff81493563>] dump_stack+0x49/0x5e
[<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
[<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
[<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
[<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
[<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
[<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
[<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]
To fix this, we cannot rely on the processor's virtual interrupt delivery,
because "acknowledge interrupt on exit" must only update the virtual
ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR)
but it should not deliver the interrupt through the IDT. Thus, KVM has
to deliver the interrupt "by hand", similar to the treatment of EOI in
commit fc57ac2c9ca8 (KVM: lapic: sync highest ISR to hardware apic on
EOI, 2014-05-14).
The patch modifies kvm_cpu_get_interrupt to always acknowledge an
interrupt; there are only two callers, and the other is not affected
because it is never reached with kvm_apic_vid_enabled() == true. Then it
modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition
to the registers.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Tested-by: Liu, RongrongX <rongrongx.liu@intel.com>
Tested-by: Felipe Reyes <freyes@suse.com>
Fixes: 77b0f5d67ff2781f36831cba79674c3e97bd7acf
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-05 12:42:24 +08:00
/*
* We get here even with APIC virtualization enabled, if doing
* nested virtualization and L1 runs with the "acknowledge interrupt
* on exit" mode. Then we cannot inject the interrupt via RVI,
* because the processor would deliver it through the IDT.
*/
2007-09-12 10:58:04 +03:00
apic_clear_irr ( vector , apic ) ;
2021-01-26 14:48:12 +01:00
if ( to_hv_vcpu ( vcpu ) & & test_bit ( vector , to_hv_synic ( vcpu ) - > auto_eoi_bitmap ) ) {
2016-12-18 21:43:41 +01:00
/*
* For auto-EOI interrupts, there might be another pending
* interrupt above PPR, so check whether to raise another
* KVM_REQ_EVENT.
*/
2015-11-10 15:36:34 +03:00
apic_update_ppr ( apic ) ;
2016-12-18 21:43:41 +01:00
} else {
/*
* For normal interrupts, PPR has been raised and there cannot
* be a higher-priority pending interrupt---except if there was
* a concurrent interrupt injection, but that would have
* triggered KVM_REQ_EVENT already.
*/
apic_set_isr ( vector , apic ) ;
__apic_update_ppr ( apic , & ppr ) ;
2015-11-10 15:36:34 +03:00
}
2007-09-12 10:58:04 +03:00
return vector ;
}
2007-09-06 12:22:56 +03:00
2016-07-12 22:09:22 +02:00
static int kvm_apic_state_fixup ( struct kvm_vcpu * vcpu ,
struct kvm_lapic_state * s , bool set )
{
if ( apic_x2apic_mode ( vcpu - > arch . apic ) ) {
u32 * id = ( u32 * ) ( s - > regs + APIC_ID ) ;
2017-11-17 11:52:50 +00:00
u32 * ldr = ( u32 * ) ( s - > regs + APIC_LDR ) ;
2016-07-12 22:09:22 +02:00
2016-07-12 22:09:27 +02:00
if ( vcpu - > kvm - > arch . x2apic_format ) {
if ( * id ! = vcpu - > vcpu_id )
return - EINVAL ;
} else {
if ( set )
* id > > = 24 ;
else
* id < < = 24 ;
}
2017-11-17 11:52:50 +00:00
/* In x2APIC mode, the LDR is fixed and based on the id */
if ( set )
* ldr = kvm_apic_calc_x2apic_ldr ( * id ) ;
2016-07-12 22:09:22 +02:00
}
return 0 ;
}
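/*
 * Illustrative sketch, not part of lapic.c: the ID conversions applied by
 * kvm_apic_state_fixup() when the legacy register layout is exchanged with
 * userspace. The helper names are hypothetical. The logical ID formula is
 * an assumption based on the architectural x2APIC definition (cluster in
 * bits 31:16, one-hot position within the cluster in bits 15:0); the body
 * of kvm_apic_calc_x2apic_ldr() is not shown in this section.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Legacy layout keeps the 8-bit APIC ID in bits 31:24 of the ID register. */
static uint32_t id_legacy_to_x2apic(uint32_t id_reg)
{
	return id_reg >> 24;
}

static uint32_t id_x2apic_to_legacy(uint32_t id)
{
	return id << 24;
}

/* Assumed derivation of the fixed x2APIC logical destination from the ID. */
static uint32_t x2apic_ldr_from_id(uint32_t id)
{
	return ((id >> 4) << 16) | (1u << (id & 0xfu));
}

/* Example: ID 0x23 is cluster 2, bit 3 within the cluster. */
static void x2apic_id_example(void)
{
	assert(id_legacy_to_x2apic(0x05000000u) == 5u);
	assert(x2apic_ldr_from_id(0x23u) == 0x20008u);
}
```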
int kvm_apic_get_state ( struct kvm_vcpu * vcpu , struct kvm_lapic_state * s )
{
memcpy ( s - > regs , vcpu - > arch . apic - > regs , sizeof ( * s ) ) ;
2018-10-10 15:56:53 -07:00
/*
* Get calculated timer current count for remaining timer period (if
* any) and store it in the returned register set.
*/
__kvm_lapic_set_reg ( s - > regs , APIC_TMCCT ,
__apic_read ( vcpu - > arch . apic , APIC_TMCCT ) ) ;
2016-07-12 22:09:22 +02:00
return kvm_apic_state_fixup ( vcpu , s , false ) ;
}
int kvm_apic_set_state ( struct kvm_vcpu * vcpu , struct kvm_lapic_state * s )
2007-09-06 12:22:56 +03:00
{
2007-12-13 23:50:52 +08:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2016-07-12 22:09:22 +02:00
int r ;
2012-08-05 15:58:27 +03:00
kvm_lapic_set_base ( vcpu , vcpu - > arch . apic_base ) ;
2012-08-08 15:24:36 +03:00
/* set SPIV separately to get count of SW disabled APICs right */
apic_set_spiv ( apic , * ( ( u32 * ) ( s - > regs + APIC_SPIV ) ) ) ;
2016-07-12 22:09:22 +02:00
r = kvm_apic_state_fixup ( vcpu , s , true ) ;
2020-02-26 10:41:02 +08:00
if ( r ) {
kvm_recalculate_apic_map ( vcpu - > kvm ) ;
2016-07-12 22:09:22 +02:00
return r ;
2020-02-26 10:41:02 +08:00
}
2018-10-28 12:58:28 +00:00
memcpy ( vcpu - > arch . apic - > regs , s - > regs , sizeof ( * s ) ) ;
2016-07-12 22:09:22 +02:00
2020-06-22 16:37:42 +02:00
atomic_set_release ( & apic - > vcpu - > kvm - > arch . apic_map_dirty , DIRTY ) ;
2020-02-26 10:41:02 +08:00
kvm_recalculate_apic_map ( vcpu - > kvm ) ;
2009-07-05 17:39:35 +03:00
kvm_apic_set_version ( vcpu ) ;
2007-09-06 12:22:56 +03:00
apic_update_ppr ( apic ) ;
2009-02-23 10:57:41 -03:00
hrtimer_cancel ( & apic - > lapic_timer . timer ) ;
2015-06-05 20:57:41 +02:00
apic_update_lvtt ( apic ) ;
2016-05-04 14:09:41 -05:00
apic_manage_nmi_watchdog ( apic , kvm_lapic_get_reg ( apic , APIC_LVT0 ) ) ;
2007-09-06 12:22:56 +03:00
update_divide_count ( apic ) ;
2018-10-10 15:56:53 -07:00
__start_apic_timer ( apic , APIC_TMCCT ) ;
2019-11-14 14:15:04 -06:00
kvm_apic_update_apicv ( vcpu ) ;
2012-06-24 19:24:26 +03:00
apic - > highest_isr_cache = - 1 ;
2015-11-10 15:36:33 +03:00
if ( vcpu - > arch . apicv_active ) {
2021-01-14 22:27:56 -05:00
static_call ( kvm_x86_apicv_post_state_restore ) ( vcpu ) ;
static_call ( kvm_x86_hwapic_irr_update ) ( vcpu ,
2014-11-05 10:53:43 +08:00
apic_find_highest_irr ( apic ) ) ;
2021-01-14 22:27:56 -05:00
static_call ( kvm_x86_hwapic_isr_update ) ( vcpu ,
2014-12-22 10:32:57 +01:00
apic_find_highest_isr ( apic ) ) ;
2015-11-10 15:36:33 +03:00
}
2010-07-27 12:30:24 +03:00
kvm_make_request ( KVM_REQ_EVENT , vcpu ) ;
2015-07-29 23:21:40 -07:00
if ( ioapic_in_kernel ( vcpu - > kvm ) )
kvm_rtc_eoi_tracking_restore_one ( vcpu ) ;
2015-10-30 15:48:20 +01:00
vcpu - > arch . apic_arb_prio = 0 ;
2016-07-12 22:09:22 +02:00
return 0 ;
2007-09-06 12:22:56 +03:00
}
2007-09-03 16:15:12 +03:00
2008-01-16 12:49:30 +02:00
void __kvm_migrate_apic_timer ( struct kvm_vcpu * vcpu )
2007-09-03 16:15:12 +03:00
{
struct hrtimer * timer ;
2019-07-06 09:26:51 +08:00
if ( ! lapic_in_kernel ( vcpu ) | |
kvm_can_post_timer_interrupt ( vcpu ) )
2007-09-03 16:15:12 +03:00
return ;
2012-08-05 15:58:32 +03:00
timer = & vcpu - > arch . apic - > lapic_timer . timer ;
2007-09-03 16:15:12 +03:00
if ( hrtimer_cancel ( timer ) )
2019-07-26 20:30:55 +02:00
hrtimer_start_expires ( timer , HRTIMER_MODE_ABS_HARD ) ;
2007-09-03 16:15:12 +03:00
}
2007-10-25 16:52:32 +02:00
2012-06-24 19:25:07 +03:00
/*
* apic_sync_pv_eoi_from_guest - called on vmexit or cancel interrupt
*
* Detect whether guest triggered PV EOI since the
* last entry. If yes, set EOI on guest's behalf.
* Clear PV EOI in guest memory in any case.
*/
static void apic_sync_pv_eoi_from_guest ( struct kvm_vcpu * vcpu ,
struct kvm_lapic * apic )
{
bool pending ;
int vector ;
/*
* PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
* and KVM_PV_EOI_ENABLED in guest memory as follows:
*
* KVM_APIC_PV_EOI_PENDING is unset:
*   -> host disabled PV EOI.
* KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
*   -> host enabled PV EOI, guest did not execute EOI yet.
* KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
*   -> host enabled PV EOI, guest executed EOI.
*/
BUG_ON ( ! pv_eoi_enabled ( vcpu ) ) ;
pending = pv_eoi_get_pending ( vcpu ) ;
/*
* Clear pending bit in any case: it will be set again on vmentry.
* While this might not be ideal from a performance point of view,
* this makes sure PV EOI is only enabled when we know it's safe.
*/
pv_eoi_clr_pending ( vcpu ) ;
if ( pending )
return ;
vector = apic_set_eoi ( apic ) ;
trace_kvm_pv_eoi ( apic , vector ) ;
}
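/*
 * Illustrative sketch, not part of lapic.c: the three PV EOI states listed
 * in the comment inside apic_sync_pv_eoi_from_guest(), written as a
 * classifier over the host-side pending flag and the guest-side enabled
 * bit. The enum and function names are hypothetical.
 */

```c
#include <assert.h>
#include <stdbool.h>

enum pv_eoi_state {
	PV_EOI_HOST_DISABLED, /* host did not arm PV EOI for this entry */
	PV_EOI_NOT_EXECUTED,  /* armed, guest has not signalled EOI yet */
	PV_EOI_GUEST_DONE     /* armed, guest cleared the bit: EOI executed */
};

/*
 * host_pending mirrors KVM_APIC_PV_EOI_PENDING in apic_attention;
 * guest_enabled mirrors the KVM_PV_EOI_ENABLED bit in guest memory.
 */
static enum pv_eoi_state pv_eoi_classify(bool host_pending, bool guest_enabled)
{
	if (!host_pending)
		return PV_EOI_HOST_DISABLED;
	return guest_enabled ? PV_EOI_NOT_EXECUTED : PV_EOI_GUEST_DONE;
}

/* Example: only the "armed and cleared by guest" case calls for an EOI. */
static void pv_eoi_classify_example(void)
{
	assert(pv_eoi_classify(true, false) == PV_EOI_GUEST_DONE);
	assert(pv_eoi_classify(true, true) == PV_EOI_NOT_EXECUTED);
}
```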
2007-10-25 16:52:32 +02:00
void kvm_lapic_sync_from_vapic ( struct kvm_vcpu * vcpu )
{
u32 data ;
2012-06-24 19:25:07 +03:00
if ( test_bit ( KVM_APIC_PV_EOI_PENDING , & vcpu - > arch . apic_attention ) )
apic_sync_pv_eoi_from_guest ( vcpu , vcpu - > arch . apic ) ;
2012-04-19 14:06:29 +03:00
if ( ! test_bit ( KVM_APIC_CHECK_VAPIC , & vcpu - > arch . apic_attention ) )
2007-10-25 16:52:32 +02:00
return ;
2017-05-02 16:20:18 +02:00
if ( kvm_read_guest_cached ( vcpu - > kvm , & vcpu - > arch . apic - > vapic_cache , & data ,
sizeof ( u32 ) ) )
2015-08-05 10:44:40 -04:00
return ;
2007-10-25 16:52:32 +02:00
apic_set_tpr ( vcpu - > arch . apic , data & 0xff ) ;
}
2012-06-24 19:25:07 +03:00
/*
* apic_sync_pv_eoi_to_guest - called before vmentry
*
* Detect whether it's safe to enable PV EOI and
* if yes do so.
*/
static void apic_sync_pv_eoi_to_guest ( struct kvm_vcpu * vcpu ,
struct kvm_lapic * apic )
{
if ( ! pv_eoi_enabled ( vcpu ) | |
/* IRR set or many bits in ISR: could be nested. */
apic - > irr_pending | |
/* Cache not set: could be safe but we don't bother. */
apic - > highest_isr_cache = = - 1 | |
/* Need EOI to update ioapic. */
2015-07-29 10:43:18 +02:00
kvm_ioapic_handles_vector ( apic , apic - > highest_isr_cache ) ) {
2012-06-24 19:25:07 +03:00
/*
* PV EOI was disabled by apic_sync_pv_eoi_from_guest
* so we need not do anything here.
*/
return ;
}
pv_eoi_set_pending ( apic - > vcpu ) ;
}
2007-10-25 16:52:32 +02:00
void kvm_lapic_sync_to_vapic ( struct kvm_vcpu * vcpu )
{
u32 data , tpr ;
int max_irr , max_isr ;
2012-06-24 19:25:07 +03:00
struct kvm_lapic * apic = vcpu - > arch . apic ;
2007-10-25 16:52:32 +02:00
2012-06-24 19:25:07 +03:00
apic_sync_pv_eoi_to_guest ( vcpu , apic ) ;
2012-04-19 14:06:29 +03:00
if ( ! test_bit ( KVM_APIC_CHECK_VAPIC , & vcpu - > arch . apic_attention ) )
2007-10-25 16:52:32 +02:00
return ;
2016-05-04 14:09:41 -05:00
tpr = kvm_lapic_get_reg ( apic , APIC_TASKPRI ) & 0xff ;
2007-10-25 16:52:32 +02:00
max_irr = apic_find_highest_irr ( apic ) ;
if ( max_irr < 0 )
max_irr = 0 ;
max_isr = apic_find_highest_isr ( apic ) ;
if ( max_isr < 0 )
max_isr = 0 ;
data = ( tpr & 0xff ) | ( ( max_isr & 0xf0 ) < < 8 ) | ( max_irr < < 24 ) ;
2017-05-02 16:20:18 +02:00
kvm_write_guest_cached ( vcpu - > kvm , & vcpu - > arch . apic - > vapic_cache , & data ,
sizeof ( u32 ) ) ;
2007-10-25 16:52:32 +02:00
}
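/*
 * Illustrative sketch, not part of lapic.c: the 32-bit word that
 * kvm_lapic_sync_to_vapic() writes to the guest, built as a pure function.
 * The function name is hypothetical; the bit layout follows the expression
 * in the function above (TPR in bits 7:0, priority class of the highest
 * in-service vector in bits 15:12, highest pending vector in bits 31:24).
 */

```c
#include <assert.h>
#include <stdint.h>

/* Negative max_irr or max_isr means "none found" and is treated as 0. */
static uint32_t vapic_pack(uint32_t tpr, int max_irr, int max_isr)
{
	if (max_irr < 0)
		max_irr = 0;
	if (max_isr < 0)
		max_isr = 0;

	return (tpr & 0xffu) |
	       (((uint32_t)max_isr & 0xf0u) << 8) |
	       ((uint32_t)max_irr << 24);
}

/* Example: TPR 0x20, highest ISR 0x31, highest IRR 0x51. */
static void vapic_pack_example(void)
{
	assert(vapic_pack(0x20u, 0x51, 0x31) == 0x51003020u);
}
```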
2013-11-20 10:23:22 -08:00
int kvm_lapic_set_vapic_addr ( struct kvm_vcpu * vcpu , gpa_t vapic_addr )
2007-10-25 16:52:32 +02:00
{
2013-11-20 10:23:22 -08:00
if ( vapic_addr ) {
2017-05-02 16:20:18 +02:00
if ( kvm_gfn_to_hva_cache_init ( vcpu - > kvm ,
2013-11-20 10:23:22 -08:00
& vcpu - > arch . apic - > vapic_cache ,
vapic_addr , sizeof ( u32 ) ) )
return - EINVAL ;
2012-04-19 14:06:29 +03:00
__set_bit ( KVM_APIC_CHECK_VAPIC , & vcpu - > arch . apic_attention ) ;
2013-11-20 10:23:22 -08:00
} else {
2012-04-19 14:06:29 +03:00
__clear_bit ( KVM_APIC_CHECK_VAPIC , & vcpu - > arch . apic_attention ) ;
2013-11-20 10:23:22 -08:00
}
vcpu - > arch . apic - > vapic_addr = vapic_addr ;
return 0 ;
2007-10-25 16:52:32 +02:00
}
2009-07-05 17:39:36 +03:00
int kvm_x2apic_msr_write ( struct kvm_vcpu * vcpu , u32 msr , u64 data )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
u32 reg = ( msr - APIC_BASE_MSR ) < < 4 ;
2015-07-29 12:05:37 +02:00
if ( ! lapic_in_kernel ( vcpu ) | | ! apic_x2apic_mode ( apic ) )
2009-07-05 17:39:36 +03:00
return 1 ;
2014-11-26 17:56:25 +02:00
if ( reg = = APIC_ICR2 )
return 1 ;
2009-07-05 17:39:36 +03:00
/* if this is ICR write vector before command */
2014-11-26 17:07:05 +01:00
if ( reg = = APIC_ICR )
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_write ( apic , APIC_ICR2 , ( u32 ) ( data > > 32 ) ) ;
return kvm_lapic_reg_write ( apic , reg , ( u32 ) data ) ;
2009-07-05 17:39:36 +03:00
}
int kvm_x2apic_msr_read ( struct kvm_vcpu * vcpu , u32 msr , u64 * data )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
u32 reg = ( msr - APIC_BASE_MSR ) < < 4 , low , high = 0 ;
2015-07-29 12:05:37 +02:00
if ( ! lapic_in_kernel ( vcpu ) | | ! apic_x2apic_mode ( apic ) )
2009-07-05 17:39:36 +03:00
return 1 ;
2019-07-06 01:08:48 +08:00
if ( reg = = APIC_DFR | | reg = = APIC_ICR2 )
2014-11-26 17:56:25 +02:00
return 1 ;
2016-05-04 14:09:40 -05:00
if ( kvm_lapic_reg_read ( apic , reg , 4 , & low ) )
2009-07-05 17:39:36 +03:00
return 1 ;
2014-11-26 17:07:05 +01:00
if ( reg = = APIC_ICR )
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_read ( apic , APIC_ICR2 , 4 , & high ) ;
2009-07-05 17:39:36 +03:00
* data = ( ( ( u64 ) high ) < < 32 ) | low ;
return 0 ;
}
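/*
 * Illustrative sketch, not part of lapic.c: how the x2APIC MSR accessors map
 * an MSR index to a register offset, and how the 64-bit ICR value is split
 * across ICR2 (high half) and ICR (low half). The helper names are
 * hypothetical; X2APIC_MSR_BASE is assumed to equal APIC_BASE_MSR (0x800),
 * and 0x830 is assumed to be the ICR MSR.
 */

```c
#include <assert.h>
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u /* assumed value of APIC_BASE_MSR */

/* Registers are 16 bytes apart, so the MSR index is scaled by 16. */
static uint32_t x2apic_msr_to_reg(uint32_t msr)
{
	return (msr - X2APIC_MSR_BASE) << 4;
}

/* Split a 64-bit ICR write: destination half first, command half second. */
static void icr_split(uint64_t data, uint32_t *icr2, uint32_t *icr)
{
	*icr2 = (uint32_t)(data >> 32);
	*icr = (uint32_t)data;
}

/* Rebuild the 64-bit value returned to the guest on an ICR read. */
static uint64_t icr_join(uint32_t icr2, uint32_t icr)
{
	return ((uint64_t)icr2 << 32) | icr;
}

/* Example: MSR 0x830 lands on offset 0x300, and split/join round-trips. */
static void x2apic_msr_example(void)
{
	uint32_t hi, lo;

	assert(x2apic_msr_to_reg(0x830u) == 0x300u);
	icr_split(0x00000003000040fdULL, &hi, &lo);
	assert(hi == 0x3u && lo == 0x40fdu);
	assert(icr_join(hi, lo) == 0x00000003000040fdULL);
}
```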
2010-01-17 15:51:23 +02:00
int kvm_hv_vapic_msr_write ( struct kvm_vcpu * vcpu , u32 reg , u64 data )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2016-01-08 13:48:51 +01:00
if ( ! lapic_in_kernel ( vcpu ) )
2010-01-17 15:51:23 +02:00
return 1 ;
/* For an ICR write, set the destination (ICR2) before the command (ICR). */
if ( reg = = APIC_ICR )
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_write ( apic , APIC_ICR2 , ( u32 ) ( data > > 32 ) ) ;
return kvm_lapic_reg_write ( apic , reg , ( u32 ) data ) ;
2010-01-17 15:51:23 +02:00
}
int kvm_hv_vapic_msr_read ( struct kvm_vcpu * vcpu , u32 reg , u64 * data )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
u32 low , high = 0 ;
2016-01-08 13:48:51 +01:00
if ( ! lapic_in_kernel ( vcpu ) )
2010-01-17 15:51:23 +02:00
return 1 ;
2016-05-04 14:09:40 -05:00
if ( kvm_lapic_reg_read ( apic , reg , 4 , & low ) )
2010-01-17 15:51:23 +02:00
return 1 ;
if ( reg = = APIC_ICR )
2016-05-04 14:09:40 -05:00
kvm_lapic_reg_read ( apic , APIC_ICR2 , 4 , & high ) ;
2010-01-17 15:51:23 +02:00
* data = ( ( ( u64 ) high ) < < 32 ) | low ;
return 0 ;
}
2012-06-24 19:25:07 +03:00
2018-10-16 18:49:59 +02:00
int kvm_lapic_enable_pv_eoi ( struct kvm_vcpu * vcpu , u64 data , unsigned long len )
2012-06-24 19:25:07 +03:00
{
u64 addr = data & ~ KVM_MSR_ENABLED ;
2018-10-16 18:50:06 +02:00
struct gfn_to_hva_cache * ghc = & vcpu - > arch . pv_eoi . data ;
unsigned long new_len ;
2012-06-24 19:25:07 +03:00
if ( ! IS_ALIGNED ( addr , 4 ) )
return 1 ;
vcpu - > arch . pv_eoi . msr_val = data ;
if ( ! pv_eoi_enabled ( vcpu ) )
return 0 ;
2018-10-16 18:50:06 +02:00
if ( addr = = ghc - > gpa & & len < = ghc - > len )
new_len = ghc - > len ;
else
new_len = len ;
return kvm_gfn_to_hva_cache_init ( vcpu - > kvm , ghc , addr , new_len ) ;
2012-06-24 19:25:07 +03:00
}
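The PV EOI enable path above does two small pieces of arithmetic: it strips the enable bit to obtain the guest address and checks 4-byte alignment, then decides which length to pass when (re)initializing the cache. A minimal sketch of just that logic follows. It assumes `KVM_MSR_ENABLED` is bit 0, as in the KVM paravirt UAPI header, and the helper names `pv_eoi_decode` and `pv_eoi_cache_len` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Restated so the sketch compiles on its own: enable flag in bit 0. */
#define KVM_MSR_ENABLED 1ull

/*
 * Hypothetical helper: strip the enable bit from the MSR value and report
 * whether the remaining guest address is 4-byte aligned.
 */
static bool pv_eoi_decode(uint64_t data, uint64_t *addr)
{
	*addr = data & ~KVM_MSR_ENABLED;
	return (*addr & 3) == 0;
}

/*
 * Hypothetical helper: keep the cached length when the address is unchanged
 * and the request is not larger, otherwise use the requested length.
 */
static unsigned long pv_eoi_cache_len(uint64_t addr, unsigned long len,
				      uint64_t cached_gpa,
				      unsigned long cached_len)
{
	if (addr == cached_gpa && len <= cached_len)
		return cached_len;
	return len;
}
```

Because bit 0 is masked off before the alignment test, only bit 1 of the written value can make the check fail.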
2012-08-05 15:58:30 +03:00
2013-03-13 12:42:34 +01:00
void kvm_apic_accept_events ( struct kvm_vcpu * vcpu )
{
struct kvm_lapic * apic = vcpu - > arch . apic ;
2014-11-24 14:35:24 +01:00
u8 sipi_vector ;
2020-11-05 11:20:49 -05:00
int r ;
2013-06-03 11:30:02 +03:00
unsigned long pe ;
2013-03-13 12:42:34 +01:00
2020-11-05 11:20:49 -05:00
if ( ! lapic_in_kernel ( vcpu ) )
return ;
/*
* Read pending events before calling the check_events
* callback .
*/
pe = smp_load_acquire ( & apic - > pending_events ) ;
if ( ! pe )
2013-03-13 12:42:34 +01:00
return ;
2020-11-05 11:20:49 -05:00
if ( is_guest_mode ( vcpu ) ) {
r = kvm_x86_ops . nested_ops - > check_events ( vcpu ) ;
if ( r < 0 )
return ;
/*
* If an event has happened and caused a vmexit ,
* we know INITs are latched and therefore
* we will not incorrectly deliver an APIC
* event instead of a vmexit .
*/
}
2015-06-04 10:41:21 +02:00
/*
2019-08-26 13:24:49 +03:00
* INITs are latched while CPU is in specific states
2020-11-05 11:20:49 -05:00
* ( SMM , VMX root mode , SVM with GIF = 0 ) .
2019-08-26 13:24:49 +03:00
* Because a CPU cannot be in these states immediately
* after it has processed an INIT signal ( and thus in
* KVM_MP_STATE_INIT_RECEIVED state ) , just eat SIPIs
* and leave the INIT pending .
2015-06-04 10:41:21 +02:00
*/
2019-11-11 11:16:40 +02:00
if ( kvm_vcpu_latch_init ( vcpu ) ) {
2015-06-04 10:41:21 +02:00
WARN_ON_ONCE ( vcpu - > arch . mp_state = = KVM_MP_STATE_INIT_RECEIVED ) ;
2020-11-05 11:20:49 -05:00
if ( test_bit ( KVM_APIC_SIPI , & pe ) )
2015-06-04 10:41:21 +02:00
clear_bit ( KVM_APIC_SIPI , & apic - > pending_events ) ;
return ;
}
2013-06-03 11:30:02 +03:00
if ( test_bit ( KVM_APIC_INIT , & pe ) ) {
2020-11-05 11:20:49 -05:00
clear_bit ( KVM_APIC_INIT , & apic - > pending_events ) ;
KVM: x86: INIT and reset sequences are different
x86 architecture defines differences between the reset and INIT sequences.
INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs, machine-check, APIC ID, APIC arbitration ID and BSP.
References (from Intel SDM):
"If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
to a specific processor or system wide) do not cause the MP protocol to be
repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions]
[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]
"If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
changed." [9.2: X87 FPU INITIALIZATION]
"The state of the local APIC following an INIT reset is the same as it is after
a power-up or hardware reset, except that the APIC ID and arbitration ID
registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset
("Wait-for-SIPI" State)]
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-13 14:34:08 +03:00
kvm_vcpu_reset ( vcpu , true ) ;
2013-03-13 12:42:34 +01:00
if ( kvm_vcpu_is_bsp ( apic - > vcpu ) )
vcpu - > arch . mp_state = KVM_MP_STATE_RUNNABLE ;
else
vcpu - > arch . mp_state = KVM_MP_STATE_INIT_RECEIVED ;
}
2020-12-03 16:33:19 +02:00
if ( test_bit ( KVM_APIC_SIPI , & pe ) ) {
2020-11-05 11:20:49 -05:00
clear_bit ( KVM_APIC_SIPI , & apic - > pending_events ) ;
2020-12-03 16:33:19 +02:00
if ( vcpu - > arch . mp_state = = KVM_MP_STATE_INIT_RECEIVED ) {
/* evaluate pending_events before reading the vector */
smp_rmb ( ) ;
sipi_vector = apic - > sipi_vector ;
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non-SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04 14:20:01 -06:00
kvm_x86_ops . vcpu_deliver_sipi_vector ( vcpu , sipi_vector ) ;
2020-12-03 16:33:19 +02:00
vcpu - > arch . mp_state = KVM_MP_STATE_RUNNABLE ;
}
2013-03-13 12:42:34 +01:00
}
}
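The INIT and SIPI handling in the function above can be read as a small state machine. The sketch below models only that part, under stated assumptions: the nested check_events call, the memory-ordering primitives, and the actual vCPU reset are left out, and every name in it (`struct model_vcpu`, `EV_INIT`, `EV_SIPI`, `model_accept_events`, the `MP_*` values) is hypothetical, not a kernel identifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the pending-event bits. */
#define EV_INIT (1u << 0)
#define EV_SIPI (1u << 1)

/* Hypothetical stand-ins for the MP states used in the function above. */
enum model_mp_state { MP_UNINITIALIZED, MP_INIT_RECEIVED, MP_RUNNABLE };

struct model_vcpu {
	enum model_mp_state mp_state;
	unsigned int pending;   /* pending INIT/SIPI events */
	bool is_bsp;            /* bootstrap processor? */
	bool latch_init;        /* INITs latched (e.g. SMM) */
	uint8_t sipi_vector;    /* vector supplied with the SIPI */
	int started_vector;     /* -1 until a SIPI is delivered */
};

static void model_accept_events(struct model_vcpu *v)
{
	unsigned int pe = v->pending;

	if (!pe)
		return;

	/* While INITs are latched, drop SIPIs and leave the INIT pending. */
	if (v->latch_init) {
		if (pe & EV_SIPI)
			v->pending &= ~EV_SIPI;
		return;
	}

	/* INIT: the BSP runs again at once, an AP waits for a SIPI. */
	if (pe & EV_INIT) {
		v->pending &= ~EV_INIT;
		v->mp_state = v->is_bsp ? MP_RUNNABLE : MP_INIT_RECEIVED;
	}

	/* SIPI: only takes effect on a vCPU that is waiting for it. */
	if (pe & EV_SIPI) {
		v->pending &= ~EV_SIPI;
		if (v->mp_state == MP_INIT_RECEIVED) {
			v->started_vector = v->sipi_vector;
			v->mp_state = MP_RUNNABLE;
		}
	}
}
```

In this model a SIPI that arrives without a preceding INIT is consumed and has no effect, which mirrors the state check in the function above.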
2016-12-16 14:30:36 -08:00
void kvm_lapic_exit ( void )
{
static_key_deferred_flush ( & apic_hw_disabled ) ;
static_key_deferred_flush ( & apic_sw_disabled ) ;
}