2008-04-16 23:28:09 -05:00
/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License, version 2, as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
 *
 * Copyright IBM Corp. 2007
 *
 * Authors: Hollis Blanchard <hollisb@us.ibm.com>
 *          Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
 */
# include <linux/errno.h>
# include <linux/err.h>
# include <linux/kvm_host.h>
# include <linux/vmalloc.h>
2009-11-02 12:02:31 +00:00
# include <linux/hrtimer.h>
2017-02-02 19:15:33 +01:00
# include <linux/sched/signal.h>
2008-04-16 23:28:09 -05:00
# include <linux/fs.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
# include <linux/slab.h>
2013-04-12 14:08:47 +00:00
# include <linux/file.h>
2013-10-07 22:18:01 +05:30
# include <linux/module.h>
2016-08-19 15:35:47 +10:00
# include <linux/irqbypass.h>
# include <linux/kvm_irqfd.h>
2008-04-16 23:28:09 -05:00
# include <asm/cputable.h>
2016-12-24 11:46:01 -08:00
# include <linux/uaccess.h>
2008-04-16 23:28:09 -05:00
# include <asm/kvm_ppc.h>
2008-07-25 13:54:52 -05:00
# include <asm/tlbflush.h>
KVM: PPC: Allow book3s_hv guests to use SMT processor modes
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.
This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.
To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.
When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).
It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
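The vcpu-to-vcore mapping and the run policy described in the commit message above can be sketched as follows. This is a minimal illustration, not the kernel's implementation: the names `vcpu_state`, `vcore_of` and `vcore_can_run` are hypothetical and do not appear in the KVM sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; the real KVM structures differ. */
enum vcpu_state { VCPU_RUNNABLE, VCPU_BLOCKED, VCPU_BUSY_IN_HOST };

/* The vcore number is the vcpu number divided by the host threads per core. */
static int vcore_of(int vcpu_id, int threads_per_core)
{
	return vcpu_id / threads_per_core;
}

/* A vcore may run only when every one of its vcpus is runnable or blocked. */
static bool vcore_can_run(const enum vcpu_state *states, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (states[i] == VCPU_BUSY_IN_HOST)
			return false;
	return true;
}
```

With 4 threads per core, vcpu numbers 0, 4 and 8 map to vcores 0, 1 and 2, which is why using only multiples of the thread count gives each vcore a single vcpu.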
2011-06-29 00:23:08 +00:00
# include <asm/cputhreads.h>
2012-08-13 01:04:19 +02:00
# include <asm/irqflags.h>
2016-03-01 17:54:40 +11:00
# include <asm/iommu.h>
KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions
This patch provides MMIO load/store emulation for instructions that
operate on the following data types: double, vector unsigned char,
vector signed char, vector unsigned short, vector signed short,
vector unsigned int, vector signed int, and vector double.
The instructions that this adds emulation for are:
- ldx, ldux, lwax,
- lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux,
- stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx,
- lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx,
- stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x
[paulus@ozlabs.org - some cleanups, fixes and rework, make it
compile for Book E, fix build when PR KVM is built in]
Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-02-21 21:12:36 +08:00
# include <asm/switch_to.h>
2017-04-05 17:54:56 +10:00
# include <asm/xive.h>
2018-01-15 16:06:47 +11:00
# ifdef CONFIG_PPC_PSERIES
# include <asm/hvcall.h>
# include <asm/plpar_wrappers.h>
# endif
2017-04-05 17:54:56 +10:00
2008-12-02 15:51:57 -06:00
# include "timing.h"
2013-04-17 00:37:57 +02:00
# include "irq.h"
2008-12-23 14:57:26 +11:00
# include "../mm/mmu_decl.h"
2008-04-16 23:28:09 -05:00
2009-06-18 11:47:27 -03:00
# define CREATE_TRACE_POINTS
# include "trace.h"
2013-10-07 22:18:01 +05:30
struct kvmppc_ops * kvmppc_hv_ops ;
EXPORT_SYMBOL_GPL ( kvmppc_hv_ops ) ;
struct kvmppc_ops * kvmppc_pr_ops ;
EXPORT_SYMBOL_GPL ( kvmppc_pr_ops ) ;
2013-10-07 22:17:53 +05:30
2008-04-16 23:28:09 -05:00
int kvm_arch_vcpu_runnable ( struct kvm_vcpu * v )
{
2017-06-04 14:43:52 +02:00
	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
2008-04-16 23:28:09 -05:00
}
2017-08-08 12:05:32 +08:00
bool kvm_arch_vcpu_in_kernel ( struct kvm_vcpu * vcpu )
{
return false ;
}
2012-03-08 16:44:24 -05:00
int kvm_arch_vcpu_should_kick ( struct kvm_vcpu * vcpu )
{
return 1 ;
}
2012-08-10 12:28:50 +02:00
/*
 * Common checks before entering the guest world.  Call with interrupts
 * disabled.
 *
2012-08-13 12:44:41 +02:00
 * returns:
 *
 * == 1 if we're ready to go into guest state
 * <= 0 if we need to go back to the host with return value
2012-08-10 12:28:50 +02:00
*/
int kvmppc_prepare_to_enter ( struct kvm_vcpu * vcpu )
{
2014-01-09 19:18:40 -06:00
int r ;
WARN_ON ( irqs_disabled ( ) ) ;
hard_irq_disable ( ) ;
2012-08-10 12:28:50 +02:00
while ( true ) {
if ( need_resched ( ) ) {
local_irq_enable ( ) ;
cond_resched ( ) ;
2014-01-09 19:18:40 -06:00
hard_irq_disable ( ) ;
2012-08-10 12:28:50 +02:00
continue ;
}
if ( signal_pending ( current ) ) {
2012-08-13 12:44:41 +02:00
			kvmppc_account_exit(vcpu, SIGNAL_EXITS);
			vcpu->run->exit_reason = KVM_EXIT_INTR;
			r = -EINTR;
2012-08-10 12:28:50 +02:00
break ;
}
2012-08-22 15:03:50 +00:00
		vcpu->mode = IN_GUEST_MODE;
		/*
		 * Reading vcpu->requests must happen after setting vcpu->mode,
		 * so we don't miss a request because the requester sees
		 * OUTSIDE_GUEST_MODE and assumes we'll be checking requests
		 * before next entering the guest (and thus doesn't IPI).
2016-03-13 11:10:30 +08:00
* This also orders the write to mode from any reads
* to the page tables done while the VCPU is running .
* Please see the comment in kvm_flush_remote_tlbs .
2012-08-22 15:03:50 +00:00
*/
2012-08-10 12:28:50 +02:00
smp_mb ( ) ;
2012-08-22 15:03:50 +00:00
2017-06-04 14:43:52 +02:00
if ( kvm_request_pending ( vcpu ) ) {
2012-08-10 12:28:50 +02:00
/* Make sure we process requests preemptable */
local_irq_enable ( ) ;
trace_kvm_check_requests ( vcpu ) ;
2012-08-13 12:50:35 +02:00
r = kvmppc_core_check_requests ( vcpu ) ;
2014-01-09 19:18:40 -06:00
hard_irq_disable ( ) ;
2012-08-13 12:50:35 +02:00
if ( r > 0 )
continue ;
break ;
2012-08-10 12:28:50 +02:00
}
if ( kvmppc_core_prepare_to_enter ( vcpu ) ) {
/* interrupts got enabled in between, so we
are back at square 1 */
continue ;
}
2016-06-15 15:18:26 +02:00
guest_enter_irqoff ( ) ;
2014-01-09 19:18:40 -06:00
return 1 ;
2012-08-10 12:28:50 +02:00
}
2014-01-09 19:18:40 -06:00
/* return to host */
local_irq_enable ( ) ;
2012-08-10 12:28:50 +02:00
return r ;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_prepare_to_enter ) ;
2012-08-10 12:28:50 +02:00
2014-04-24 13:46:24 +02:00
# if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
static void kvmppc_swab_shared(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_arch_shared *shared = vcpu->arch.shared;
	int i;

	shared->sprg0 = swab64(shared->sprg0);
	shared->sprg1 = swab64(shared->sprg1);
	shared->sprg2 = swab64(shared->sprg2);
	shared->sprg3 = swab64(shared->sprg3);
	shared->srr0 = swab64(shared->srr0);
	shared->srr1 = swab64(shared->srr1);
	shared->dar = swab64(shared->dar);
	shared->msr = swab64(shared->msr);
	shared->dsisr = swab32(shared->dsisr);
	shared->int_pending = swab32(shared->int_pending);
	for (i = 0; i < ARRAY_SIZE(shared->sr); i++)
		shared->sr[i] = swab32(shared->sr[i]);
}
# endif
2010-07-29 14:47:48 +02:00
int kvmppc_kvm_pv ( struct kvm_vcpu * vcpu )
{
int nr = kvmppc_get_gpr ( vcpu , 11 ) ;
int r ;
unsigned long __maybe_unused param1 = kvmppc_get_gpr ( vcpu , 3 ) ;
unsigned long __maybe_unused param2 = kvmppc_get_gpr ( vcpu , 4 ) ;
unsigned long __maybe_unused param3 = kvmppc_get_gpr ( vcpu , 5 ) ;
unsigned long __maybe_unused param4 = kvmppc_get_gpr ( vcpu , 6 ) ;
unsigned long r2 = 0 ;
2014-04-24 13:46:24 +02:00
if ( ! ( kvmppc_get_msr ( vcpu ) & MSR_SF ) ) {
2010-07-29 14:47:48 +02:00
/* 32 bit mode */
		param1 &= 0xffffffff;
		param2 &= 0xffffffff;
		param3 &= 0xffffffff;
		param4 &= 0xffffffff;
}
switch ( nr ) {
2012-07-03 05:48:50 +00:00
case KVM_HCALL_TOKEN ( KVM_HC_PPC_MAP_MAGIC_PAGE ) :
2010-07-29 14:47:55 +02:00
{
2014-04-24 13:46:24 +02:00
# if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
		/* Book3S can be little endian, find it out here */
		int shared_big_endian = true;
		if (vcpu->arch.intr_msr & MSR_LE)
			shared_big_endian = false;
		if (shared_big_endian != vcpu->arch.shared_big_endian)
			kvmppc_swab_shared(vcpu);
		vcpu->arch.shared_big_endian = shared_big_endian;
# endif
2014-05-12 01:08:32 +02:00
		if (!(param2 & MAGIC_PAGE_FLAG_NOT_MAPPED_NX)) {
			/*
			 * Older versions of the Linux magic page code had
			 * a bug where they would map their trampoline code
			 * NX. If that's the case, remove !PR NX capability.
			 */
			vcpu->arch.disable_kernel_nx = true;
			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
		}
		vcpu->arch.magic_page_pa = param1 & ~0xfffULL;
		vcpu->arch.magic_page_ea = param2 & ~0xfffULL;
2010-07-29 14:47:55 +02:00
2014-07-13 16:37:12 +02:00
# ifdef CONFIG_PPC_64K_PAGES
		/*
		 * Make sure our 4k magic page is in the same window of a 64k
		 * page within the guest and within the host's page.
		 */
		if ((vcpu->arch.magic_page_pa & 0xf000) !=
		    ((ulong)vcpu->arch.shared & 0xf000)) {
			void *old_shared = vcpu->arch.shared;
			ulong shared = (ulong)vcpu->arch.shared;
			void *new_shared;

			shared &= PAGE_MASK;
			shared |= vcpu->arch.magic_page_pa & 0xf000;
			new_shared = (void *)shared;
			memcpy(new_shared, old_shared, 0x1000);
			vcpu->arch.shared = new_shared;
		}
# endif
KVM: PPC: Paravirtualize SPRG4-7, ESR, PIR, MASn
This allows additional registers to be accessed by the guest
in PR-mode KVM without trapping.
SPRG4-7 are readable from userspace. On booke, KVM will sync
these registers when it enters the guest, so that accesses from
guest userspace will work. The guest kernel, OTOH, must consistently
use either the real registers or the shared area between exits. This
also applies to the already-paravirted SPRG3.
On non-booke, it's not clear to what extent SPRG4-7 are supported
(they're not architected for book3s, but exist on at least some classic
chips). They are copied in the get/set regs ioctls, but I do not see any
non-booke emulation. I also do not see any syncing with real registers
(in PR-mode) including the user-readable SPRG3. This patch should not
make that situation any worse.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-08 18:23:30 -06:00
r2 = KVM_MAGIC_FEAT_SR | KVM_MAGIC_FEAT_MAS0_TO_SPRG7 ;
2010-08-03 11:32:56 +02:00
2012-07-03 05:48:50 +00:00
r = EV_SUCCESS ;
2010-07-29 14:47:55 +02:00
break ;
}
2012-07-03 05:48:50 +00:00
case KVM_HCALL_TOKEN ( KVM_HC_FEATURES ) :
r = EV_SUCCESS ;
2012-02-15 23:40:00 +00:00
# if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500V2)
2010-07-29 14:47:55 +02:00
		r2 |= (1 << KVM_FEATURE_MAGIC_PAGE);
# endif
2010-07-29 14:47:48 +02:00
/* Second return value is in r4 */
break ;
2012-07-03 05:48:52 +00:00
case EV_HCALL_TOKEN ( EV_IDLE ) :
r = EV_SUCCESS ;
kvm_vcpu_block ( vcpu ) ;
2017-04-26 22:32:19 +02:00
kvm_clear_request ( KVM_REQ_UNHALT , vcpu ) ;
2012-07-03 05:48:52 +00:00
break ;
2010-07-29 14:47:48 +02:00
default :
2012-07-03 05:48:50 +00:00
r = EV_UNIMPLEMENTED ;
2010-07-29 14:47:48 +02:00
break ;
}
2010-08-03 11:32:56 +02:00
kvmppc_set_gpr ( vcpu , 4 , r2 ) ;
2010-07-29 14:47:48 +02:00
return r ;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_kvm_pv ) ;
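The address arithmetic in the magic-page hypercall above can be tried in isolation. The following is a sketch that works on plain integers instead of kernel pointers; `SKETCH_PAGE_MASK_64K`, `align_magic_addr` and `relocate_shared` are hypothetical names, and the 64k page mask is an assumption that mirrors `PAGE_MASK` under `CONFIG_PPC_64K_PAGES`.

```c
#include <stdint.h>

/* Assumed stand-in for PAGE_MASK with 64k host pages. */
#define SKETCH_PAGE_MASK_64K (~(uint64_t)0xffff)

/* The magic page addresses are 4k aligned by clearing the low 12 bits. */
static uint64_t align_magic_addr(uint64_t param)
{
	return param & ~0xfffULL;
}

/*
 * Return the address the shared area should live at so that it sits in
 * the same 4k slot of its 64k host page as the guest's magic page does.
 */
static uint64_t relocate_shared(uint64_t shared, uint64_t magic_page_pa)
{
	if ((magic_page_pa & 0xf000) == (shared & 0xf000))
		return shared;	/* already in the matching 4k window */

	return (shared & SKETCH_PAGE_MASK_64K) | (magic_page_pa & 0xf000);
}
```

The kernel code additionally copies the 4k of shared state to the new location with `memcpy()`; the sketch only covers the address computation.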
2008-04-16 23:28:09 -05:00
2011-08-10 13:57:08 +02:00
int kvmppc_sanity_check(struct kvm_vcpu *vcpu)
{
	int r = false;

	/* We have to know what CPU to virtualize */
	if (!vcpu->arch.pvr)
		goto out;

	/* PAPR only works with book3s_64 */
	if ((vcpu->arch.cpu_type != KVM_CPU_3S_64) && vcpu->arch.papr_enabled)
		goto out;

	/* HV KVM can only do PAPR mode for now */
2013-10-07 22:18:02 +05:30
	if (!vcpu->arch.papr_enabled && is_kvmppc_hv_enabled(vcpu->kvm))
2011-08-10 13:57:08 +02:00
goto out ;
2011-12-20 15:34:43 +00:00
# ifdef CONFIG_KVM_BOOKE_HV
if ( ! cpu_has_feature ( CPU_FTR_EMB_HV ) )
goto out ;
# endif
2011-08-10 13:57:08 +02:00
	r = true;

out:
	vcpu->arch.sane = r;
	return r ? 0 : -EINVAL;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_sanity_check ) ;
2011-08-10 13:57:08 +02:00
2008-04-16 23:28:09 -05:00
int kvmppc_emulate_mmio ( struct kvm_run * run , struct kvm_vcpu * vcpu )
{
enum emulation_result er ;
int r ;
2014-06-18 14:53:49 +02:00
er = kvmppc_emulate_loadstore ( vcpu ) ;
2008-04-16 23:28:09 -05:00
switch ( er ) {
case EMULATE_DONE :
/* Future optimization: only reload non-volatiles if they were
* actually modified . */
r = RESUME_GUEST_NV ;
break ;
2014-07-23 19:06:21 +03:00
case EMULATE_AGAIN :
r = RESUME_GUEST ;
break ;
2008-04-16 23:28:09 -05:00
case EMULATE_DO_MMIO :
		run->exit_reason = KVM_EXIT_MMIO;
/* We must reload nonvolatiles because "update" load/store
* instructions modify register state . */
/* Future optimization: only reload non-volatiles if they were
* actually modified . */
r = RESUME_HOST_NV ;
break ;
case EMULATE_FAIL :
2014-07-23 19:06:21 +03:00
{
u32 last_inst ;
2014-09-10 14:37:29 +02:00
kvmppc_get_last_inst ( vcpu , INST_GENERIC , & last_inst ) ;
2008-04-16 23:28:09 -05:00
/* XXX Deliver Program interrupt to guest. */
2014-07-23 19:06:21 +03:00
		pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
2008-04-16 23:28:09 -05:00
r = RESUME_HOST ;
break ;
2014-07-23 19:06:21 +03:00
}
2008-04-16 23:28:09 -05:00
default :
2012-12-14 23:46:03 +01:00
WARN_ON ( 1 ) ;
r = RESUME_GUEST ;
2008-04-16 23:28:09 -05:00
}
return r ;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_emulate_mmio ) ;
2008-04-16 23:28:09 -05:00
2014-06-20 13:58:16 +02:00
int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
	      bool data)
{
2014-06-20 14:43:36 +02:00
	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
2014-06-20 13:58:16 +02:00
	struct kvmppc_pte pte;
	int r;
	vcpu->stat.st++;
	r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
			 XLATE_WRITE, &pte);
	if (r < 0)
		return r;
	*eaddr = pte.raddr;
	if (!pte.may_write)
		return -EPERM;
2014-06-20 14:43:36 +02:00
	/* Magic page override */
	if (kvmppc_supports_magic_page(vcpu) && mp_pa &&
	    ((pte.raddr & KVM_PAM & PAGE_MASK) == mp_pa) &&
	    !(kvmppc_get_msr(vcpu) & MSR_PR)) {
		void *magic = vcpu->arch.shared;
		magic += pte.eaddr & 0xfff;
		memcpy(magic, ptr, size);
		return EMULATE_DONE;
	}
2014-06-20 13:58:16 +02:00
	if (kvm_write_guest(vcpu->kvm, pte.raddr, ptr, size))
		return EMULATE_DO_MMIO;
	return EMULATE_DONE;
}
EXPORT_SYMBOL_GPL(kvmppc_st);
int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
	      bool data)
{
2014-06-20 14:43:36 +02:00
	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
2014-06-20 13:58:16 +02:00
	struct kvmppc_pte pte;
	int rc;
	vcpu->stat.ld++;
	rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
			  XLATE_READ, &pte);
	if (rc)
		return rc;
	*eaddr = pte.raddr;
	if (!pte.may_read)
		return -EPERM;
	if (!data && !pte.may_execute)
		return -ENOEXEC;
2014-06-20 14:43:36 +02:00
	/* Magic page override */
	if (kvmppc_supports_magic_page(vcpu) && mp_pa &&
	    ((pte.raddr & KVM_PAM & PAGE_MASK) == mp_pa) &&
	    !(kvmppc_get_msr(vcpu) & MSR_PR)) {
		void *magic = vcpu->arch.shared;
		magic += pte.eaddr & 0xfff;
		memcpy(ptr, magic, size);
		return EMULATE_DONE;
	}
2014-06-20 14:17:30 +02:00
	if (kvm_read_guest(vcpu->kvm, pte.raddr, ptr, size))
		return EMULATE_DO_MMIO;
2014-06-20 13:58:16 +02:00
	return EMULATE_DONE;
}
EXPORT_SYMBOL_GPL(kvmppc_ld);
2014-08-28 15:13:03 +02:00
int kvm_arch_hardware_enable ( void )
2008-04-16 23:28:09 -05:00
{
2009-09-15 11:37:46 +02:00
return 0 ;
2008-04-16 23:28:09 -05:00
}
int kvm_arch_hardware_setup ( void )
{
return 0 ;
}
void kvm_arch_check_processor_compat ( void * rtn )
{
2008-11-05 09:36:14 -06:00
* ( int * ) rtn = kvmppc_core_check_processor_compat ( ) ;
2008-04-16 23:28:09 -05:00
}
2012-01-04 10:25:20 +01:00
int kvm_arch_init_vm ( struct kvm * kvm , unsigned long type )
2008-04-16 23:28:09 -05:00
{
2013-10-07 22:18:01 +05:30
	struct kvmppc_ops *kvm_ops = NULL;
	/*
	 * if we have both HV and PR enabled, default is HV
	 */
	if (type == 0) {
		if (kvmppc_hv_ops)
			kvm_ops = kvmppc_hv_ops;
		else
			kvm_ops = kvmppc_pr_ops;
		if (!kvm_ops)
			goto err_out;
	} else if (type == KVM_VM_PPC_HV) {
		if (!kvmppc_hv_ops)
			goto err_out;
		kvm_ops = kvmppc_hv_ops;
	} else if (type == KVM_VM_PPC_PR) {
		if (!kvmppc_pr_ops)
			goto err_out;
		kvm_ops = kvmppc_pr_ops;
	} else
		goto err_out;

	if (kvm_ops->owner && !try_module_get(kvm_ops->owner))
		return -ENOENT;

	kvm->arch.kvm_ops = kvm_ops;
2011-06-29 00:19:22 +00:00
return kvmppc_core_init_vm ( kvm ) ;
2013-10-07 22:18:01 +05:30
err_out :
return - EINVAL ;
2008-04-16 23:28:09 -05:00
}
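The backend selection in `kvm_arch_init_vm()` above reduces to a small decision table. The sketch below reproduces only that table with stand-in types. The `sketch_` names and the numeric type values are placeholders for illustration, not the kernel's `KVM_VM_PPC_*` definitions, and the module reference counting is left out.

```c
#include <stddef.h>

/* Placeholder VM type values for the sketch. */
enum sketch_vm_type {
	SKETCH_VM_DEFAULT = 0,
	SKETCH_VM_HV = 1,
	SKETCH_VM_PR = 2,
};

/* Stand-in for struct kvmppc_ops. */
struct sketch_ops {
	const char *name;
};

/*
 * Pick the backend for a new VM. A NULL result corresponds to the
 * -EINVAL path: unknown type, or the requested backend is not loaded.
 */
static const struct sketch_ops *select_ops(unsigned long type,
					   const struct sketch_ops *hv,
					   const struct sketch_ops *pr)
{
	switch (type) {
	case SKETCH_VM_DEFAULT:
		/* If both HV and PR are available, the default is HV. */
		return hv ? hv : pr;
	case SKETCH_VM_HV:
		return hv;
	case SKETCH_VM_PR:
		return pr;
	default:
		return NULL;
	}
}
```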
2016-09-07 14:47:23 -04:00
bool kvm_arch_has_vcpu_debugfs ( void )
{
return false ;
}
int kvm_arch_create_vcpu_debugfs ( struct kvm_vcpu * vcpu )
{
return 0 ;
}
2010-11-09 17:02:49 +01:00
void kvm_arch_destroy_vm ( struct kvm * kvm )
2008-04-16 23:28:09 -05:00
{
unsigned int i ;
2009-06-09 15:56:29 +03:00
struct kvm_vcpu * vcpu ;
2008-04-16 23:28:09 -05:00
2015-12-21 16:22:51 -06:00
# ifdef CONFIG_KVM_XICS
/*
* We call kick_all_cpus_sync ( ) to ensure that all
* CPUs have executed any pending IPIs before we
* continue and free VCPUs structures below .
*/
if ( is_kvmppc_hv_enabled ( kvm ) )
kick_all_cpus_sync ( ) ;
# endif
2009-06-09 15:56:29 +03:00
	kvm_for_each_vcpu(i, vcpu, kvm)
		kvm_arch_vcpu_free(vcpu);
	mutex_lock(&kvm->lock);
	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
		kvm->vcpus[i] = NULL;
	atomic_set(&kvm->online_vcpus, 0);
2011-06-29 00:19:22 +00:00
kvmppc_core_destroy_vm ( kvm ) ;
2009-06-09 15:56:29 +03:00
mutex_unlock ( & kvm - > lock ) ;
2013-10-07 22:18:01 +05:30
/* drop the module reference */
	module_put(kvm->arch.kvm_ops->owner);
2008-04-16 23:28:09 -05:00
}
2014-07-14 18:27:35 +02:00
int kvm_vm_ioctl_check_extension ( struct kvm * kvm , long ext )
2008-04-16 23:28:09 -05:00
{
int r ;
2014-07-14 18:55:19 +02:00
/* Assume we're using HV mode when the HV module is loaded */
2013-10-07 22:18:01 +05:30
int hv_enabled = kvmppc_hv_ops ? 1 : 0 ;
2008-04-16 23:28:09 -05:00
2014-07-14 18:55:19 +02:00
	if (kvm) {
		/*
		 * Hooray - we know which VM type we're running on. Depend on
		 * that rather than the guess above.
		 */
		hv_enabled = is_kvmppc_hv_enabled(kvm);
	}
2008-04-16 23:28:09 -05:00
switch ( ext ) {
2011-04-27 17:24:21 -05:00
# ifdef CONFIG_BOOKE
case KVM_CAP_PPC_BOOKE_SREGS :
2012-08-08 20:38:19 +00:00
case KVM_CAP_PPC_BOOKE_WATCHDOG :
2013-01-04 18:12:48 +01:00
case KVM_CAP_PPC_EPR :
2011-04-27 17:24:21 -05:00
# else
2009-11-30 03:02:02 +00:00
case KVM_CAP_PPC_SEGSTATE :
2011-09-14 21:45:23 +02:00
case KVM_CAP_PPC_HIOR :
2011-08-08 17:29:42 +02:00
case KVM_CAP_PPC_PAPR :
2011-04-27 17:24:21 -05:00
# endif
2010-03-24 21:48:18 +01:00
case KVM_CAP_PPC_UNSET_IRQ :
2010-08-30 13:50:45 +02:00
case KVM_CAP_PPC_IRQ_LEVEL :
2010-03-24 21:48:29 +01:00
case KVM_CAP_ENABLE_CAP :
2014-06-02 11:02:59 +10:00
case KVM_CAP_ENABLE_CAP_VM :
2011-09-14 10:02:41 +02:00
case KVM_CAP_ONE_REG :
2012-10-09 00:06:20 +02:00
case KVM_CAP_IOEVENTFD :
2013-04-12 14:08:46 +00:00
case KVM_CAP_DEVICE_CTRL :
2017-02-08 11:50:15 +01:00
case KVM_CAP_IMMEDIATE_EXIT :
KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 00:21:34 +00:00
r = 1 ;
break ;
case KVM_CAP_PPC_PAIRED_SINGLES :
2010-03-24 21:48:30 +01:00
case KVM_CAP_PPC_OSI :
2010-07-29 14:48:08 +02:00
case KVM_CAP_PPC_GET_PVINFO :
2012-02-15 23:40:00 +00:00
# if defined(CONFIG_KVM_E500V2) || defined(CONFIG_KVM_E500MC)
2011-08-18 15:25:21 -05:00
case KVM_CAP_SW_TLB :
2013-04-12 14:08:47 +00:00
# endif
2013-10-07 22:17:56 +05:30
/* We support this only for PR */
2013-10-07 22:18:01 +05:30
r = ! hv_enabled ;
2009-11-30 03:02:02 +00:00
break ;
2013-10-07 22:17:56 +05:30
# ifdef CONFIG_KVM_MPIC
case KVM_CAP_IRQ_MPIC :
r = 1 ;
break ;
# endif
2012-03-15 21:58:34 +00:00
# ifdef CONFIG_PPC_BOOK3S_64
2011-06-29 00:22:41 +00:00
case KVM_CAP_SPAPR_TCE :
2016-03-01 17:54:40 +11:00
case KVM_CAP_SPAPR_TCE_64 :
2017-03-22 15:21:56 +11:00
/* fallthrough */
case KVM_CAP_SPAPR_TCE_VFIO :
2013-04-17 20:30:00 +00:00
case KVM_CAP_PPC_RTAS :
2014-05-22 17:40:15 +02:00
case KVM_CAP_PPC_FIXUP_HCALL :
2014-06-02 11:02:59 +10:00
case KVM_CAP_PPC_ENABLE_HCALL :
2013-04-27 00:28:37 +00:00
# ifdef CONFIG_KVM_XICS
case KVM_CAP_IRQ_XICS :
# endif
2018-01-15 16:06:47 +11:00
case KVM_CAP_PPC_GET_CPU_CHAR :
2011-06-29 00:22:41 +00:00
r = 1 ;
break ;
2016-11-23 16:14:07 +11:00
case KVM_CAP_PPC_ALLOC_HTAB :
r = hv_enabled ;
break ;
2012-03-15 21:58:34 +00:00
# endif /* CONFIG_PPC_BOOK3S_64 */
2013-10-07 22:17:56 +05:30
# ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
KVM: PPC: Allow book3s_hv guests to use SMT processor modes
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.
This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.
To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.
When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
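As a rough illustration of the vcpu-to-vcore mapping described above, the sketch below divides the vcpu number by the threads-per-core value reported through KVM_CAP_PPC_SMT. The helper names vcore_id() and single_threaded_vcpu_id() are hypothetical and exist only for this example; they are not kernel or qemu functions.

```c
#include <assert.h>

/* Hypothetical helper: index of the virtual core (vcore) a vcpu lands in.
 * threads_per_core stands for the value returned by KVM_CAP_PPC_SMT. */
static inline int vcore_id(int vcpu_id, int threads_per_core)
{
    assert(threads_per_core > 0);
    return vcpu_id / threads_per_core;
}

/* Hypothetical helper: vcpu number to use for the n-th vcpu of a
 * single-threaded guest. Using multiples of threads_per_core places
 * each vcpu in its own vcore. */
static inline int single_threaded_vcpu_id(int n, int threads_per_core)
{
    assert(threads_per_core > 0);
    return n * threads_per_core;
}
```

With a threads-per-core value of 4, vcpus 0 to 3 share vcore 0, while a single-threaded guest would number its vcpus 0, 4, 8 and so on.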
We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
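The run policy can be sketched as a simple predicate over the three vcpu states. This is a minimal model under the assumption that the state of each thread is available in an array; the enum and function names are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three vcpu states named in the text (identifiers are illustrative). */
enum vcpu_state { VCPU_RUNNABLE, VCPU_BLOCKED, VCPU_BUSY_IN_HOST };

/* Returns true when no thread of the vcore is busy in the host, i.e.
 * every thread is either runnable or blocked. */
static bool vcore_can_run(const enum vcpu_state *threads, size_t n)
{
    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (threads[i] == VCPU_BUSY_IN_HOST)
            return false;
    }
    return true;
}
```

The predicate only encodes the condition stated above; whether there is any runnable vcpu worth entering the guest for is a separate decision that this sketch does not model.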
When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).
It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 00:23:08 +00:00
case KVM_CAP_PPC_SMT :
KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores
With POWER9, each CPU thread has its own MMU context and can be
in the host or a guest independently of the other threads. There is,
however, still a restriction that all threads must use the same type
of address translation, either radix tree or hashed page table (HPT).
Since we only support HPT guests on a HPT host at this point, we
can treat the threads as being independent, and avoid all of the
work of coordinating the CPU threads. To make this simpler, we
introduce a new threads_per_vcore() function that returns 1 on
POWER9 and threads_per_subcore on POWER7/8, and use that instead
of threads_per_subcore or threads_per_core in various places.
This also changes the value of the KVM_CAP_PPC_SMT capability on
POWER9 systems from 4 to 1, so that userspace will not try to
create VMs with multiple vcpus per vcore. (If userspace did create
a VM that thought it was in an SMT mode, the VM might try to use
the msgsndp instruction, which will not work as expected. In
future it may be possible to trap and emulate msgsndp in order to
allow VMs to think they are in an SMT mode, if only for the purpose
of allowing migration from POWER8 systems.)
With all this, we can now run guests on POWER9 as long as the host
is running with HPT translation. Since userspace currently has no
way to request radix tree translation for the guest, the guest has
no choice but to use HPT translation.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-18 17:43:30 +11:00
r = 0 ;
KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9
On POWER9, we no longer have the restriction that we had on POWER8
where all threads in a core have to be in the same partition, so
the CPU threads are now independent. However, we still want to be
able to run guests with a virtual SMT topology, if only to allow
migration of guests from POWER8 systems to POWER9.
A guest that has a virtual SMT mode greater than 1 will expect to
be able to use the doorbell facility; it will expect the msgsndp
and msgclrp instructions to work appropriately and to be able to read
sensible values from the TIR (thread identification register) and
DPDES (directed privileged doorbell exception status) special-purpose
registers. However, since each CPU thread is a separate sub-processor
in POWER9, these instructions and registers can only be used within
a single CPU thread.
In order for these instructions to appear to act correctly according
to the guest's virtual SMT mode, we have to trap and emulate them.
We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
register. The emulation is triggered by the hypervisor facility
unavailable interrupt that occurs when the guest uses them.
To cause a doorbell interrupt to occur within the guest, we set the
DPDES register to 1. If the guest has interrupts enabled, the CPU
will generate a doorbell interrupt and clear the DPDES register in
hardware. The DPDES hardware register for the guest is saved in the
vcpu->arch.vcore->dpdes field. Since this gets written by the guest
exit code, other VCPUs wishing to cause a doorbell interrupt don't
write that field directly, but instead set a vcpu->arch.doorbell_request
flag. This is consumed and set to 0 by the guest entry code, which
then sets DPDES to 1.
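The doorbell hand-off described above can be modeled in a few lines. This is only a sketch: the struct and function names are hypothetical, and only the field names dpdes and doorbell_request are taken from the text. It ignores locking and the real guest entry and exit paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for vcpu->arch.vcore and vcpu->arch (illustrative only). */
struct vcore_model { uint64_t dpdes; };
struct vcpu_model { bool doorbell_request; struct vcore_model *vcore; };

/* Another vcpu rings the doorbell: it only sets the request flag and
 * leaves the saved dpdes field alone. */
static void request_doorbell(struct vcpu_model *target)
{
    assert(target != NULL);
    target->doorbell_request = true;
}

/* Guest entry: consume the flag and return the value to load into DPDES. */
static uint64_t guest_entry_dpdes(struct vcpu_model *vcpu)
{
    assert(vcpu != NULL && vcpu->vcore != NULL);
    uint64_t dpdes = vcpu->vcore->dpdes;
    if (vcpu->doorbell_request) {
        vcpu->doorbell_request = false;
        dpdes = 1;
    }
    return dpdes;
}

/* Guest exit: save the hardware DPDES value into the vcore field. */
static void guest_exit_save_dpdes(struct vcpu_model *vcpu, uint64_t hw_dpdes)
{
    assert(vcpu != NULL && vcpu->vcore != NULL);
    vcpu->vcore->dpdes = hw_dpdes;
}
```

Keeping the request in a separate flag means that only the guest exit code writes the saved dpdes value, which is the reason the text gives for not writing that field directly.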
Emulating reads of the DPDES register is somewhat involved, because
it requires reading the doorbell pending interrupt status of all of the
VCPU threads in the virtual core, and if any of those VCPUs are
running, their doorbell status is only up-to-date in the hardware
DPDES registers of the CPUs where they are running. In order to get
a reasonable approximation of the current doorbell status, we send
those CPUs an IPI, causing an exit from the guest which will update
the vcpu->arch.vcore->dpdes field. We then use that value in
constructing the emulated DPDES register value.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-16 16:41:20 +10:00
if ( kvm ) {
if ( kvm -> arch . emul_smt_mode > 1 )
r = kvm -> arch . emul_smt_mode ;
else
r = kvm -> arch . smt_mode ;
} else if ( hv_enabled ) {
2016-11-18 17:43:30 +11:00
if ( cpu_has_feature ( CPU_FTR_ARCH_300 ) )
r = 1 ;
else
r = threads_per_subcore ;
}
2011-06-29 00:23:08 +00:00
break ;
2017-06-21 16:01:27 +10:00
case KVM_CAP_PPC_SMT_POSSIBLE :
r = 1 ;
if ( hv_enabled ) {
if ( ! cpu_has_feature ( CPU_FTR_ARCH_300 ) )
r = ( ( threads_per_subcore << 1 ) - 1 ) ;
else
/* P9 can emulate dbells, so allow any mode */
r = 8 | 4 | 2 | 1 ;
}
break ;
KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests
This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility. These processors require a physically
contiguous, aligned area of memory for each guest. When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access. The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.
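The real-mode translation described above amounts to a limit check followed by an OR with the offset. The sketch below models it under the assumption that the limit and the RMOR value are plain 64-bit numbers; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the Real Mode Offset facility for one guest. */
struct rmo_config {
    uint64_t limit; /* size of the RMA in bytes, e.g. 64MB */
    uint64_t rmor;  /* offset from the Real Mode Offset Register */
};

/* Translate a guest real-mode address. Returns false when the address
 * is not below the limit; otherwise stores the real address. */
static bool rmo_translate(const struct rmo_config *cfg, uint64_t guest_addr,
                          uint64_t *real_addr)
{
    assert(cfg != NULL && real_addr != NULL);
    if (guest_addr >= cfg->limit)
        return false;
    /* OR acts like addition here only because the RMA is aligned to
     * its size, which is why the area has to be aligned. */
    *real_addr = guest_addr | cfg->rmor;
    return true;
}
```

For a 64MB RMA placed at 0x40000000, guest address 0x1000 maps to 0x40001000, and any address at or above 64MB fails the limit check.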
Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator. The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.
KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs. The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.
This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA. It
also returns the size of the RMA in the argument structure.
Having an RMA means we will get multiple KVM_SET_USER_MEMORY_REGION
ioctl calls from userspace. To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory. Subsequently we will get rid of this
array and use memory associated with each memslot instead.
This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region. Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB. However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.
Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest. This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 00:25:44 +00:00
case KVM_CAP_PPC_RMA :
2014-12-03 13:30:38 +11:00
r = 0 ;
2011-06-29 00:25:44 +00:00
break ;
2015-03-20 20:39:41 +11:00
case KVM_CAP_PPC_HWRNG :
r = kvmppc_hwrng_present ( ) ;
break ;
2017-01-30 21:21:41 +11:00
case KVM_CAP_PPC_MMU_RADIX :
2017-01-30 21:21:53 +11:00
r = ! ! ( hv_enabled && radix_enabled ( ) ) ;
2017-01-30 21:21:41 +11:00
break ;
case KVM_CAP_PPC_MMU_HASH_V3 :
2017-09-13 16:00:10 +10:00
r = ! ! ( hv_enabled && cpu_has_feature ( CPU_FTR_ARCH_300 ) ) ;
2017-01-30 21:21:41 +11:00
break ;
2012-08-07 10:24:14 +02:00
# endif
KVM: PPC: Implement MMU notifiers for Book3S HV guests
This adds the infrastructure to enable us to page out pages underneath
a Book3S HV guest, on processors that support virtualized partition
memory, that is, POWER7. Instead of pinning all the guest's pages,
we now look in the host userspace Linux page tables to find the
mapping for a given guest page. Then, if the userspace Linux PTE
gets invalidated, kvm_unmap_hva() gets called for that address, and
we replace all the guest HPTEs that refer to that page with absent
HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
set, which will cause an HDSI when the guest tries to access them.
Finally, the page fault handler is extended to reinstantiate the
guest HPTE when the guest tries to access a page which has been paged
out.
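The conversion of a guest HPTE into an absent HPTE can be sketched as a bit operation on the first doubleword. The bit values below are assumptions chosen for illustration (valid as the lowest bit, HPTE_V_ABSENT as 0x20) and should be checked against the kernel headers; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the first HPTE doubleword (illustrative). */
#define HPTE_V_VALID  UINT64_C(0x01)
#define HPTE_V_ABSENT UINT64_C(0x20)

/* Clear the valid bit and set the absent bit, keeping all other bits. */
static uint64_t hpte_mark_absent(uint64_t v)
{
    return (v & ~HPTE_V_VALID) | HPTE_V_ABSENT;
}

/* An absent HPTE has the valid bit clear and the absent bit set. */
static bool hpte_is_absent(uint64_t v)
{
    return !(v & HPTE_V_VALID) && (v & HPTE_V_ABSENT) != 0;
}
```

Because the hardware sees the valid bit clear, a guest access through such an entry faults to the hypervisor, where the page fault handler can reinstantiate the entry.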
Since we can't intercept the guest DSI and ISI interrupts on PPC970,
we still have to pin all the guest pages on PPC970. We have a new flag,
kvm->arch.using_mmu_notifiers, that indicates whether we can page
guest pages out. If it is not set, the MMU notifier callbacks do
nothing and everything operates as before.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-12 12:38:05 +00:00
case KVM_CAP_SYNC_MMU :
2013-10-07 22:17:56 +05:30
# ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
2014-12-03 13:30:38 +11:00
r = hv_enabled ;
2012-08-07 10:24:14 +02:00
# elif defined(KVM_ARCH_WANT_MMU_NOTIFIER)
r = 1 ;
# else
r = 0 ;
KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT. There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags. The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the "bolted" entries (those with the bolted bit, 0x10, set in
the first doubleword).
This is intended for use in implementing qemu's savevm/loadvm and for
live migration. Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs). When the first pass reaches the
end of the HPT, it returns from the read. Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.
The format of the data provides a simple run-length compression of the
invalid entries. Each block of data starts with a header that indicates
the index (position in the HPT, which is just an array), the number of
valid entries starting at that index (may be zero), and the number of
invalid entries following those valid entries. The valid entries, 16
bytes each, follow the header. The invalid entries are not explicitly
represented.
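A reader of this stream could walk it block by block as sketched below. The header layout (a 32-bit index followed by two 16-bit counts) is an assumption made for this example, as are the names htab_header, htab_totals and htab_scan(); only the 16-byte size of a valid entry comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HPTE_SIZE 16 /* each valid entry is 16 bytes */

/* Assumed header layout for one block of the stream. */
struct htab_header {
    uint32_t index;     /* position of the first entry in the HPT */
    uint16_t n_valid;   /* valid entries that follow the header */
    uint16_t n_invalid; /* invalid entries after those, not stored */
};

struct htab_totals { size_t blocks, valid, invalid; };

/* Walk a buffer of blocks and total the entry counts.
 * Returns 0 on success, -1 if a header or its payload is truncated. */
static int htab_scan(const unsigned char *buf, size_t len,
                     struct htab_totals *out)
{
    size_t off = 0;

    assert(out != NULL && (buf != NULL || len == 0));
    memset(out, 0, sizeof(*out));
    while (off < len) {
        struct htab_header hdr;
        size_t payload;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));
        off += sizeof(hdr);

        /* Only valid entries occupy space in the stream. */
        payload = (size_t)hdr.n_valid * HPTE_SIZE;
        if (len - off < payload)
            return -1;
        off += payload;

        out->blocks++;
        out->valid += hdr.n_valid;
        out->invalid += hdr.n_invalid;
    }
    return 0;
}
```

Skipping by n_valid alone is what makes the run-length scheme compact: a long run of invalid entries costs only the header that announces it.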
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-19 22:57:20 +00:00
# endif
2013-10-07 22:17:56 +05:30
break ;
# ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
2012-11-19 22:57:20 +00:00
case KVM_CAP_PPC_HTAB_FD :
2013-10-07 22:18:01 +05:30
r = hv_enabled ;
2012-11-19 22:57:20 +00:00
break ;
KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Mode Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 00:21:34 +00:00
# endif
2011-12-07 16:55:57 +00:00
case KVM_CAP_NR_VCPUS :
/*
* Recommending a number of CPUs is somewhat arbitrary ; we
* return the number of present CPUs for - HV ( since a host
* will have secondary threads " offline " ) , and for other KVM
* implementations just count online CPUs .
*/
2013-10-07 22:18:01 +05:30
if ( hv_enabled )
2013-10-07 22:17:56 +05:30
r = num_present_cpus ( ) ;
else
r = num_online_cpus ( ) ;
2011-12-07 16:55:57 +00:00
break ;
2015-10-16 10:27:53 +05:30
case KVM_CAP_NR_MEMSLOTS :
r = KVM_USER_MEM_SLOTS ;
break ;
2011-12-07 16:55:57 +00:00
case KVM_CAP_MAX_VCPUS :
r = KVM_MAX_VCPUS ;
break ;
2012-04-26 19:43:42 +00:00
# ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_PPC_GET_SMMU_INFO :
r = 1 ;
break ;
2016-02-15 12:55:09 +11:00
case KVM_CAP_SPAPR_MULTITCE :
r = 1 ;
break ;
2016-12-20 16:49:07 +11:00
case KVM_CAP_SPAPR_RESIZE_HPT :
2018-02-02 14:29:08 +11:00
r = ! ! hv_enabled ;
2016-12-20 16:49:07 +11:00
break ;
2017-05-11 16:32:48 +05:30
# endif
# ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_FWNMI :
r = hv_enabled ;
break ;
2012-04-26 19:43:42 +00:00
# endif
KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode). Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads. The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems. This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.
The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional. The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated. The trechkpt
instruction also causes a soft patch interrupt.
On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present. The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state. Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and
reads back as 0.
On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.
Emulation of the instructions that cause a softpatch interrupt is
handled in two paths. If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state. This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active. If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on. This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.
The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0. The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.
With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-21 21:32:01 +11:00
# ifdef CONFIG_PPC_TRANSACTIONAL_MEM
2016-07-20 13:41:36 +10:00
case KVM_CAP_PPC_HTM :
2017-11-09 14:30:24 +11:00
r = hv_enabled & &
2018-03-21 21:32:01 +11:00
( ! ! ( cur_cpu_spec - > cpu_user_features2 & PPC_FEATURE2_HTM ) | |
cpu_has_feature ( CPU_FTR_P9_TM_HV_ASSIST ) ) ;
2016-07-20 13:41:36 +10:00
break ;
2018-03-21 21:32:01 +11:00
# endif
2008-04-16 23:28:09 -05:00
default :
r = 0 ;
break ;
}
return r ;
}
long kvm_arch_dev_ioctl ( struct file * filp ,
unsigned int ioctl , unsigned long arg )
{
return - EINVAL ;
}
2013-10-07 22:18:00 +05:30
void kvm_arch_free_memslot ( struct kvm * kvm , struct kvm_memory_slot * free ,
2012-02-08 13:02:18 +09:00
struct kvm_memory_slot * dont )
{
2013-10-07 22:18:00 +05:30
kvmppc_core_free_memslot ( kvm , free , dont ) ;
2012-02-08 13:02:18 +09:00
}
2013-10-07 22:18:00 +05:30
int kvm_arch_create_memslot ( struct kvm * kvm , struct kvm_memory_slot * slot ,
unsigned long npages )
2012-02-08 13:02:18 +09:00
{
2013-10-07 22:18:00 +05:30
return kvmppc_core_create_memslot ( kvm , slot , npages ) ;
2012-02-08 13:02:18 +09:00
}
2009-12-23 14:35:18 -02:00
int kvm_arch_prepare_memory_region ( struct kvm * kvm ,
2013-02-27 19:41:56 +09:00
struct kvm_memory_slot * memslot ,
2015-05-18 13:59:39 +02:00
const struct kvm_userspace_memory_region * mem ,
2013-02-27 19:44:34 +09:00
enum kvm_mr_change change )
2008-04-16 23:28:09 -05:00
{
2012-09-11 13:27:46 +00:00
return kvmppc_core_prepare_memory_region ( kvm , memslot , mem ) ;
2008-04-16 23:28:09 -05:00
}
2009-12-23 14:35:18 -02:00
void kvm_arch_commit_memory_region ( struct kvm * kvm ,
2015-05-18 13:59:39 +02:00
const struct kvm_userspace_memory_region * mem ,
2013-02-27 19:45:25 +09:00
const struct kvm_memory_slot * old ,
2015-05-18 13:20:23 +02:00
const struct kvm_memory_slot * new ,
2013-02-27 19:45:25 +09:00
enum kvm_mr_change change )
2009-12-23 14:35:18 -02:00
{
2015-05-18 13:20:23 +02:00
kvmppc_core_commit_memory_region ( kvm , mem , old , new ) ;
2009-12-23 14:35:18 -02:00
}
2012-08-24 15:54:57 -03:00
void kvm_arch_flush_shadow_memslot ( struct kvm * kvm ,
struct kvm_memory_slot * slot )
2008-07-10 20:49:31 -03:00
{
2012-09-11 13:28:18 +00:00
kvmppc_core_flush_memslot ( kvm , slot ) ;
2008-07-10 20:49:31 -03:00
}
2008-04-16 23:28:09 -05:00
struct kvm_vcpu * kvm_arch_vcpu_create ( struct kvm * kvm , unsigned int id )
{
2008-12-02 15:51:57 -06:00
struct kvm_vcpu * vcpu ;
vcpu = kvmppc_core_vcpu_create ( kvm , id ) ;
2011-12-06 21:19:42 +00:00
if ( ! IS_ERR ( vcpu ) ) {
vcpu - > arch . wqp = & vcpu - > wq ;
2010-03-09 14:13:43 +08:00
kvmppc_create_vcpu_debugfs ( vcpu , id ) ;
2011-12-06 21:19:42 +00:00
}
2008-12-02 15:51:57 -06:00
return vcpu ;
2008-04-16 23:28:09 -05:00
}
2014-12-04 15:47:07 +01:00
void kvm_arch_vcpu_postcreate ( struct kvm_vcpu * vcpu )
2012-11-27 23:29:02 -02:00
{
}
2008-04-16 23:28:09 -05:00
void kvm_arch_vcpu_free ( struct kvm_vcpu * vcpu )
{
2010-02-22 16:52:14 +01:00
/* Make sure we're not using the vcpu anymore */
hrtimer_cancel ( & vcpu - > arch . dec_timer ) ;
2008-12-02 15:51:57 -06:00
kvmppc_remove_vcpu_debugfs ( vcpu ) ;
2013-04-12 14:08:47 +00:00
switch ( vcpu - > arch . irq_type ) {
case KVMPPC_IRQ_MPIC :
kvmppc_mpic_disconnect_vcpu ( vcpu - > arch . mpic , vcpu ) ;
break ;
2013-04-17 20:30:26 +00:00
case KVMPPC_IRQ_XICS :
2017-04-05 17:54:56 +10:00
if ( xive_enabled ( ) )
kvmppc_xive_cleanup_vcpu ( vcpu ) ;
else
kvmppc_xics_free_icp ( vcpu ) ;
2013-04-17 20:30:26 +00:00
break ;
2013-04-12 14:08:47 +00:00
}
2008-11-05 09:36:18 -06:00
kvmppc_core_vcpu_free ( vcpu ) ;
2008-04-16 23:28:09 -05:00
}
void kvm_arch_vcpu_destroy ( struct kvm_vcpu * vcpu )
{
kvm_arch_vcpu_free ( vcpu ) ;
}
int kvm_cpu_has_pending_timer ( struct kvm_vcpu * vcpu )
{
2008-11-05 09:36:14 -06:00
return kvmppc_core_pending_dec ( vcpu ) ;
2008-04-16 23:28:09 -05:00
}
2015-05-22 09:25:02 +02:00
static enum hrtimer_restart kvmppc_decrementer_wakeup ( struct hrtimer * timer )
2009-11-02 12:02:31 +00:00
{
struct kvm_vcpu * vcpu ;
vcpu = container_of ( timer , struct kvm_vcpu , arch . dec_timer ) ;
2014-09-01 17:19:56 +03:00
kvmppc_decrementer_func ( vcpu ) ;
2009-11-02 12:02:31 +00:00
return HRTIMER_NORESTART ;
}
2008-04-16 23:28:09 -05:00
int kvm_arch_vcpu_init ( struct kvm_vcpu * vcpu )
{
2012-08-08 20:38:19 +00:00
int ret ;
2009-11-02 12:02:31 +00:00
hrtimer_init ( & vcpu - > arch . dec_timer , CLOCK_REALTIME , HRTIMER_MODE_ABS ) ;
vcpu - > arch . dec_timer . function = kvmppc_decrementer_wakeup ;
KVM: PPC: Book3S HV: Enable migration of decrementer register
This adds a register identifier for use with the one_reg interface
to allow the decrementer expiry time to be read and written by
userspace. The decrementer expiry time is in guest timebase units
and is equal to the sum of the decrementer and the guest timebase.
(The expiry time is used rather than the decrementer value itself
because the expiry time is not constantly changing, though the
decrementer value is, while the guest vcpu is not running.)
Without this, a guest vcpu migrated to a new host will see its
decrementer set to some random value. On POWER8 and earlier, the
decrementer is 32 bits wide and counts down at 512MHz, so the
guest vcpu will potentially see no decrementer interrupts for up
to about 4 seconds, which will lead to a stall. With POWER9, the
decrementer is now 56 bits wide, so the stall can be much longer
(up to 2.23 years) and more noticeable.
To help work around the problem in cases where userspace has not been
updated to migrate the decrementer expiry time, we now set the
default decrementer expiry at vcpu creation time to the current time
rather than the maximum possible value. This should mean an
immediate decrementer interrupt when a migrated vcpu starts
running. In cases where the decrementer is 32 bits wide and more
than 4 seconds elapse between the creation of the vcpu and when it
first runs, the decrementer would have wrapped around to positive
values and there may still be a stall - but this is no worse than
the current situation. In the large-decrementer case, we are sure
to get an immediate decrementer interrupt (assuming the time from
vcpu creation to first run is less than 2.23 years) and we thus
avoid a very long stall.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-12 20:55:20 +11:00
vcpu - > arch . dec_expires = get_tb ( ) ;
2008-04-16 23:28:09 -05:00
2011-03-25 10:32:13 +05:30
# ifdef CONFIG_KVM_EXIT_TIMING
mutex_init ( & vcpu - > arch . exit_timing_lock ) ;
# endif
2012-08-08 20:38:19 +00:00
ret = kvmppc_subarch_vcpu_init ( vcpu ) ;
return ret ;
2008-04-16 23:28:09 -05:00
}
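The stall figures quoted in the decrementer commit message above (about 4 seconds for a 32-bit decrementer, about 2.23 years for a 56-bit one) are consistent with a 512MHz count rate if the register is read as a signed value, so that only half of its range is positive. A minimal userspace sketch of that arithmetic follows. The function names and the 365.25-day year are illustrative assumptions, not part of the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Count rate quoted in the commit message: 512MHz. */
#define TB_HZ 512000000.0

/*
 * Seconds needed to count through the positive half of a signed
 * decrementer that is `bits` wide (hypothetical helper).
 */
static double dec_positive_range_seconds(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return (double)(UINT64_C(1) << (bits - 1)) / TB_HZ;
}

/* Convert seconds to years, assuming a 365.25-day year. */
static double seconds_to_years(double seconds)
{
	return seconds / (365.25 * 24.0 * 3600.0);
}
```

Under these assumptions, a width of 32 gives a little over 4 seconds and a width of 56 gives a little over 2.2 years, matching the numbers in the message.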
void kvm_arch_vcpu_uninit ( struct kvm_vcpu * vcpu )
{
2009-01-03 16:22:59 -06:00
kvmppc_mmu_destroy ( vcpu ) ;
2012-08-08 20:38:19 +00:00
kvmppc_subarch_vcpu_uninit ( vcpu ) ;
2008-04-16 23:28:09 -05:00
}
void kvm_arch_vcpu_load ( struct kvm_vcpu * vcpu , int cpu )
{
2011-04-27 17:24:10 -05:00
# ifdef CONFIG_BOOKE
/*
* vrsave ( formerly usprg0 ) isn ' t used by Linux , but may
* be used by the guest .
*
* On non - booke this is associated with Altivec and
* is handled by code in book3s . c .
*/
mtspr ( SPRN_VRSAVE , vcpu - > arch . vrsave ) ;
# endif
2008-11-05 09:36:14 -06:00
kvmppc_core_vcpu_load ( vcpu , cpu ) ;
2008-04-16 23:28:09 -05:00
}
void kvm_arch_vcpu_put ( struct kvm_vcpu * vcpu )
{
2008-11-05 09:36:14 -06:00
kvmppc_core_vcpu_put ( vcpu ) ;
2011-04-27 17:24:10 -05:00
# ifdef CONFIG_BOOKE
vcpu - > arch . vrsave = mfspr ( SPRN_VRSAVE ) ;
# endif
2008-04-16 23:28:09 -05:00
}
2016-08-19 15:35:47 +10:00
/*
* irq_bypass_add_producer and irq_bypass_del_producer are only
* useful if the architecture supports PCI passthrough .
* irq_bypass_stop and irq_bypass_start are not needed and so
* kvm_ops are not defined for them .
*/
bool kvm_arch_has_irq_bypass ( void )
{
return ( ( kvmppc_hv_ops & & kvmppc_hv_ops - > irq_bypass_add_producer ) | |
( kvmppc_pr_ops & & kvmppc_pr_ops - > irq_bypass_add_producer ) ) ;
}
int kvm_arch_irq_bypass_add_producer ( struct irq_bypass_consumer * cons ,
struct irq_bypass_producer * prod )
{
struct kvm_kernel_irqfd * irqfd =
container_of ( cons , struct kvm_kernel_irqfd , consumer ) ;
struct kvm * kvm = irqfd - > kvm ;
if ( kvm - > arch . kvm_ops - > irq_bypass_add_producer )
return kvm - > arch . kvm_ops - > irq_bypass_add_producer ( cons , prod ) ;
return 0 ;
}
void kvm_arch_irq_bypass_del_producer ( struct irq_bypass_consumer * cons ,
struct irq_bypass_producer * prod )
{
struct kvm_kernel_irqfd * irqfd =
container_of ( cons , struct kvm_kernel_irqfd , consumer ) ;
struct kvm * kvm = irqfd - > kvm ;
if ( kvm - > arch . kvm_ops - > irq_bypass_del_producer )
kvm - > arch . kvm_ops - > irq_bypass_del_producer ( cons , prod ) ;
}
KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions
This patch provides the MMIO load/store emulation for instructions
of 'double & vector unsigned char & vector signed char & vector
unsigned short & vector signed short & vector unsigned int & vector
signed int & vector double'.
The instructions that this adds emulation for are:
- ldx, ldux, lwax,
- lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux,
- stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx,
- lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx,
- stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x
[paulus@ozlabs.org - some cleanups, fixes and rework, make it
compile for Book E, fix build when PR KVM is built in]
Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-02-21 21:12:36 +08:00
# ifdef CONFIG_VSX
static inline int kvmppc_get_vsr_dword_offset ( int index )
{
int offset ;
if ( ( index ! = 0 ) & & ( index ! = 1 ) )
return - 1 ;
# ifdef __BIG_ENDIAN
offset = index ;
# else
offset = 1 - index ;
# endif
return offset ;
}
static inline int kvmppc_get_vsr_word_offset ( int index )
{
int offset ;
if ( ( index > 3 ) | | ( index < 0 ) )
return - 1 ;
# ifdef __BIG_ENDIAN
offset = index ;
# else
offset = 3 - index ;
# endif
return offset ;
}
static inline void kvmppc_set_vsr_dword ( struct kvm_vcpu * vcpu ,
u64 gpr )
{
union kvmppc_one_reg val ;
int offset = kvmppc_get_vsr_dword_offset ( vcpu - > arch . mmio_vsx_offset ) ;
int index = vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ;
if ( offset = = - 1 )
return ;
if ( vcpu - > arch . mmio_vsx_tx_sx_enabled ) {
val . vval = VCPU_VSX_VR ( vcpu , index ) ;
val . vsxval [ offset ] = gpr ;
VCPU_VSX_VR ( vcpu , index ) = val . vval ;
} else {
VCPU_VSX_FPR ( vcpu , index , offset ) = gpr ;
}
}
static inline void kvmppc_set_vsr_dword_dump ( struct kvm_vcpu * vcpu ,
u64 gpr )
{
union kvmppc_one_reg val ;
int index = vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ;
if ( vcpu - > arch . mmio_vsx_tx_sx_enabled ) {
val . vval = VCPU_VSX_VR ( vcpu , index ) ;
val . vsxval [ 0 ] = gpr ;
val . vsxval [ 1 ] = gpr ;
VCPU_VSX_VR ( vcpu , index ) = val . vval ;
} else {
VCPU_VSX_FPR ( vcpu , index , 0 ) = gpr ;
VCPU_VSX_FPR ( vcpu , index , 1 ) = gpr ;
}
}
2018-05-21 13:24:20 +08:00
static inline void kvmppc_set_vsr_word_dump ( struct kvm_vcpu * vcpu ,
u32 gpr )
{
union kvmppc_one_reg val ;
int index = vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ;
if ( vcpu - > arch . mmio_vsx_tx_sx_enabled ) {
val . vsx32val [ 0 ] = gpr ;
val . vsx32val [ 1 ] = gpr ;
val . vsx32val [ 2 ] = gpr ;
val . vsx32val [ 3 ] = gpr ;
VCPU_VSX_VR ( vcpu , index ) = val . vval ;
} else {
val . vsx32val [ 0 ] = gpr ;
val . vsx32val [ 1 ] = gpr ;
VCPU_VSX_FPR ( vcpu , index , 0 ) = val . vsxval [ 0 ] ;
VCPU_VSX_FPR ( vcpu , index , 1 ) = val . vsxval [ 0 ] ;
}
}
2017-02-21 21:12:36 +08:00
static inline void kvmppc_set_vsr_word ( struct kvm_vcpu * vcpu ,
u32 gpr32 )
{
union kvmppc_one_reg val ;
int offset = kvmppc_get_vsr_word_offset ( vcpu - > arch . mmio_vsx_offset ) ;
int index = vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ;
int dword_offset , word_offset ;
if ( offset = = - 1 )
return ;
if ( vcpu - > arch . mmio_vsx_tx_sx_enabled ) {
val . vval = VCPU_VSX_VR ( vcpu , index ) ;
val . vsx32val [ offset ] = gpr32 ;
VCPU_VSX_VR ( vcpu , index ) = val . vval ;
} else {
dword_offset = offset / 2 ;
word_offset = offset % 2 ;
val . vsxval [ 0 ] = VCPU_VSX_FPR ( vcpu , index , dword_offset ) ;
val . vsx32val [ word_offset ] = gpr32 ;
VCPU_VSX_FPR ( vcpu , index , dword_offset ) = val . vsxval [ 0 ] ;
}
}
# endif /* CONFIG_VSX */
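The two offset helpers above map a logical element index to a storage offset that depends on host endianness. The sketch below restates that mapping as ordinary userspace C, with the endianness passed in as a runtime flag instead of being selected by `__BIG_ENDIAN`. The function names and the `big_endian` parameter are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Map a doubleword index (0 or 1) to its storage offset; -1 if invalid. */
static int vsr_dword_offset(int index, bool big_endian)
{
	if (index != 0 && index != 1)
		return -1;
	return big_endian ? index : 1 - index;
}

/* Map a word index (0..3) to its storage offset; -1 if invalid. */
static int vsr_word_offset(int index, bool big_endian)
{
	int offset;

	if (index < 0 || index > 3)
		return -1;
	offset = big_endian ? index : 3 - index;
	/* A valid index always yields a valid offset. */
	assert(offset >= 0 && offset <= 3);
	return offset;
}
```

On a little-endian host the element order is mirrored, so word 0 lands at offset 3 and doubleword 0 at offset 1. On a big-endian host the index is used unchanged.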
2018-02-03 18:24:26 -02:00
# ifdef CONFIG_ALTIVEC
static inline void kvmppc_set_vmx_dword ( struct kvm_vcpu * vcpu ,
u64 gpr )
{
int index = vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ;
u32 hi , lo ;
u32 di ;
# ifdef __BIG_ENDIAN
hi = gpr > > 32 ;
lo = gpr & 0xffffffff ;
# else
lo = gpr > > 32 ;
hi = gpr & 0xffffffff ;
# endif
di = 2 - vcpu - > arch . mmio_vmx_copy_nums ; /* doubleword index */
if ( di > 1 )
return ;
if ( vcpu - > arch . mmio_host_swabbed )
di = 1 - di ;
VCPU_VSX_VR ( vcpu , index ) . u [ di * 2 ] = hi ;
VCPU_VSX_VR ( vcpu , index ) . u [ di * 2 + 1 ] = lo ;
}
# endif /* CONFIG_ALTIVEC */
2017-02-21 21:12:36 +08:00
# ifdef CONFIG_PPC_FPU
static inline u64 sp_to_dp ( u32 fprs )
{
u64 fprd ;
preempt_disable ( ) ;
enable_kernel_fp ( ) ;
asm ( " lfs%U1%X1 0,%1; stfd%U0%X0 0,%0 " : " =m " ( fprd ) : " m " ( fprs )
: " fr0 " ) ;
preempt_enable ( ) ;
return fprd ;
}
static inline u32 dp_to_sp ( u64 fprd )
{
u32 fprs ;
preempt_disable ( ) ;
enable_kernel_fp ( ) ;
asm ( " lfd%U1%X1 0,%1; stfs%U0%X0 0,%0 " : " =m " ( fprs ) : " m " ( fprd )
: " fr0 " ) ;
preempt_enable ( ) ;
return fprs ;
}
# else
# define sp_to_dp(x) (x)
# define dp_to_sp(x) (x)
# endif /* CONFIG_PPC_FPU */
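`sp_to_dp()` and `dp_to_sp()` above convert between the single-precision and double-precision bit patterns by bouncing the value through a floating-point register (`lfs`/`stfd` and `lfd`/`stfs`). A portable approximation of the same idea is sketched below, assuming IEEE 754 `float` and `double`. The `_sketch` names are hypothetical, and the handling of NaNs and of doubles that are not exactly representable in single precision may differ from what the hardware instructions do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a single-precision bit pattern to a double-precision one. */
static uint64_t sp_to_dp_sketch(uint32_t sp_bits)
{
	float f;
	double d;
	uint64_t dp_bits;

	assert(sizeof(f) == sizeof(sp_bits) && sizeof(d) == sizeof(dp_bits));
	memcpy(&f, &sp_bits, sizeof(f));	/* reinterpret, do not convert */
	d = (double)f;				/* exact for every finite float */
	memcpy(&dp_bits, &d, sizeof(d));
	return dp_bits;
}

/* Narrow a double-precision bit pattern to a single-precision one. */
static uint32_t dp_to_sp_sketch(uint64_t dp_bits)
{
	double d;
	float f;
	uint32_t sp_bits;

	assert(sizeof(f) == sizeof(sp_bits) && sizeof(d) == sizeof(dp_bits));
	memcpy(&d, &dp_bits, sizeof(d));
	f = (float)d;				/* rounds if d is not representable */
	memcpy(&sp_bits, &f, sizeof(f));
	return sp_bits;
}
```

`memcpy` is used rather than pointer casts so the reinterpretation stays within the C aliasing rules.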
2008-04-16 23:28:09 -05:00
static void kvmppc_complete_mmio_load ( struct kvm_vcpu * vcpu ,
struct kvm_run * run )
{
2010-06-11 11:23:26 +00:00
u64 uninitialized_var ( gpr ) ;
2008-04-16 23:28:09 -05:00
2010-01-08 02:58:01 +01:00
if ( run - > mmio . len > sizeof ( gpr ) ) {
2008-04-16 23:28:09 -05:00
printk ( KERN_ERR " bad MMIO length: %d \n " , run - > mmio . len ) ;
return ;
}
2015-02-03 16:36:24 +11:00
if ( ! vcpu - > arch . mmio_host_swabbed ) {
2008-04-16 23:28:09 -05:00
switch ( run - > mmio . len ) {
2010-02-19 11:00:29 +01:00
case 8 : gpr = * ( u64 * ) run - > mmio . data ; break ;
2010-01-08 02:58:01 +01:00
case 4 : gpr = * ( u32 * ) run - > mmio . data ; break ;
case 2 : gpr = * ( u16 * ) run - > mmio . data ; break ;
case 1 : gpr = * ( u8 * ) run - > mmio . data ; break ;
2008-04-16 23:28:09 -05:00
}
} else {
switch ( run - > mmio . len ) {
2015-02-03 16:36:24 +11:00
case 8 : gpr = swab64 ( * ( u64 * ) run - > mmio . data ) ; break ;
case 4 : gpr = swab32 ( * ( u32 * ) run - > mmio . data ) ; break ;
case 2 : gpr = swab16 ( * ( u16 * ) run - > mmio . data ) ; break ;
2010-01-08 02:58:01 +01:00
case 1 : gpr = * ( u8 * ) run - > mmio . data ; break ;
2008-04-16 23:28:09 -05:00
}
}
2010-01-08 02:58:01 +01:00
2017-02-21 21:12:36 +08:00
/* conversion between single and double precision */
if ( ( vcpu - > arch . mmio_sp64_extend ) & & ( run - > mmio . len = = 4 ) )
gpr = sp_to_dp ( gpr ) ;
2010-02-19 11:00:30 +01:00
if ( vcpu - > arch . mmio_sign_extend ) {
switch ( run - > mmio . len ) {
# ifdef CONFIG_PPC64
case 4 :
gpr = ( s64 ) ( s32 ) gpr ;
break ;
# endif
case 2 :
gpr = ( s64 ) ( s16 ) gpr ;
break ;
case 1 :
gpr = ( s64 ) ( s8 ) gpr ;
break ;
}
}
2012-01-07 02:07:38 +01:00
switch ( vcpu - > arch . io_gpr & KVM_MMIO_REG_EXT_MASK ) {
case KVM_MMIO_REG_GPR :
2010-02-19 11:00:29 +01:00
kvmppc_set_gpr ( vcpu , vcpu - > arch . io_gpr , gpr ) ;
break ;
2012-01-07 02:07:38 +01:00
case KVM_MMIO_REG_FPR :
2018-05-21 13:24:22 +08:00
if ( vcpu - > kvm - > arch . kvm_ops - > giveup_ext )
vcpu - > kvm - > arch . kvm_ops - > giveup_ext ( vcpu , MSR_FP ) ;
2013-10-15 20:43:02 +11:00
VCPU_FPR ( vcpu , vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ) = gpr ;
2010-02-19 11:00:29 +01:00
break ;
2010-04-01 15:33:21 +02:00
# ifdef CONFIG_PPC_BOOK3S
2012-01-07 02:07:38 +01:00
case KVM_MMIO_REG_QPR :
vcpu - > arch . qpr [ vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ] = gpr ;
2010-02-19 11:00:29 +01:00
break ;
2012-01-07 02:07:38 +01:00
case KVM_MMIO_REG_FQPR :
2013-10-15 20:43:02 +11:00
VCPU_FPR ( vcpu , vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ) = gpr ;
2012-01-07 02:07:38 +01:00
vcpu - > arch . qpr [ vcpu - > arch . io_gpr & KVM_MMIO_REG_MASK ] = gpr ;
2010-02-19 11:00:29 +01:00
break ;
2017-02-21 21:12:36 +08:00
# endif
# ifdef CONFIG_VSX
case KVM_MMIO_REG_VSX :
2018-05-21 13:24:22 +08:00
if ( vcpu - > kvm - > arch . kvm_ops - > giveup_ext )
vcpu - > kvm - > arch . kvm_ops - > giveup_ext ( vcpu , MSR_VSX ) ;
2017-02-21 21:12:36 +08:00
if ( vcpu - > arch . mmio_vsx_copy_type = = KVMPPC_VSX_COPY_DWORD )
kvmppc_set_vsr_dword ( vcpu , gpr ) ;
else if ( vcpu - > arch . mmio_vsx_copy_type = = KVMPPC_VSX_COPY_WORD )
kvmppc_set_vsr_word ( vcpu , gpr ) ;
else if ( vcpu - > arch . mmio_vsx_copy_type = =
KVMPPC_VSX_COPY_DWORD_LOAD_DUMP )
kvmppc_set_vsr_dword_dump ( vcpu , gpr ) ;
2018-05-21 13:24:20 +08:00
else if ( vcpu - > arch . mmio_vsx_copy_type = =
KVMPPC_VSX_COPY_WORD_LOAD_DUMP )
kvmppc_set_vsr_word_dump ( vcpu , gpr ) ;
2017-02-21 21:12:36 +08:00
break ;
2018-02-03 18:24:26 -02:00
# endif
# ifdef CONFIG_ALTIVEC
case KVM_MMIO_REG_VMX :
2018-05-21 13:24:22 +08:00
if ( vcpu - > kvm - > arch . kvm_ops - > giveup_ext )
vcpu - > kvm - > arch . kvm_ops - > giveup_ext ( vcpu , MSR_VEC ) ;
2018-02-03 18:24:26 -02:00
kvmppc_set_vmx_dword ( vcpu , gpr ) ;
break ;
2010-04-01 15:33:21 +02:00
# endif
2010-02-19 11:00:29 +01:00
default :
BUG ( ) ;
}
2008-04-16 23:28:09 -05:00
}
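The sign-extension step in `kvmppc_complete_mmio_load()` above widens a 1-, 2- or 4-byte load result to 64 bits by casting through the matching signed type. A standalone sketch of that step follows. The function name is hypothetical, and unlike the kernel code it handles the 4-byte case unconditionally instead of only under `CONFIG_PPC64`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low `len` bytes of `gpr` to 64 bits. Lengths other
 * than 1, 2 and 4 are passed through unchanged. The narrowing casts
 * rely on the usual two's-complement behaviour of GCC and Clang.
 */
static uint64_t mmio_sign_extend(uint64_t gpr, unsigned int len)
{
	assert(len <= sizeof(gpr));	/* mirrors the "bad MMIO length" check */

	switch (len) {
	case 4:
		return (uint64_t)(int64_t)(int32_t)gpr;
	case 2:
		return (uint64_t)(int64_t)(int16_t)gpr;
	case 1:
		return (uint64_t)(int64_t)(int8_t)gpr;
	default:
		return gpr;
	}
}
```

For example, a 1-byte load of 0xff becomes all ones, while a 1-byte load of 0x7f is left as 0x7f.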
2016-05-05 16:17:10 +10:00
static int __kvmppc_handle_load ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
unsigned int rt , unsigned int bytes ,
int is_default_endian , int sign_extend )
2008-04-16 23:28:09 -05:00
{
2013-04-26 14:53:39 +00:00
int idx , ret ;
2015-02-03 16:36:24 +11:00
bool host_swabbed ;
2014-01-09 11:51:16 +01:00
2015-02-03 16:36:24 +11:00
/* Pity C doesn't have a logical XOR operator */
2014-01-09 11:51:16 +01:00
if ( kvmppc_need_byteswap ( vcpu ) ) {
2015-02-03 16:36:24 +11:00
host_swabbed = is_default_endian ;
2014-01-09 11:51:16 +01:00
} else {
2015-02-03 16:36:24 +11:00
host_swabbed = ! is_default_endian ;
2014-01-09 11:51:16 +01:00
}
2013-04-26 14:53:39 +00:00
2008-04-16 23:28:09 -05:00
if ( bytes > sizeof ( run - > mmio . data ) ) {
printk ( KERN_ERR " %s: bad MMIO length: %u \n " , __func__ ,
bytes ) ;
}
run - > mmio . phys_addr = vcpu - > arch . paddr_accessed ;
run - > mmio . len = bytes ;
run - > mmio . is_write = 0 ;
vcpu - > arch . io_gpr = rt ;
2015-02-03 16:36:24 +11:00
vcpu - > arch . mmio_host_swabbed = host_swabbed ;
2008-04-16 23:28:09 -05:00
vcpu - > mmio_needed = 1 ;
vcpu - > mmio_is_write = 0 ;
2016-05-05 16:17:10 +10:00
vcpu - > arch . mmio_sign_extend = sign_extend ;
2008-04-16 23:28:09 -05:00
2013-04-26 14:53:39 +00:00
idx = srcu_read_lock ( & vcpu - > kvm - > srcu ) ;
2015-03-26 14:39:28 +00:00
ret = kvm_io_bus_read ( vcpu , KVM_MMIO_BUS , run - > mmio . phys_addr ,
2013-04-26 14:53:39 +00:00
bytes , & run - > mmio . data ) ;
srcu_read_unlock ( & vcpu - > kvm - > srcu , idx ) ;
if ( ! ret ) {
2012-10-09 00:06:20 +02:00
kvmppc_complete_mmio_load ( vcpu , run ) ;
vcpu - > mmio_needed = 0 ;
return EMULATE_DONE ;
}
2008-04-16 23:28:09 -05:00
return EMULATE_DO_MMIO ;
}
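The `host_swabbed` computation in `__kvmppc_handle_load()` above is written as an if/else because of the missing logical XOR operator that its comment laments. When both inputs are normalised to `bool`, the same truth table can be expressed as an equality test. The sketch below shows both forms side by side. All function names are illustrative, and `need_byteswap` stands in for the result of `kvmppc_need_byteswap(vcpu)`.

```c
#include <assert.h>
#include <stdbool.h>

/* The if/else form used in the source. */
static bool host_swabbed_branchy(bool need_byteswap, bool is_default_endian)
{
	if (need_byteswap)
		return is_default_endian;
	return !is_default_endian;
}

/* Equivalent compact form: need_byteswap XOR !is_default_endian. */
static bool host_swabbed_compact(bool need_byteswap, bool is_default_endian)
{
	return need_byteswap == is_default_endian;
}

/* Check that the two forms agree on all four input combinations. */
static void check_forms_agree(void)
{
	for (int a = 0; a < 2; a++)
		for (int b = 0; b < 2; b++)
			assert(host_swabbed_branchy(a, b) ==
			       host_swabbed_compact(a, b));
}
```

The compact form is only safe because the arguments are `bool`. With plain `int` flags, as in the kernel signature, each operand would first need normalising, for example with `!!`.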
2016-05-05 16:17:10 +10:00
int kvmppc_handle_load ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
unsigned int rt , unsigned int bytes ,
int is_default_endian )
{
return __kvmppc_handle_load ( run , vcpu , rt , bytes , is_default_endian , 0 ) ;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_handle_load ) ;
2008-04-16 23:28:09 -05:00
2010-02-19 11:00:30 +01:00
/* Same as above, but sign extends */
int kvmppc_handle_loads ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
2014-01-09 11:51:16 +01:00
unsigned int rt , unsigned int bytes ,
int is_default_endian )
2010-02-19 11:00:30 +01:00
{
2016-05-05 16:17:10 +10:00
return __kvmppc_handle_load ( run , vcpu , rt , bytes , is_default_endian , 1 ) ;
2010-02-19 11:00:30 +01:00
}
KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions
This patch provides the MMIO load/store emulation for instructions
operating on the types double, vector unsigned char, vector signed char,
vector unsigned short, vector signed short, vector unsigned int, vector
signed int and vector double.
The instructions that this adds emulation for are:
- ldx, ldux, lwax,
- lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux,
- stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx,
- lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx,
- stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x
[paulus@ozlabs.org - some cleanups, fixes and rework, make it
compile for Book E, fix build when PR KVM is built in]
Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-02-21 21:12:36 +08:00
# ifdef CONFIG_VSX
int kvmppc_handle_vsx_load ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
unsigned int rt , unsigned int bytes ,
int is_default_endian , int mmio_sign_extend )
{
enum emulation_result emulated = EMULATE_DONE ;
2017-11-20 19:56:27 +11:00
/* Currently, mmio_vsx_copy_nums is only allowed to be 4 or less */
if ( vcpu - > arch . mmio_vsx_copy_nums > 4 )
2017-02-21 21:12:36 +08:00
return EMULATE_FAIL ;
while ( vcpu - > arch . mmio_vsx_copy_nums ) {
emulated = __kvmppc_handle_load ( run , vcpu , rt , bytes ,
is_default_endian , mmio_sign_extend ) ;
if ( emulated ! = EMULATE_DONE )
break ;
vcpu - > arch . paddr_accessed + = run - > mmio . len ;
vcpu - > arch . mmio_vsx_copy_nums - - ;
vcpu - > arch . mmio_vsx_offset + + ;
}
return emulated ;
}
# endif /* CONFIG_VSX */
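kvmppc_handle_vsx_load() splits one vector access into up to four element-sized accesses and stops as soon as one of them cannot be completed in the kernel. The following is a minimal user-space model of that loop. The struct, the enum and the callback type are hypothetical stand-ins for the vcpu fields and for __kvmppc_handle_load(); access_budget() is a toy device that completes a fixed number of accesses and then defers to userspace.

```c
#include <assert.h>
#include <stdint.h>

enum emu_result { EMU_DONE, EMU_DO_MMIO, EMU_FAIL };

struct vsx_mmio_state {
	uint64_t paddr;		/* models vcpu->arch.paddr_accessed */
	unsigned int copy_nums;	/* models vcpu->arch.mmio_vsx_copy_nums */
	unsigned int offset;	/* models vcpu->arch.mmio_vsx_offset */
};

/* Hypothetical per-element access hook. */
typedef enum emu_result (*access_fn)(uint64_t paddr, unsigned int bytes,
				     void *opaque);

static inline enum emu_result vsx_split_access(struct vsx_mmio_state *st,
					       unsigned int bytes,
					       access_fn access, void *opaque)
{
	enum emu_result emulated = EMU_DONE;

	/* More than four elements is rejected, as in the source. */
	if (st->copy_nums > 4)
		return EMU_FAIL;

	while (st->copy_nums) {
		emulated = access(st->paddr, bytes, opaque);
		if (emulated != EMU_DONE)
			break;	/* state is left pointing at the pending element */

		st->paddr += bytes;
		st->copy_nums--;
		st->offset++;
	}
	return emulated;
}

/* Toy device: completes *budget accesses, then asks for userspace. */
static inline enum emu_result access_budget(uint64_t paddr, unsigned int bytes,
					    void *opaque)
{
	unsigned int *budget = opaque;

	(void)paddr;
	(void)bytes;
	if (*budget == 0)
		return EMU_DO_MMIO;
	(*budget)--;
	return EMU_DONE;
}
```

When the loop breaks early, the counters still describe the element that is pending, which is what lets the run ioctl pick up from there on re-entry.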
2008-04-16 23:28:09 -05:00
int kvmppc_handle_store ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
2014-01-09 11:51:16 +01:00
u64 val , unsigned int bytes , int is_default_endian )
2008-04-16 23:28:09 -05:00
{
void * data = run - > mmio . data ;
2013-04-26 14:53:39 +00:00
int idx , ret ;
2015-02-03 16:36:24 +11:00
bool host_swabbed ;
2014-01-09 11:51:16 +01:00
2015-02-03 16:36:24 +11:00
/* Pity C doesn't have a logical XOR operator */
2014-01-09 11:51:16 +01:00
if ( kvmppc_need_byteswap ( vcpu ) ) {
2015-02-03 16:36:24 +11:00
host_swabbed = is_default_endian ;
2014-01-09 11:51:16 +01:00
} else {
2015-02-03 16:36:24 +11:00
host_swabbed = ! is_default_endian ;
2014-01-09 11:51:16 +01:00
}
2008-04-16 23:28:09 -05:00
if ( bytes > sizeof ( run - > mmio . data ) ) {
printk ( KERN_ERR " %s: bad MMIO length: %u \n " , __func__ ,
bytes ) ; /* report the rejected length, not the stale run->mmio.len */
}
run - > mmio . phys_addr = vcpu - > arch . paddr_accessed ;
run - > mmio . len = bytes ;
run - > mmio . is_write = 1 ;
vcpu - > mmio_needed = 1 ;
vcpu - > mmio_is_write = 1 ;
2017-02-21 21:12:36 +08:00
if ( ( vcpu - > arch . mmio_sp64_extend ) & & ( bytes = = 4 ) )
val = dp_to_sp ( val ) ;
2008-04-16 23:28:09 -05:00
/* Store the value at the lowest bytes in 'data'. */
2015-02-03 16:36:24 +11:00
if ( ! host_swabbed ) {
2008-04-16 23:28:09 -05:00
switch ( bytes ) {
2010-02-19 11:00:29 +01:00
case 8 : * ( u64 * ) data = val ; break ;
2008-04-16 23:28:09 -05:00
case 4 : * ( u32 * ) data = val ; break ;
case 2 : * ( u16 * ) data = val ; break ;
case 1 : * ( u8 * ) data = val ; break ;
}
} else {
switch ( bytes ) {
2015-02-03 16:36:24 +11:00
case 8 : * ( u64 * ) data = swab64 ( val ) ; break ;
case 4 : * ( u32 * ) data = swab32 ( val ) ; break ;
case 2 : * ( u16 * ) data = swab16 ( val ) ; break ;
case 1 : * ( u8 * ) data = val ; break ;
2008-04-16 23:28:09 -05:00
}
}
2013-04-26 14:53:39 +00:00
idx = srcu_read_lock ( & vcpu - > kvm - > srcu ) ;
2015-03-26 14:39:28 +00:00
ret = kvm_io_bus_write ( vcpu , KVM_MMIO_BUS , run - > mmio . phys_addr ,
2013-04-26 14:53:39 +00:00
bytes , & run - > mmio . data ) ;
srcu_read_unlock ( & vcpu - > kvm - > srcu , idx ) ;
if ( ! ret ) {
2012-10-09 00:06:20 +02:00
vcpu - > mmio_needed = 0 ;
return EMULATE_DONE ;
}
2008-04-16 23:28:09 -05:00
return EMULATE_DO_MMIO ;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_handle_store ) ;
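The two switch statements in kvmppc_handle_store() place the low "bytes" bytes of val at the start of the MMIO data buffer, byte-swapped when host_swabbed is set. Below is a portable sketch of that packing step. The helper names are hypothetical, the swab helpers are local re-implementations rather than the kernel's swab16/32/64, and the -1 return for an unsupported size is an addition (the source silently ignores other sizes).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static inline uint16_t swab16_model(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static inline uint32_t swab32_model(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v >> 8) & 0x0000ff00u) | (v >> 24);
}

static inline uint64_t swab64_model(uint64_t v)
{
	return ((uint64_t)swab32_model((uint32_t)v) << 32) |
	       swab32_model((uint32_t)(v >> 32));
}

/* Store the low 'bytes' bytes of val at the start of data. */
static inline int mmio_pack_store(void *data, uint64_t val,
				  unsigned int bytes, int host_swabbed)
{
	uint64_t v64;
	uint32_t v32;
	uint16_t v16;
	uint8_t v8;

	switch (bytes) {
	case 8:
		v64 = host_swabbed ? swab64_model(val) : val;
		memcpy(data, &v64, sizeof(v64));
		return 0;
	case 4:
		v32 = host_swabbed ? swab32_model((uint32_t)val) : (uint32_t)val;
		memcpy(data, &v32, sizeof(v32));
		return 0;
	case 2:
		v16 = host_swabbed ? swab16_model((uint16_t)val) : (uint16_t)val;
		memcpy(data, &v16, sizeof(v16));
		return 0;
	case 1:
		v8 = (uint8_t)val;	/* a single byte has nothing to swap */
		memcpy(data, &v8, sizeof(v8));
		return 0;
	default:
		return -1;	/* assumption: reject sizes the source ignores */
	}
}
```

memcpy() is used instead of the pointer casts in the source so the sketch stays free of alignment and aliasing concerns outside the kernel.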
2008-04-16 23:28:09 -05:00
2017-02-21 21:12:36 +08:00
# ifdef CONFIG_VSX
static inline int kvmppc_get_vsr_data ( struct kvm_vcpu * vcpu , int rs , u64 * val )
{
u32 dword_offset , word_offset ;
union kvmppc_one_reg reg ;
int vsx_offset = 0 ;
int copy_type = vcpu - > arch . mmio_vsx_copy_type ;
int result = 0 ;
switch ( copy_type ) {
case KVMPPC_VSX_COPY_DWORD :
vsx_offset =
kvmppc_get_vsr_dword_offset ( vcpu - > arch . mmio_vsx_offset ) ;
if ( vsx_offset = = - 1 ) {
result = - 1 ;
break ;
}
if ( ! vcpu - > arch . mmio_vsx_tx_sx_enabled ) {
* val = VCPU_VSX_FPR ( vcpu , rs , vsx_offset ) ;
} else {
reg . vval = VCPU_VSX_VR ( vcpu , rs ) ;
* val = reg . vsxval [ vsx_offset ] ;
}
break ;
case KVMPPC_VSX_COPY_WORD :
vsx_offset =
kvmppc_get_vsr_word_offset ( vcpu - > arch . mmio_vsx_offset ) ;
if ( vsx_offset = = - 1 ) {
result = - 1 ;
break ;
}
if ( ! vcpu - > arch . mmio_vsx_tx_sx_enabled ) {
dword_offset = vsx_offset / 2 ;
word_offset = vsx_offset % 2 ;
reg . vsxval [ 0 ] = VCPU_VSX_FPR ( vcpu , rs , dword_offset ) ;
* val = reg . vsx32val [ word_offset ] ;
} else {
reg . vval = VCPU_VSX_VR ( vcpu , rs ) ;
* val = reg . vsx32val [ vsx_offset ] ;
}
break ;
default :
result = - 1 ;
break ;
}
return result ;
}
int kvmppc_handle_vsx_store ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
int rs , unsigned int bytes , int is_default_endian )
{
u64 val ;
enum emulation_result emulated = EMULATE_DONE ;
vcpu - > arch . io_gpr = rs ;
2017-11-20 19:56:27 +11:00
/* Currently, mmio_vsx_copy_nums is only allowed to be 4 or less */
if ( vcpu - > arch . mmio_vsx_copy_nums > 4 )
2017-02-21 21:12:36 +08:00
return EMULATE_FAIL ;
while ( vcpu - > arch . mmio_vsx_copy_nums ) {
if ( kvmppc_get_vsr_data ( vcpu , rs , & val ) = = - 1 )
return EMULATE_FAIL ;
emulated = kvmppc_handle_store ( run , vcpu ,
val , bytes , is_default_endian ) ;
if ( emulated ! = EMULATE_DONE )
break ;
vcpu - > arch . paddr_accessed + = run - > mmio . len ;
vcpu - > arch . mmio_vsx_copy_nums - - ;
vcpu - > arch . mmio_vsx_offset + + ;
}
return emulated ;
}
static int kvmppc_emulate_mmio_vsx_loadstore ( struct kvm_vcpu * vcpu ,
struct kvm_run * run )
{
enum emulation_result emulated = EMULATE_FAIL ;
int r ;
vcpu - > arch . paddr_accessed + = run - > mmio . len ;
if ( ! vcpu - > mmio_is_write ) {
emulated = kvmppc_handle_vsx_load ( run , vcpu , vcpu - > arch . io_gpr ,
run - > mmio . len , 1 , vcpu - > arch . mmio_sign_extend ) ;
} else {
emulated = kvmppc_handle_vsx_store ( run , vcpu ,
vcpu - > arch . io_gpr , run - > mmio . len , 1 ) ;
}
switch ( emulated ) {
case EMULATE_DO_MMIO :
run - > exit_reason = KVM_EXIT_MMIO ;
r = RESUME_HOST ;
break ;
case EMULATE_FAIL :
pr_info ( " KVM: MMIO emulation failed (VSX repeat) \n " ) ;
run - > exit_reason = KVM_EXIT_INTERNAL_ERROR ;
run - > internal . suberror = KVM_INTERNAL_ERROR_EMULATION ;
r = RESUME_HOST ;
break ;
default :
r = RESUME_GUEST ;
break ;
}
return r ;
}
# endif /* CONFIG_VSX */
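The switch at the end of kvmppc_emulate_mmio_vsx_loadstore() maps an emulation result onto "return to userspace" or "re-enter the guest", and the VMX variant repeats the same mapping. One possible way to share it is sketched below as a stand-alone model. All enums, the struct and the helper name are hypothetical stand-ins for EMULATE_*, RESUME_*, KVM_EXIT_* and struct kvm_run; the pr_info() call and the suberror field are left out.

```c
#include <assert.h>

enum emu_result { EMU_DONE, EMU_DO_MMIO, EMU_FAIL };
enum resume_action { ACT_RESUME_GUEST, ACT_RESUME_HOST };
enum exit_reason { EXIT_NONE, EXIT_MMIO, EXIT_INTERNAL_ERROR };

struct run_model {
	enum exit_reason exit_reason;	/* models run->exit_reason */
};

static inline enum resume_action mmio_repeat_result(enum emu_result emulated,
						    struct run_model *run)
{
	switch (emulated) {
	case EMU_DO_MMIO:
		/* Userspace has to perform this element's access. */
		run->exit_reason = EXIT_MMIO;
		return ACT_RESUME_HOST;
	case EMU_FAIL:
		/* Emulation failed: report an internal error to userspace. */
		run->exit_reason = EXIT_INTERNAL_ERROR;
		return ACT_RESUME_HOST;
	default:
		/* Everything was handled in the kernel. */
		return ACT_RESUME_GUEST;
	}
}
```

Only the two "resume host" cases touch the exit reason; a completed emulation leaves it alone, as in the source.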
2018-02-03 18:24:26 -02:00
# ifdef CONFIG_ALTIVEC
/* handle quadword load access in two halves */
int kvmppc_handle_load128_by2x64 ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
unsigned int rt , int is_default_endian )
{
2018-02-13 15:45:21 +11:00
enum emulation_result emulated = EMULATE_DONE ;
2018-02-03 18:24:26 -02:00
while ( vcpu - > arch . mmio_vmx_copy_nums ) {
emulated = __kvmppc_handle_load ( run , vcpu , rt , 8 ,
is_default_endian , 0 ) ;
if ( emulated ! = EMULATE_DONE )
break ;
vcpu - > arch . paddr_accessed + = run - > mmio . len ;
vcpu - > arch . mmio_vmx_copy_nums - - ;
}
return emulated ;
}
static inline int kvmppc_get_vmx_data ( struct kvm_vcpu * vcpu , int rs , u64 * val )
{
vector128 vrs = VCPU_VSX_VR ( vcpu , rs ) ;
u32 di ;
u64 w0 , w1 ;
di = 2 - vcpu - > arch . mmio_vmx_copy_nums ; /* doubleword index */
if ( di > 1 )
return - 1 ;
2018-05-07 14:20:09 +08:00
if ( kvmppc_need_byteswap ( vcpu ) )
2018-02-03 18:24:26 -02:00
di = 1 - di ;
w0 = vrs . u [ di * 2 ] ;
w1 = vrs . u [ di * 2 + 1 ] ;
# ifdef __BIG_ENDIAN
* val = ( w0 < < 32 ) | w1 ;
# else
* val = ( w1 < < 32 ) | w0 ;
# endif
return 0 ;
}
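kvmppc_get_vmx_data() picks one 64-bit half of a 128-bit vector register and assembles it from two 32-bit words, with the word order depending on host endianness. The sketch below models that selection in user space. The function name is hypothetical, and host_big_endian is a run-time parameter standing in for the compile-time __BIG_ENDIAN test so that both paths can be exercised on one machine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * u[]          : the four 32-bit words of the vector register
 * copy_nums    : models vcpu->arch.mmio_vmx_copy_nums (2, then 1)
 * need_byteswap: models kvmppc_need_byteswap(vcpu)
 */
static inline int vmx_get_dword(const uint32_t u[4], unsigned int copy_nums,
				int need_byteswap, int host_big_endian,
				uint64_t *val)
{
	unsigned int di = 2 - copy_nums;	/* doubleword index */
	uint64_t w0, w1;

	/* copy_nums of 0, or above 2 (unsigned wrap), is out of range. */
	if (di > 1)
		return -1;

	/* A cross-endian guest sees the two halves in the opposite order. */
	if (need_byteswap)
		di = 1 - di;

	w0 = u[di * 2];
	w1 = u[di * 2 + 1];

	if (host_big_endian)
		*val = (w0 << 32) | w1;
	else
		*val = (w1 << 32) | w0;
	return 0;
}
```

A caller would invoke this once per remaining half, decrementing copy_nums after each successful store, as kvmppc_handle_store128_by2x64() does.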
/* handle quadword store in two halves */
int kvmppc_handle_store128_by2x64 ( struct kvm_run * run , struct kvm_vcpu * vcpu ,
unsigned int rs , int is_default_endian )
{
u64 val = 0 ;
enum emulation_result emulated = EMULATE_DONE ;
vcpu - > arch . io_gpr = rs ;
while ( vcpu - > arch . mmio_vmx_copy_nums ) {
if ( kvmppc_get_vmx_data ( vcpu , rs , & val ) = = - 1 )
return EMULATE_FAIL ;
emulated = kvmppc_handle_store ( run , vcpu , val , 8 ,
is_default_endian ) ;
if ( emulated ! = EMULATE_DONE )
break ;
vcpu - > arch . paddr_accessed + = run - > mmio . len ;
vcpu - > arch . mmio_vmx_copy_nums - - ;
}
return emulated ;
}
static int kvmppc_emulate_mmio_vmx_loadstore ( struct kvm_vcpu * vcpu ,
struct kvm_run * run )
{
enum emulation_result emulated = EMULATE_FAIL ;
int r ;
vcpu - > arch . paddr_accessed + = run - > mmio . len ;
if ( ! vcpu - > mmio_is_write ) {
emulated = kvmppc_handle_load128_by2x64 ( run , vcpu ,
vcpu - > arch . io_gpr , 1 ) ;
} else {
emulated = kvmppc_handle_store128_by2x64 ( run , vcpu ,
vcpu - > arch . io_gpr , 1 ) ;
}
switch ( emulated ) {
case EMULATE_DO_MMIO :
run - > exit_reason = KVM_EXIT_MMIO ;
r = RESUME_HOST ;
break ;
case EMULATE_FAIL :
pr_info ( " KVM: MMIO emulation failed (VMX repeat) \n " ) ;
run - > exit_reason = KVM_EXIT_INTERNAL_ERROR ;
run - > internal . suberror = KVM_INTERNAL_ERROR_EMULATION ;
r = RESUME_HOST ;
break ;
default :
r = RESUME_GUEST ;
break ;
}
return r ;
}
# endif /* CONFIG_ALTIVEC */
2014-08-20 16:36:24 +03:00
int kvm_vcpu_ioctl_get_one_reg ( struct kvm_vcpu * vcpu , struct kvm_one_reg * reg )
{
int r = 0 ;
union kvmppc_one_reg val ;
int size ;
size = one_reg_size ( reg - > id ) ;
if ( size > sizeof ( val ) )
return - EINVAL ;
r = kvmppc_get_one_reg ( vcpu , reg - > id , & val ) ;
if ( r = = - EINVAL ) {
r = 0 ;
switch ( reg - > id ) {
2014-08-20 16:36:25 +03:00
# ifdef CONFIG_ALTIVEC
case KVM_REG_PPC_VR0 . . . KVM_REG_PPC_VR31 :
if ( ! cpu_has_feature ( CPU_FTR_ALTIVEC ) ) {
r = - ENXIO ;
break ;
}
2016-01-13 18:28:17 +01:00
val . vval = vcpu - > arch . vr . vr [ reg - > id - KVM_REG_PPC_VR0 ] ;
2014-08-20 16:36:25 +03:00
break ;
case KVM_REG_PPC_VSCR :
if ( ! cpu_has_feature ( CPU_FTR_ALTIVEC ) ) {
r = - ENXIO ;
break ;
}
2016-01-13 18:28:17 +01:00
val = get_reg_val ( reg - > id , vcpu - > arch . vr . vscr . u [ 3 ] ) ;
2014-08-20 16:36:25 +03:00
break ;
case KVM_REG_PPC_VRSAVE :
2016-01-13 18:28:17 +01:00
val = get_reg_val ( reg - > id , vcpu - > arch . vrsave ) ;
2014-08-20 16:36:25 +03:00
break ;
# endif /* CONFIG_ALTIVEC */
2014-08-20 16:36:24 +03:00
default :
r = - EINVAL ;
break ;
}
}
if ( r )
return r ;
if ( copy_to_user ( ( char __user * ) ( unsigned long ) reg - > addr , & val , size ) )
r = - EFAULT ;
return r ;
}
int kvm_vcpu_ioctl_set_one_reg ( struct kvm_vcpu * vcpu , struct kvm_one_reg * reg )
{
int r ;
union kvmppc_one_reg val ;
int size ;
size = one_reg_size ( reg - > id ) ;
if ( size > sizeof ( val ) )
return - EINVAL ;
if ( copy_from_user ( & val , ( char __user * ) ( unsigned long ) reg - > addr , size ) )
return - EFAULT ;
r = kvmppc_set_one_reg ( vcpu , reg - > id , & val ) ;
if ( r = = - EINVAL ) {
r = 0 ;
switch ( reg - > id ) {
2014-08-20 16:36:25 +03:00
# ifdef CONFIG_ALTIVEC
case KVM_REG_PPC_VR0 . . . KVM_REG_PPC_VR31 :
if ( ! cpu_has_feature ( CPU_FTR_ALTIVEC ) ) {
r = - ENXIO ;
break ;
}
2016-01-13 18:28:17 +01:00
vcpu - > arch . vr . vr [ reg - > id - KVM_REG_PPC_VR0 ] = val . vval ;
2014-08-20 16:36:25 +03:00
break ;
case KVM_REG_PPC_VSCR :
if ( ! cpu_has_feature ( CPU_FTR_ALTIVEC ) ) {
r = - ENXIO ;
break ;
}
2016-01-13 18:28:17 +01:00
vcpu - > arch . vr . vscr . u [ 3 ] = set_reg_val ( reg - > id , val ) ;
2014-08-20 16:36:25 +03:00
break ;
case KVM_REG_PPC_VRSAVE :
2016-01-13 18:28:17 +01:00
if ( ! cpu_has_feature ( CPU_FTR_ALTIVEC ) ) {
r = - ENXIO ;
break ;
}
vcpu - > arch . vrsave = set_reg_val ( reg - > id , val ) ;
2014-08-20 16:36:25 +03:00
break ;
# endif /* CONFIG_ALTIVEC */
2014-08-20 16:36:24 +03:00
default :
r = - EINVAL ;
break ;
}
}
return r ;
}
2008-04-16 23:28:09 -05:00
int kvm_arch_vcpu_ioctl_run ( struct kvm_vcpu * vcpu , struct kvm_run * run )
{
int r ;
2017-12-04 21:35:25 +01:00
vcpu_load ( vcpu ) ;
2008-04-16 23:28:09 -05:00
if ( vcpu - > mmio_needed ) {
2017-02-21 21:12:36 +08:00
vcpu - > mmio_needed = 0 ;
2008-04-16 23:28:09 -05:00
if ( ! vcpu - > mmio_is_write )
kvmppc_complete_mmio_load ( vcpu , run ) ;
2017-02-21 21:12:36 +08:00
# ifdef CONFIG_VSX
if ( vcpu - > arch . mmio_vsx_copy_nums > 0 ) {
vcpu - > arch . mmio_vsx_copy_nums - - ;
vcpu - > arch . mmio_vsx_offset + + ;
}
if ( vcpu - > arch . mmio_vsx_copy_nums > 0 ) {
r = kvmppc_emulate_mmio_vsx_loadstore ( vcpu , run ) ;
if ( r = = RESUME_HOST ) {
vcpu - > mmio_needed = 1 ;
2017-12-04 21:35:25 +01:00
goto out ;
2017-02-21 21:12:36 +08:00
}
}
2018-02-03 18:24:26 -02:00
# endif
# ifdef CONFIG_ALTIVEC
if ( vcpu - > arch . mmio_vmx_copy_nums > 0 )
vcpu - > arch . mmio_vmx_copy_nums - - ;
if ( vcpu - > arch . mmio_vmx_copy_nums > 0 ) {
r = kvmppc_emulate_mmio_vmx_loadstore ( vcpu , run ) ;
if ( r = = RESUME_HOST ) {
vcpu - > mmio_needed = 1 ;
2018-02-09 21:36:57 +01:00
goto out ;
2018-02-03 18:24:26 -02:00
}
}
2017-02-21 21:12:36 +08:00
# endif
2010-03-24 21:48:30 +01:00
} else if ( vcpu - > arch . osi_needed ) {
u64 * gprs = run - > osi . gprs ;
int i ;
for ( i = 0 ; i < 32 ; i + + )
kvmppc_set_gpr ( vcpu , i , gprs [ i ] ) ;
vcpu - > arch . osi_needed = 0 ;
KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 00:21:34 +00:00
} else if ( vcpu - > arch . hcall_needed ) {
int i ;
kvmppc_set_gpr ( vcpu , 3 , run - > papr_hcall . ret ) ;
for ( i = 0 ; i < 9 ; + + i )
kvmppc_set_gpr ( vcpu , 4 + i , run - > papr_hcall . args [ i ] ) ;
vcpu - > arch . hcall_needed = 0 ;
2013-01-04 18:12:48 +01:00
# ifdef CONFIG_BOOKE
} else if ( vcpu - > arch . epr_needed ) {
kvmppc_set_epr ( vcpu , run - > epr . epr ) ;
vcpu - > arch . epr_needed = 0 ;
# endif
2008-04-16 23:28:09 -05:00
}
2017-11-24 22:39:01 +01:00
kvm_sigset_activate ( vcpu ) ;
2017-02-21 21:12:36 +08:00
2017-02-08 11:50:15 +01:00
if ( run - > immediate_exit )
r = - EINTR ;
else
r = kvmppc_vcpu_run ( run , vcpu ) ;
2008-04-16 23:28:09 -05:00
2017-11-24 22:39:01 +01:00
kvm_sigset_deactivate ( vcpu ) ;
2008-04-16 23:28:09 -05:00
KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n
Commit accb757d798c ("KVM: Move vcpu_load to arch-specific
kvm_arch_vcpu_ioctl_run", 2017-12-04) added a "goto out"
statement and an "out:" label to kvm_arch_vcpu_ioctl_run().
Since the only "goto out" is inside a CONFIG_VSX block,
compiling with CONFIG_VSX=n gives a warning that label "out"
is defined but not used, and because arch/powerpc is compiled
with -Werror, that becomes a compile error that makes the kernel
build fail.
Merge commit 1ab03c072feb ("Merge tag 'kvm-ppc-next-4.16-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc",
2018-02-09) added a similar block of code inside a #ifdef
CONFIG_ALTIVEC, with a "goto out" statement.
In order to make the build succeed, this adds a #ifdef around the
"out:" label. This is a minimal, ugly fix, to be replaced later
by a refactoring of the code. Since CONFIG_VSX depends on
CONFIG_ALTIVEC, it is sufficient to use #ifdef CONFIG_ALTIVEC here.
Fixes: accb757d798c ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-13 15:16:01 +11:00
# ifdef CONFIG_ALTIVEC
2017-12-04 21:35:25 +01:00
out :
2018-02-13 15:16:01 +11:00
# endif
2017-12-04 21:35:25 +01:00
vcpu_put ( vcpu ) ;
2008-04-16 23:28:09 -05:00
return r ;
}
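When kvm_arch_vcpu_ioctl_run() is re-entered after userspace has serviced one element of a VSX access, it first accounts for that element and only then decides whether more elements still need emulating before the guest can run. The sketch below models just that bookkeeping. The struct and function names are hypothetical; the real code also completes the load, advances paddr_accessed and may exit to userspace again.

```c
#include <assert.h>

struct vsx_resume_model {
	unsigned int copy_nums;	/* models vcpu->arch.mmio_vsx_copy_nums */
	unsigned int offset;	/* models vcpu->arch.mmio_vsx_offset */
	int mmio_needed;	/* models vcpu->mmio_needed */
};

/*
 * Account for the element userspace just handled.
 * Returns 1 if further elements remain to be emulated, 0 otherwise.
 */
static inline int vsx_resume_after_userspace(struct vsx_resume_model *st)
{
	st->mmio_needed = 0;

	if (st->copy_nums > 0) {
		st->copy_nums--;
		st->offset++;
	}
	return st->copy_nums > 0;
}
```

In the source, a non-zero remainder leads to kvmppc_emulate_mmio_vsx_loadstore(), and mmio_needed is set back to 1 if that call returns RESUME_HOST.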
int kvm_vcpu_ioctl_interrupt ( struct kvm_vcpu * vcpu , struct kvm_interrupt * irq )
{
KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code
With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel. This is inefficient.
This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode. When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle. Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction. When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.
This has some other ramifications. First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle. This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running. In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.
This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.
Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.
Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.
Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-23 17:42:46 +10:00
if ( irq - > irq = = KVM_INTERRUPT_UNSET ) {
2013-02-14 14:00:25 +00:00
kvmppc_core_dequeue_external ( vcpu ) ;
2011-07-23 17:42:46 +10:00
return 0 ;
}
kvmppc_core_queue_external ( vcpu , irq ) ;
2012-03-08 16:44:24 -05:00
2011-11-17 12:39:59 +00:00
kvm_vcpu_kick ( vcpu ) ;
2008-04-25 17:55:49 -05:00
2008-04-16 23:28:09 -05:00
return 0 ;
}
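The interrupt path above (dequeue on KVM_INTERRUPT_UNSET, otherwise queue and kick) can be modelled in user space as follows. A minimal sketch: `struct vcpu_sketch`, `vcpu_interrupt` and `IRQ_UNSET_SKETCH` are hypothetical stand-ins, and the numeric value chosen for the "unset" request is an assumption made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRQ_UNSET_SKETCH 0xfffffffeu	/* assumed stand-in for KVM_INTERRUPT_UNSET */

/* Hypothetical vcpu model: one pending external interrupt and a kick counter. */
struct vcpu_sketch {
	bool external_pending;
	unsigned int kicks;
};

static int vcpu_interrupt(struct vcpu_sketch *vcpu, uint32_t irq)
{
	assert(vcpu);
	if (irq == IRQ_UNSET_SKETCH) {
		/* Dequeue only; the vcpu does not need to be woken. */
		vcpu->external_pending = false;
		return 0;
	}
	/* Queue the interrupt, then kick the vcpu so it notices. */
	vcpu->external_pending = true;
	vcpu->kicks++;
	return 0;
}
```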
2010-03-24 21:48:29 +01:00
static int kvm_vcpu_ioctl_enable_cap ( struct kvm_vcpu * vcpu ,
struct kvm_enable_cap * cap )
{
int r ;
if ( cap - > flags )
return - EINVAL ;
switch ( cap - > cap ) {
2010-03-24 21:48:30 +01:00
case KVM_CAP_PPC_OSI :
r = 0 ;
vcpu - > arch . osi_enabled = true ;
break ;
2011-08-08 17:29:42 +02:00
case KVM_CAP_PPC_PAPR :
r = 0 ;
vcpu - > arch . papr_enabled = true ;
break ;
2013-01-04 18:12:48 +01:00
case KVM_CAP_PPC_EPR :
r = 0 ;
2013-04-12 14:08:46 +00:00
if ( cap - > args [ 0 ] )
vcpu - > arch . epr_flags | = KVMPPC_EPR_USER ;
else
vcpu - > arch . epr_flags & = ~ KVMPPC_EPR_USER ;
2013-01-04 18:12:48 +01:00
break ;
2012-08-08 20:38:19 +00:00
# ifdef CONFIG_BOOKE
case KVM_CAP_PPC_BOOKE_WATCHDOG :
r = 0 ;
vcpu - > arch . watchdog_enabled = true ;
break ;
# endif
2012-02-15 23:40:00 +00:00
# if defined(CONFIG_KVM_E500V2) || defined(CONFIG_KVM_E500MC)
2011-08-18 15:25:21 -05:00
case KVM_CAP_SW_TLB : {
struct kvm_config_tlb cfg ;
void __user * user_ptr = ( void __user * ) ( uintptr_t ) cap - > args [ 0 ] ;
r = - EFAULT ;
if ( copy_from_user ( & cfg , user_ptr , sizeof ( cfg ) ) )
break ;
r = kvm_vcpu_ioctl_config_tlb ( vcpu , & cfg ) ;
break ;
2013-04-12 14:08:47 +00:00
}
# endif
# ifdef CONFIG_KVM_MPIC
case KVM_CAP_IRQ_MPIC : {
2013-08-30 15:04:22 -04:00
struct fd f ;
2013-04-12 14:08:47 +00:00
struct kvm_device * dev ;
r = - EBADF ;
2013-08-30 15:04:22 -04:00
f = fdget ( cap - > args [ 0 ] ) ;
if ( ! f . file )
2013-04-12 14:08:47 +00:00
break ;
r = - EPERM ;
2013-08-30 15:04:22 -04:00
dev = kvm_device_from_filp ( f . file ) ;
2013-04-12 14:08:47 +00:00
if ( dev )
r = kvmppc_mpic_connect_vcpu ( dev , vcpu , cap - > args [ 1 ] ) ;
2013-08-30 15:04:22 -04:00
fdput ( f ) ;
2013-04-12 14:08:47 +00:00
break ;
2011-08-18 15:25:21 -05:00
}
# endif
2013-04-27 00:28:37 +00:00
# ifdef CONFIG_KVM_XICS
case KVM_CAP_IRQ_XICS : {
2013-08-30 15:04:22 -04:00
struct fd f ;
2013-04-27 00:28:37 +00:00
struct kvm_device * dev ;
r = - EBADF ;
2013-08-30 15:04:22 -04:00
f = fdget ( cap - > args [ 0 ] ) ;
if ( ! f . file )
2013-04-27 00:28:37 +00:00
break ;
r = - EPERM ;
2013-08-30 15:04:22 -04:00
dev = kvm_device_from_filp ( f . file ) ;
2017-04-05 17:54:56 +10:00
if ( dev ) {
if ( xive_enabled ( ) )
r = kvmppc_xive_connect_vcpu ( dev , vcpu , cap - > args [ 1 ] ) ;
else
r = kvmppc_xics_connect_vcpu ( dev , vcpu , cap - > args [ 1 ] ) ;
}
2013-04-27 00:28:37 +00:00
2013-08-30 15:04:22 -04:00
fdput ( f ) ;
2013-04-27 00:28:37 +00:00
break ;
}
# endif /* CONFIG_KVM_XICS */
2017-05-11 16:32:48 +05:30
# ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_FWNMI :
r = - EINVAL ;
if ( ! is_kvmppc_hv_enabled ( vcpu - > kvm ) )
break ;
r = 0 ;
vcpu - > kvm - > arch . fwnmi_enabled = true ;
break ;
# endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
2010-03-24 21:48:29 +01:00
default :
r = - EINVAL ;
break ;
}
2011-08-10 13:57:08 +02:00
if ( ! r )
r = kvmppc_sanity_check ( vcpu ) ;
2010-03-24 21:48:29 +01:00
return r ;
}
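The shape of kvm_vcpu_ioctl_enable_cap (reject non-zero flags, switch on the capability, set or clear a bit depending on args[0]) is sketched below for the EPR case. The struct layout, `CAP_EPR_SKETCH` and `EPR_USER_SKETCH` are hypothetical stand-ins, and the final kvmppc_sanity_check() step is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EPR_USER_SKETCH	(1u << 0)	/* stand-in for KVMPPC_EPR_USER */
#define CAP_EPR_SKETCH	1u		/* stand-in for KVM_CAP_PPC_EPR */

/* Hypothetical mirror of the fields of struct kvm_enable_cap used here. */
struct enable_cap_sketch {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
};

static int enable_cap(uint32_t *epr_flags, const struct enable_cap_sketch *cap)
{
	assert(epr_flags && cap);
	if (cap->flags)
		return -EINVAL;	/* no flags are defined, so any set bit is an error */
	switch (cap->cap) {
	case CAP_EPR_SKETCH:
		if (cap->args[0])
			*epr_flags |= EPR_USER_SKETCH;
		else
			*epr_flags &= ~EPR_USER_SKETCH;
		return 0;
	default:
		return -EINVAL;
	}
}
```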
2016-08-10 11:27:27 +10:00
bool kvm_arch_intc_initialized(struct kvm *kvm)
{
#ifdef CONFIG_KVM_MPIC
	if (kvm->arch.mpic)
		return true;
#endif
#ifdef CONFIG_KVM_XICS
2017-04-05 17:54:56 +10:00
	if (kvm->arch.xics || kvm->arch.xive)
2016-08-10 11:27:27 +10:00
		return true;
#endif
	return false;
}
2008-04-16 23:28:09 -05:00
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
				    struct kvm_mp_state *mp_state)
{
	return -EINVAL;
}
int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
				    struct kvm_mp_state *mp_state)
{
	return -EINVAL;
}
2017-12-12 17:41:34 +01:00
long kvm_arch_vcpu_async_ioctl ( struct file * filp ,
unsigned int ioctl , unsigned long arg )
2008-04-16 23:28:09 -05:00
{
struct kvm_vcpu * vcpu = filp - > private_data ;
void __user * argp = ( void __user * ) arg ;
2017-12-04 21:35:36 +01:00
if ( ioctl = = KVM_INTERRUPT ) {
2008-04-16 23:28:09 -05:00
struct kvm_interrupt irq ;
if ( copy_from_user ( & irq , argp , sizeof ( irq ) ) )
2017-12-04 21:35:36 +01:00
return - EFAULT ;
return kvm_vcpu_ioctl_interrupt ( vcpu , & irq ) ;
2008-04-16 23:28:09 -05:00
}
2017-12-12 17:41:34 +01:00
return - ENOIOCTLCMD ;
}
long kvm_arch_vcpu_ioctl ( struct file * filp ,
unsigned int ioctl , unsigned long arg )
{
struct kvm_vcpu * vcpu = filp - > private_data ;
void __user * argp = ( void __user * ) arg ;
long r ;
2010-05-13 12:30:43 +03:00
2017-12-04 21:35:36 +01:00
vcpu_load ( vcpu ) ;
switch ( ioctl ) {
2010-03-24 21:48:29 +01:00
case KVM_ENABLE_CAP :
{
struct kvm_enable_cap cap ;
r = - EFAULT ;
if ( copy_from_user ( & cap , argp , sizeof ( cap ) ) )
goto out ;
r = kvm_vcpu_ioctl_enable_cap ( vcpu , & cap ) ;
break ;
}
2011-08-18 15:25:21 -05:00
2011-09-14 10:02:41 +02:00
case KVM_SET_ONE_REG :
case KVM_GET_ONE_REG :
{
struct kvm_one_reg reg ;
r = - EFAULT ;
if ( copy_from_user ( & reg , argp , sizeof ( reg ) ) )
goto out ;
if ( ioctl = = KVM_SET_ONE_REG )
r = kvm_vcpu_ioctl_set_one_reg ( vcpu , & reg ) ;
else
r = kvm_vcpu_ioctl_get_one_reg ( vcpu , & reg ) ;
break ;
}
2012-02-15 23:40:00 +00:00
# if defined(CONFIG_KVM_E500V2) || defined(CONFIG_KVM_E500MC)
2011-08-18 15:25:21 -05:00
case KVM_DIRTY_TLB : {
struct kvm_dirty_tlb dirty ;
r = - EFAULT ;
if ( copy_from_user ( & dirty , argp , sizeof ( dirty ) ) )
goto out ;
r = kvm_vcpu_ioctl_dirty_tlb ( vcpu , & dirty ) ;
break ;
}
# endif
2008-04-16 23:28:09 -05:00
default :
r = - EINVAL ;
}
out :
2017-12-04 21:35:36 +01:00
vcpu_put ( vcpu ) ;
2008-04-16 23:28:09 -05:00
return r ;
}
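The split between kvm_arch_vcpu_async_ioctl and kvm_arch_vcpu_ioctl can be pictured as a two-stage dispatch. The sketch below assumes that the generic caller tries the async handler first and only takes the vcpu_load()/vcpu_put() path when it gets -ENOIOCTLCMD back; that caller is not part of this file, and every name ending in `_SKETCH`, as well as `vcpu_loads`, is hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define ENOIOCTLCMD_SKETCH 515	/* kernel-internal "not handled here" code */

enum { CMD_INTERRUPT_SKETCH = 1, CMD_ENABLE_CAP_SKETCH = 2 };

static int vcpu_loads;	/* counts how often the locked path ran */

/* Stage 1: commands that are safe without loading the vcpu. */
static long async_ioctl(unsigned int cmd)
{
	if (cmd == CMD_INTERRUPT_SKETCH)
		return 0;
	return -ENOIOCTLCMD_SKETCH;
}

/* Stage 2: everything else runs between vcpu_load() and vcpu_put(). */
static long locked_ioctl(unsigned int cmd)
{
	long r;

	vcpu_loads++;	/* stands in for vcpu_load() ... vcpu_put() */
	switch (cmd) {
	case CMD_ENABLE_CAP_SKETCH:
		r = 0;
		break;
	default:
		r = -EINVAL;
	}
	return r;
}

static long vcpu_ioctl(unsigned int cmd)
{
	long r = async_ioctl(cmd);

	assert(r <= 0);	/* this sketch only returns 0 or a negative code */
	if (r != -ENOIOCTLCMD_SKETCH)
		return r;
	return locked_ioctl(cmd);
}
```

The point of the design is that an interrupt injection never waits for the vcpu to be loaded, while all other commands still see a loaded vcpu.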
2012-01-04 10:25:23 +01:00
int kvm_arch_vcpu_fault ( struct kvm_vcpu * vcpu , struct vm_fault * vmf )
{
return VM_FAULT_SIGBUS ;
}
2010-07-29 14:48:08 +02:00
static int kvm_vm_ioctl_get_pvinfo ( struct kvm_ppc_pvinfo * pvinfo )
{
2012-07-03 05:48:51 +00:00
u32 inst_nop = 0x60000000 ;
# ifdef CONFIG_KVM_BOOKE_HV
u32 inst_sc1 = 0x44000022 ;
2014-04-24 13:39:16 +02:00
pvinfo - > hcall [ 0 ] = cpu_to_be32 ( inst_sc1 ) ;
pvinfo - > hcall [ 1 ] = cpu_to_be32 ( inst_nop ) ;
pvinfo - > hcall [ 2 ] = cpu_to_be32 ( inst_nop ) ;
pvinfo - > hcall [ 3 ] = cpu_to_be32 ( inst_nop ) ;
2012-07-03 05:48:51 +00:00
# else
2010-07-29 14:48:08 +02:00
u32 inst_lis = 0x3c000000 ;
u32 inst_ori = 0x60000000 ;
u32 inst_sc = 0x44000002 ;
u32 inst_imm_mask = 0xffff ;
	/*
	 * The hypercall to get into KVM from within guest context is as
	 * follows:
	 *
	 *    lis r0, KVM_SC_MAGIC_R0@h
	 *    ori r0, r0, KVM_SC_MAGIC_R0@l
	 *    sc
	 *    nop
	 */
2014-04-24 13:39:16 +02:00
	pvinfo->hcall[0] = cpu_to_be32(inst_lis | ((KVM_SC_MAGIC_R0 >> 16) & inst_imm_mask));
	pvinfo->hcall[1] = cpu_to_be32(inst_ori | (KVM_SC_MAGIC_R0 & inst_imm_mask));
	pvinfo->hcall[2] = cpu_to_be32(inst_sc);
	pvinfo->hcall[3] = cpu_to_be32(inst_nop);
2012-07-03 05:48:51 +00:00
# endif
2010-07-29 14:48:08 +02:00
2012-07-03 05:48:52 +00:00
pvinfo - > flags = KVM_PPC_PVINFO_FLAGS_EV_IDLE ;
2010-07-29 14:48:08 +02:00
return 0 ;
}
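How the four hypercall words are assembled can be checked outside the kernel. In this sketch the opcode constants are copied from the function above, while `MAGIC_R0_SKETCH` is an assumed value for KVM_SC_MAGIC_R0 (the ASCII bytes "KVM!"); `build_hcall` and `put_be32` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define MAGIC_R0_SKETCH 0x4b564d21u	/* assumed KVM_SC_MAGIC_R0, "KVM!" */

/* Fill insn[] with lis/ori/sc/nop, in host byte order. */
static void build_hcall(uint32_t magic, uint32_t insn[4])
{
	const uint32_t inst_lis = 0x3c000000;	/* lis r0, 0 */
	const uint32_t inst_ori = 0x60000000;	/* ori r0, r0, 0 */
	const uint32_t inst_sc = 0x44000002;
	const uint32_t inst_nop = 0x60000000;
	const uint32_t inst_imm_mask = 0xffff;

	assert(insn);
	insn[0] = inst_lis | ((magic >> 16) & inst_imm_mask);	/* high half */
	insn[1] = inst_ori | (magic & inst_imm_mask);		/* low half */
	insn[2] = inst_sc;
	insn[3] = inst_nop;
}

/* Store a word most significant byte first, as a big-endian guest reads it. */
static void put_be32(uint32_t v, uint8_t out[4])
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}
```

The 16-bit immediate field is why the magic value has to be split into a high and a low half before it is ORed into the two opcodes.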
2013-04-17 00:37:57 +02:00
int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
			  bool line_status)
{
	if (!irqchip_in_kernel(kvm))
		return -ENXIO;
	irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
					irq_event->irq, irq_event->level,
					line_status);
	return 0;
}
2014-06-02 11:02:59 +10:00
static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
				   struct kvm_enable_cap *cap)
{
	int r;
	if (cap->flags)
		return -EINVAL;
	switch (cap->cap) {
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
	case KVM_CAP_PPC_ENABLE_HCALL: {
		unsigned long hcall = cap->args[0];
		r = -EINVAL;
		if (hcall > MAX_HCALL_OPCODE || (hcall & 3) ||
		    cap->args[1] > 1)
			break;
2014-06-02 11:03:00 +10:00
		if (!kvmppc_book3s_hcall_implemented(kvm, hcall))
			break;
2014-06-02 11:02:59 +10:00
		if (cap->args[1])
			set_bit(hcall / 4, kvm->arch.enabled_hcalls);
		else
			clear_bit(hcall / 4, kvm->arch.enabled_hcalls);
		r = 0;
		break;
}
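The KVM_CAP_PPC_ENABLE_HCALL case above maps each hcall opcode (a multiple of 4) to bit `hcall / 4` of a bitmap. A minimal user-space sketch of that mapping follows; `MAX_HCALL_SKETCH` is a made-up limit, the helper names are hypothetical, the bit operations are not atomic, and the "is this hcall implemented" check is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define MAX_HCALL_SKETCH	0x450UL	/* hypothetical stand-in for MAX_HCALL_OPCODE */
#define BITS_PER_LONG_SKETCH	(sizeof(unsigned long) * CHAR_BIT)
/* Enough words for bits 0 .. MAX_HCALL_SKETCH / 4 inclusive. */
#define HCALL_WORDS_SKETCH \
	((MAX_HCALL_SKETCH / 4 + BITS_PER_LONG_SKETCH) / BITS_PER_LONG_SKETCH)

static int set_hcall_enabled(unsigned long *bitmap, unsigned long hcall,
			     unsigned long enable)
{
	unsigned long bit = hcall / 4;

	/* Same three range checks as the case statement above. */
	if (hcall > MAX_HCALL_SKETCH || (hcall & 3) || enable > 1)
		return -EINVAL;
	if (enable)
		bitmap[bit / BITS_PER_LONG_SKETCH] |= 1UL << (bit % BITS_PER_LONG_SKETCH);
	else
		bitmap[bit / BITS_PER_LONG_SKETCH] &= ~(1UL << (bit % BITS_PER_LONG_SKETCH));
	return 0;
}

static int hcall_enabled(const unsigned long *bitmap, unsigned long hcall)
{
	unsigned long bit = hcall / 4;

	assert(hcall <= MAX_HCALL_SKETCH);
	return (int)((bitmap[bit / BITS_PER_LONG_SKETCH] >> (bit % BITS_PER_LONG_SKETCH)) & 1);
}
```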
KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode
This allows userspace to set the desired virtual SMT (simultaneous
multithreading) mode for a VM, that is, the number of VCPUs that
get assigned to each virtual core. Previously, the virtual SMT mode
was fixed to the number of threads per subcore, and if userspace
wanted to have fewer vcpus per vcore, then it would achieve that by
using a sparse CPU numbering. This had the disadvantage that the
vcpu numbers can get quite large, particularly for SMT1 guests on
a POWER8 with 8 threads per core. With this patch, userspace can
set its desired virtual SMT mode and then use contiguous vcpu
numbering.
On POWER8, where the threading mode is "strict", the virtual SMT mode
must be less than or equal to the number of threads per subcore. On
POWER9, which implements a "loose" threading mode, the virtual SMT
mode can be any power of 2 between 1 and 8, even though there is
effectively one thread per subcore, since the threads are independent
and can all be in different partitions.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-02-06 13:24:41 +11:00
case KVM_CAP_PPC_SMT : {
unsigned long mode = cap - > args [ 0 ] ;
unsigned long flags = cap - > args [ 1 ] ;
r = - EINVAL ;
if ( kvm - > arch . kvm_ops - > set_smt_mode )
r = kvm - > arch . kvm_ops - > set_smt_mode ( kvm , mode , flags ) ;
break ;
}
2014-06-02 11:02:59 +10:00
# endif
default :
r = - EINVAL ;
break ;
}
return r ;
}
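The constraints that the SMT-mode commit message places on the virtual SMT mode can be written down as a small predicate. This is a sketch of the rule as stated there, not of the real set_smt_mode callback (which is not shown in this file): `check_smt_mode` is a hypothetical name, the power-of-2 requirement is assumed to apply in strict mode as well, and every failure is reported as -EINVAL for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_SMT_MODE_SKETCH 8UL

/*
 * strict == true models POWER8 ("strict" threading),
 * strict == false models POWER9 ("loose" threading).
 */
static int check_smt_mode(unsigned long mode, bool strict,
			  unsigned long threads_per_subcore)
{
	assert(threads_per_subcore >= 1);
	/* Must be a power of 2 between 1 and 8. */
	if (mode == 0 || mode > MAX_SMT_MODE_SKETCH || (mode & (mode - 1)))
		return -EINVAL;
	/* Strict threading: cannot exceed the threads in a subcore. */
	if (strict && mode > threads_per_subcore)
		return -EINVAL;
	return 0;
}
```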
2018-01-15 16:06:47 +11:00
# ifdef CONFIG_PPC_BOOK3S_64
/*
 * These functions check whether the underlying hardware is safe
 * against attacks based on observing the effects of speculatively
 * executed instructions, and whether it supplies instructions for
 * use in workarounds.  The information comes from firmware, either
 * via the device tree on powernv platforms or from an hcall on
 * pseries platforms.
 */
# ifdef CONFIG_PPC_PSERIES
static int pseries_get_cpu_char(struct kvm_ppc_cpu_char *cp)
{
	struct h_cpu_char_result c;
	unsigned long rc;
	if (!machine_is(pseries))
		return -ENOTTY;
	rc = plpar_get_cpu_characteristics(&c);
	if (rc == H_SUCCESS) {
		cp->character = c.character;
		cp->behaviour = c.behaviour;
		cp->character_mask = KVM_PPC_CPU_CHAR_SPEC_BAR_ORI31 |
			KVM_PPC_CPU_CHAR_BCCTRL_SERIALISED |
			KVM_PPC_CPU_CHAR_L1D_FLUSH_ORI30 |
			KVM_PPC_CPU_CHAR_L1D_FLUSH_TRIG2 |
			KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV |
			KVM_PPC_CPU_CHAR_BR_HINT_HONOURED |
			KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF |
			KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS;
		cp->behaviour_mask = KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY |
			KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR |
			KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
	}
	return 0;
}
# else
static int pseries_get_cpu_char ( struct kvm_ppc_cpu_char * cp )
{
return - ENOTTY ;
}
# endif
static inline bool have_fw_feat(struct device_node *fw_features,
				const char *state, const char *name)
{
	struct device_node *np;
	bool r = false;
	np = of_get_child_by_name(fw_features, name);
	if (np) {
		r = of_property_read_bool(np, state);
		of_node_put(np);
	}
	return r;
}
static int kvmppc_get_cpu_char(struct kvm_ppc_cpu_char *cp)
{
	struct device_node *np, *fw_features;
	int r;
	memset(cp, 0, sizeof(*cp));
	r = pseries_get_cpu_char(cp);
	if (r != -ENOTTY)
		return r;
	np = of_find_node_by_name(NULL, "ibm,opal");
	if (np) {
		fw_features = of_get_child_by_name(np, "fw-features");
		of_node_put(np);
		if (!fw_features)
			return 0;
		if (have_fw_feat(fw_features, "enabled",
				 "inst-spec-barrier-ori31,31,0"))
			cp->character |= KVM_PPC_CPU_CHAR_SPEC_BAR_ORI31;
		if (have_fw_feat(fw_features, "enabled",
				 "fw-bcctrl-serialized"))
			cp->character |= KVM_PPC_CPU_CHAR_BCCTRL_SERIALISED;
		if (have_fw_feat(fw_features, "enabled",
				 "inst-l1d-flush-ori30,30,0"))
			cp->character |= KVM_PPC_CPU_CHAR_L1D_FLUSH_ORI30;
		if (have_fw_feat(fw_features, "enabled",
				 "inst-l1d-flush-trig2"))
			cp->character |= KVM_PPC_CPU_CHAR_L1D_FLUSH_TRIG2;
		if (have_fw_feat(fw_features, "enabled",
				 "fw-l1d-thread-split"))
			cp->character |= KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV;
		if (have_fw_feat(fw_features, "enabled",
				 "fw-count-cache-disabled"))
			cp->character |= KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS;
		cp->character_mask = KVM_PPC_CPU_CHAR_SPEC_BAR_ORI31 |
			KVM_PPC_CPU_CHAR_BCCTRL_SERIALISED |
			KVM_PPC_CPU_CHAR_L1D_FLUSH_ORI30 |
			KVM_PPC_CPU_CHAR_L1D_FLUSH_TRIG2 |
			KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV |
			KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS;
		if (have_fw_feat(fw_features, "enabled",
				 "speculation-policy-favor-security"))
			cp->behaviour |= KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY;
		if (!have_fw_feat(fw_features, "disabled",
				  "needs-l1d-flush-msr-pr-0-to-1"))
			cp->behaviour |= KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR;
		if (!have_fw_feat(fw_features, "disabled",
				  "needs-spec-barrier-for-bound-checks"))
			cp->behaviour |= KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
		cp->behaviour_mask = KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY |
			KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR |
			KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
		of_node_put(fw_features);
	}
	return 0;
}
# endif
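One detail of kvmppc_get_cpu_char above is easy to miss: the "needs-..." behaviours are reported unless firmware marks them "disabled", so a missing node defaults to the safe answer. The sketch below models that with a plain table instead of the device tree; `struct fw_feat_sketch`, `have_feat`, `behaviour_bits` and the two bit values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BEHAV_FAVOUR_SECURITY_SKETCH	(1u << 0)
#define BEHAV_L1D_FLUSH_PR_SKETCH	(1u << 1)

/* Hypothetical stand-in for one child node of "fw-features". */
struct fw_feat_sketch {
	const char *name;
	bool enabled;	/* node has an "enabled" property */
	bool disabled;	/* node has a "disabled" property */
};

static bool have_feat(const struct fw_feat_sketch *tbl, size_t n,
		      const char *state, const char *name)
{
	size_t i;

	assert(tbl || n == 0);
	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) != 0)
			continue;
		if (strcmp(state, "enabled") == 0)
			return tbl[i].enabled;
		return tbl[i].disabled;
	}
	return false;	/* a missing node reads as "property absent" */
}

static uint32_t behaviour_bits(const struct fw_feat_sketch *tbl, size_t n)
{
	uint32_t b = 0;

	/* Opt-in: set only when firmware says "enabled". */
	if (have_feat(tbl, n, "enabled", "speculation-policy-favor-security"))
		b |= BEHAV_FAVOUR_SECURITY_SKETCH;
	/* Opt-out: set unless firmware says "disabled". */
	if (!have_feat(tbl, n, "disabled", "needs-l1d-flush-msr-pr-0-to-1"))
		b |= BEHAV_L1D_FLUSH_PR_SKETCH;
	return b;
}
```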
2008-04-16 23:28:09 -05:00
long kvm_arch_vm_ioctl ( struct file * filp ,
unsigned int ioctl , unsigned long arg )
{
2013-04-12 14:08:46 +00:00
struct kvm * kvm __maybe_unused = filp - > private_data ;
2010-07-29 14:48:08 +02:00
void __user * argp = ( void __user * ) arg ;
2008-04-16 23:28:09 -05:00
long r ;
switch ( ioctl ) {
2010-07-29 14:48:08 +02:00
case KVM_PPC_GET_PVINFO : {
struct kvm_ppc_pvinfo pvinfo ;
2010-10-30 13:04:24 +04:00
memset ( & pvinfo , 0 , sizeof ( pvinfo ) ) ;
2010-07-29 14:48:08 +02:00
r = kvm_vm_ioctl_get_pvinfo ( & pvinfo ) ;
if ( copy_to_user ( argp , & pvinfo , sizeof ( pvinfo ) ) ) {
r = - EFAULT ;
goto out ;
}
break ;
}
2014-06-02 11:02:59 +10:00
case KVM_ENABLE_CAP :
{
struct kvm_enable_cap cap ;
r = - EFAULT ;
if ( copy_from_user ( & cap , argp , sizeof ( cap ) ) )
goto out ;
r = kvm_vm_ioctl_enable_cap ( kvm , & cap ) ;
break ;
}
KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms
Commit e91aa8e6ecd5 ("KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64
permanently", 2017-03-22) enabled the SPAPR TCE code for all 64-bit
Book 3S kernel configurations in order to simplify the code and
reduce #ifdefs. However, 64-bit Book 3S PPC platforms other than
pseries and powernv don't implement the necessary IOMMU callbacks,
leading to build failures like the following (for a pasemi config):
scripts/kconfig/conf --silentoldconfig Kconfig
warning: (KVM_BOOK3S_64) selects SPAPR_TCE_IOMMU which has unmet direct dependencies (IOMMU_SUPPORT && (PPC_POWERNV || PPC_PSERIES))
...
CC [M] arch/powerpc/kvm/book3s_64_vio.o
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c: In function ‘kvmppc_clear_tce’:
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c:363:2: error: implicit declaration of function ‘iommu_tce_xchg’ [-Werror=implicit-function-declaration]
iommu_tce_xchg(tbl, entry, &hpa, &dir);
^
To fix this, we make the inclusion of the SPAPR TCE support, and the
code that uses it in book3s_vio.c and book3s_vio_hv.c, depend on
the inclusion of support for the pseries and/or powernv platforms.
This means that when running a 'pseries' guest on those platforms,
the guest won't have in-kernel acceleration of the PAPR TCE hypercalls,
but at least now they compile.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-11 14:31:59 +10:00
# ifdef CONFIG_SPAPR_TCE_IOMMU
2016-03-01 17:54:40 +11:00
case KVM_CREATE_SPAPR_TCE_64 : {
struct kvm_create_spapr_tce_64 create_tce_64 ;
r = - EFAULT ;
if ( copy_from_user ( & create_tce_64 , argp , sizeof ( create_tce_64 ) ) )
goto out ;
if ( create_tce_64 . flags ) {
r = - EINVAL ;
goto out ;
}
r = kvm_vm_ioctl_create_spapr_tce ( kvm , & create_tce_64 ) ;
goto out ;
}
2011-06-29 00:22:41 +00:00
case KVM_CREATE_SPAPR_TCE : {
struct kvm_create_spapr_tce create_tce ;
2016-03-01 17:54:40 +11:00
struct kvm_create_spapr_tce_64 create_tce_64 ;
2011-06-29 00:22:41 +00:00
r = - EFAULT ;
if ( copy_from_user ( & create_tce , argp , sizeof ( create_tce ) ) )
goto out ;
2016-03-01 17:54:40 +11:00
create_tce_64 . liobn = create_tce . liobn ;
create_tce_64 . page_shift = IOMMU_PAGE_SHIFT_4K ;
create_tce_64 . offset = 0 ;
create_tce_64 . size = create_tce . window_size > >
IOMMU_PAGE_SHIFT_4K ;
create_tce_64 . flags = 0 ;
r = kvm_vm_ioctl_create_spapr_tce ( kvm , & create_tce_64 ) ;
2011-06-29 00:22:41 +00:00
goto out ;
}
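The legacy KVM_CREATE_SPAPR_TCE case above is a thin adapter: it rewrites the old request into the 64-bit form with a fixed 4K page size and a window size expressed in pages. A user-space sketch of that conversion follows; the two structs are hypothetical mirrors containing only the fields used above, and `PAGE_SHIFT_4K_SKETCH` assumes IOMMU_PAGE_SHIFT_4K is 12.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K_SKETCH 12	/* assumed value of IOMMU_PAGE_SHIFT_4K */

struct tce_legacy_sketch {
	uint64_t liobn;
	uint32_t window_size;	/* in bytes */
};

struct tce64_sketch {
	uint64_t liobn;
	uint32_t page_shift;
	uint32_t flags;
	uint64_t offset;
	uint64_t size;		/* in pages of (1 << page_shift) bytes */
};

static struct tce64_sketch tce_to_64(const struct tce_legacy_sketch *old)
{
	struct tce64_sketch n;

	assert(old);
	n.liobn = old->liobn;
	n.page_shift = PAGE_SHIFT_4K_SKETCH;
	n.offset = 0;
	n.size = old->window_size >> PAGE_SHIFT_4K_SKETCH;	/* bytes to pages */
	n.flags = 0;
	return n;
}
```

A 1 GiB legacy window therefore becomes a request for 0x40000 pages of 4 KiB each.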
2017-05-11 14:31:59 +10:00
# endif
# ifdef CONFIG_PPC_BOOK3S_64
2012-04-26 19:43:42 +00:00
case KVM_PPC_GET_SMMU_INFO : {
struct kvm_ppc_smmu_info info ;
2013-10-07 22:18:01 +05:30
struct kvm * kvm = filp - > private_data ;
2012-04-26 19:43:42 +00:00
memset ( & info , 0 , sizeof ( info ) ) ;
2013-10-07 22:18:01 +05:30
r = kvm - > arch . kvm_ops - > get_smmu_info ( kvm , & info ) ;
2012-04-26 19:43:42 +00:00
if ( r > = 0 & & copy_to_user ( argp , & info , sizeof ( info ) ) )
r = - EFAULT ;
break ;
}
2013-04-17 20:30:00 +00:00
case KVM_PPC_RTAS_DEFINE_TOKEN : {
struct kvm * kvm = filp - > private_data ;
r = kvm_vm_ioctl_rtas_define_token ( kvm , argp ) ;
break ;
}
2017-01-30 21:21:41 +11:00
case KVM_PPC_CONFIGURE_V3_MMU : {
struct kvm * kvm = filp - > private_data ;
struct kvm_ppc_mmuv3_cfg cfg ;
r = - EINVAL ;
if ( ! kvm - > arch . kvm_ops - > configure_mmu )
goto out ;
r = - EFAULT ;
if ( copy_from_user ( & cfg , argp , sizeof ( cfg ) ) )
goto out ;
r = kvm - > arch . kvm_ops - > configure_mmu ( kvm , & cfg ) ;
break ;
}
case KVM_PPC_GET_RMMU_INFO : {
struct kvm * kvm = filp - > private_data ;
struct kvm_ppc_rmmu_info info ;
r = - EINVAL ;
if ( ! kvm - > arch . kvm_ops - > get_rmmu_info )
goto out ;
r = kvm - > arch . kvm_ops - > get_rmmu_info ( kvm , & info ) ;
if ( r > = 0 & & copy_to_user ( argp , & info , sizeof ( info ) ) )
r = - EFAULT ;
break ;
}
2018-01-15 16:06:47 +11:00
case KVM_PPC_GET_CPU_CHAR : {
struct kvm_ppc_cpu_char cpuchar ;
r = kvmppc_get_cpu_char ( & cpuchar ) ;
if ( r > = 0 & & copy_to_user ( argp , & cpuchar , sizeof ( cpuchar ) ) )
r = - EFAULT ;
break ;
}
2013-10-07 22:18:01 +05:30
default : {
struct kvm * kvm = filp - > private_data ;
r = kvm - > arch . kvm_ops - > arch_vm_ioctl ( filp , ioctl , arg ) ;
}
2013-10-07 22:17:53 +05:30
# else /* CONFIG_PPC_BOOK3S_64 */
2008-04-16 23:28:09 -05:00
default :
2009-08-26 14:57:07 +03:00
r = - ENOTTY ;
2013-10-07 22:17:53 +05:30
# endif
2008-04-16 23:28:09 -05:00
}
2010-07-29 14:48:08 +02:00
out :
2008-04-16 23:28:09 -05:00
return r ;
}
2011-12-20 15:34:20 +00:00
static unsigned long lpid_inuse[BITS_TO_LONGS(KVMPPC_NR_LPIDS)];
static unsigned long nr_lpids;
long kvmppc_alloc_lpid(void)
{
	long lpid;
	do {
		lpid = find_first_zero_bit(lpid_inuse, KVMPPC_NR_LPIDS);
		if (lpid >= nr_lpids) {
			pr_err("%s: No LPIDs free\n", __func__);
			return -ENOMEM;
		}
	} while (test_and_set_bit(lpid, lpid_inuse));
	return lpid;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_alloc_lpid ) ;
2011-12-20 15:34:20 +00:00
void kvmppc_claim_lpid ( long lpid )
{
set_bit ( lpid , lpid_inuse ) ;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_claim_lpid ) ;
2011-12-20 15:34:20 +00:00
void kvmppc_free_lpid ( long lpid )
{
clear_bit ( lpid , lpid_inuse ) ;
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_free_lpid ) ;
2011-12-20 15:34:20 +00:00
void kvmppc_init_lpid(unsigned long nr_lpids_param)
{
	nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param);
	memset(lpid_inuse, 0, sizeof(lpid_inuse));
}
2013-10-07 22:17:59 +05:30
EXPORT_SYMBOL_GPL ( kvmppc_init_lpid ) ;
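The four LPID helpers above form a small first-fit ID allocator. The sketch below is a single-threaded user-space model of it: it uses a bool array instead of a bitmap and has no atomic test-and-set retry loop, so it shows the allocation policy only, not the concurrency handling. `NR_LPIDS_SKETCH` and the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define NR_LPIDS_SKETCH 64	/* hypothetical stand-in for KVMPPC_NR_LPIDS */

static bool lpid_inuse[NR_LPIDS_SKETCH];
static unsigned long nr_lpids;

/* Clamp the usable range and mark every ID free. */
static void init_lpid(unsigned long nr_lpids_param)
{
	nr_lpids = nr_lpids_param < NR_LPIDS_SKETCH ? nr_lpids_param : NR_LPIDS_SKETCH;
	memset(lpid_inuse, 0, sizeof(lpid_inuse));
}

/* Reserve a specific ID, for example one the host itself uses. */
static void claim_lpid(long lpid)
{
	assert(lpid >= 0 && lpid < NR_LPIDS_SKETCH);
	lpid_inuse[lpid] = true;
}

static void free_lpid(long lpid)
{
	assert(lpid >= 0 && lpid < NR_LPIDS_SKETCH);
	lpid_inuse[lpid] = false;
}

/* First fit: return the lowest free ID below nr_lpids, or -ENOMEM. */
static long alloc_lpid(void)
{
	unsigned long lpid;

	for (lpid = 0; lpid < nr_lpids; lpid++) {
		if (!lpid_inuse[lpid]) {
			lpid_inuse[lpid] = true;
			return (long)lpid;
		}
	}
	return -ENOMEM;
}
```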
2011-12-20 15:34:20 +00:00
2008-04-16 23:28:09 -05:00
int kvm_arch_init ( void * opaque )
{
return 0 ;
}
2014-08-05 11:29:07 +02:00
EXPORT_TRACEPOINT_SYMBOL_GPL ( kvm_ppc_instr ) ;