/*
 * This program is used to generate definitions needed by
 * assembly language modules.
 *
 * We use the technique used in the OSF Mach kernel code:
 * generate asm statements containing #defines,
 * compile this file to assembler, and then extract the
 * #defines from the assembly-language output.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 */
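
/*
 * How the extraction works, as a minimal sketch. Assumption: this
 * paraphrases the DEFINE() and OFFSET() helpers that <linux/kbuild.h>
 * provided around this era; the exact marker string and the sed script
 * that extracts it have changed between kernel versions.
 *
 *	#define DEFINE(sym, val) \
 *		asm volatile("\n->" #sym " %0 " #val : : "i" (val))
 *	#define OFFSET(sym, str, mem) \
 *		DEFINE(sym, offsetof(struct str, mem))
 *
 * The "i" constraint requires val to be a compile-time constant, which
 * the compiler prints in place of %0. So OFFSET(THREAD, task_struct, thread)
 * leaves a marker line in the generated assembly of roughly the form
 *
 *	->THREAD NNN offsetof(struct task_struct, thread)
 *
 * where NNN stands for the computed offset (some architectures print an
 * immediate prefix such as '$' before it). A sed pass over that output
 * turns each marker into a "#define THREAD NNN" line (plus a comment
 * recording the original expression) in the generated header, which
 * assembly sources can then include.
 */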
#include <linux/signal.h>
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/mman.h>
#include <linux/mm.h>
#include <linux/suspend.h>
#include <linux/hrtimer.h>
#ifdef CONFIG_PPC64
#include <linux/time.h>
#include <linux/hardirq.h>
#endif
#include <linux/kbuild.h>

#include <asm/io.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
#include <asm/cputable.h>
#include <asm/thread_info.h>
#include <asm/rtas.h>
#include <asm/vdso_datapage.h>
#include <asm/dbell.h>
#ifdef CONFIG_PPC64
#include <asm/paca.h>
#include <asm/lppaca.h>
#include <asm/cache.h>
#include <asm/compat.h>
#include <asm/mmu.h>
#include <asm/hvcall.h>
#include <asm/xics.h>
#endif
#ifdef CONFIG_PPC_POWERNV
#include <asm/opal.h>
#endif
#if defined(CONFIG_KVM) || defined(CONFIG_KVM_GUEST)
#include <linux/kvm_host.h>
#endif
#if defined(CONFIG_KVM) && defined(CONFIG_PPC_BOOK3S)
#include <asm/kvm_book3s.h>
#include <asm/kvm_ppc.h>
#endif

#ifdef CONFIG_PPC32
#if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
#include "head_booke.h"
#endif
#endif

#if defined(CONFIG_PPC_FSL_BOOK3E)
#include "../mm/mmu_decl.h"
#endif

#ifdef CONFIG_PPC_8xx
#include <asm/fixmap.h>
#endif

#define STACK_PT_REGS_OFFSET(sym, val)	\
	DEFINE(sym, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, val))
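
/*
 * Illustration (hypothetical invocation, not taken from this excerpt):
 * the macro assumes an exception frame whose struct pt_regs sits
 * immediately above the STACK_FRAME_OVERHEAD bytes of the minimal ABI
 * stack frame, so each register save slot is at a fixed displacement
 * from r1. For example,
 *
 *	STACK_PT_REGS_OFFSET(GPR1, gpr[1]);
 *
 * expands to
 *
 *	DEFINE(GPR1, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, gpr[1]));
 *
 * and assembly could then address the saved r1 slot as GPR1(r1).
 */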

int main(void)
{
	OFFSET(THREAD, task_struct, thread);
	OFFSET(MM, task_struct, mm);
	OFFSET(MMCONTEXTID, mm_struct, context.id);
#ifdef CONFIG_PPC64
	DEFINE(SIGSEGV, SIGSEGV);
	DEFINE(NMI_MASK, NMI_MASK);
	OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
#else
	OFFSET(THREAD_INFO, task_struct, stack);
	DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
	OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
#endif /* CONFIG_PPC64 */

#ifdef CONFIG_LIVEPATCH
	OFFSET(TI_livepatch_sp, thread_info, livepatch_sp);
#endif

	OFFSET(KSP, thread_struct, ksp);
	OFFSET(PT_REGS, thread_struct, regs);
#ifdef CONFIG_BOOKE
	OFFSET(THREAD_NORMSAVES, thread_struct, normsave[0]);
#endif
	OFFSET(THREAD_FPEXC_MODE, thread_struct, fpexc_mode);
	OFFSET(THREAD_FPSTATE, thread_struct, fp_state);
	OFFSET(THREAD_FPSAVEAREA, thread_struct, fp_save_area);
	OFFSET(FPSTATE_FPSCR, thread_fp_state, fpscr);
	OFFSET(THREAD_LOAD_FP, thread_struct, load_fp);
#ifdef CONFIG_ALTIVEC
	OFFSET(THREAD_VRSTATE, thread_struct, vr_state);
	OFFSET(THREAD_VRSAVEAREA, thread_struct, vr_save_area);
	OFFSET(THREAD_VRSAVE, thread_struct, vrsave);
	OFFSET(THREAD_USED_VR, thread_struct, used_vr);
	OFFSET(VRSTATE_VSCR, thread_vr_state, vscr);
	OFFSET(THREAD_LOAD_VEC, thread_struct, load_vec);
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
	OFFSET(THREAD_USED_VSR, thread_struct, used_vsr);
#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
	OFFSET(KSP_VSID, thread_struct, ksp_vsid);
#else /* CONFIG_PPC64 */
	OFFSET(PGDIR, thread_struct, pgdir);
#ifdef CONFIG_SPE
	OFFSET(THREAD_EVR0, thread_struct, evr[0]);
	OFFSET(THREAD_ACC, thread_struct, acc);
	OFFSET(THREAD_SPEFSCR, thread_struct, spefscr);
	OFFSET(THREAD_USED_SPE, thread_struct, used_spe);
#endif /* CONFIG_SPE */
#endif /* CONFIG_PPC64 */
#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
	OFFSET(THREAD_DBCR0, thread_struct, debug.dbcr0);
#endif
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
	OFFSET(THREAD_KVM_SVCPU, thread_struct, kvm_shadow_vcpu);
#endif
#if defined(CONFIG_KVM) && defined(CONFIG_BOOKE)
	OFFSET(THREAD_KVM_VCPU, thread_struct, kvm_vcpu);
#endif

#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
	OFFSET(PACATMSCRATCH, paca_struct, tm_scratch);
	OFFSET(THREAD_TM_TFHAR, thread_struct, tm_tfhar);
	OFFSET(THREAD_TM_TEXASR, thread_struct, tm_texasr);
	OFFSET(THREAD_TM_TFIAR, thread_struct, tm_tfiar);
	OFFSET(THREAD_TM_TAR, thread_struct, tm_tar);
	OFFSET(THREAD_TM_PPR, thread_struct, tm_ppr);
	OFFSET(THREAD_TM_DSCR, thread_struct, tm_dscr);
	OFFSET(PT_CKPT_REGS, thread_struct, ckpt_regs);
	OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state);
	OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
	OFFSET(THREAD_CKFPSTATE, thread_struct, ckfp_state);
	/* Local pt_regs on stack for Transactional Memory funcs. */
	DEFINE(TM_FRAME_SIZE, STACK_FRAME_OVERHEAD +
	       sizeof(struct pt_regs) + 16);
#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */

	OFFSET(TI_FLAGS, thread_info, flags);
	OFFSET(TI_LOCAL_FLAGS, thread_info, local_flags);
	OFFSET(TI_PREEMPT, thread_info, preempt_count);
	OFFSET(TI_TASK, thread_info, task);
	OFFSET(TI_CPU, thread_info, cpu);

#ifdef CONFIG_PPC64
	OFFSET(DCACHEL1BLOCKSIZE, ppc64_caches, l1d.block_size);
	OFFSET(DCACHEL1LOGBLOCKSIZE, ppc64_caches, l1d.log_block_size);
	OFFSET(DCACHEL1BLOCKSPERPAGE, ppc64_caches, l1d.blocks_per_page);
	OFFSET(ICACHEL1BLOCKSIZE, ppc64_caches, l1i.block_size);
	OFFSET(ICACHEL1LOGBLOCKSIZE, ppc64_caches, l1i.log_block_size);
	OFFSET(ICACHEL1BLOCKSPERPAGE, ppc64_caches, l1i.blocks_per_page);
	/* paca */
	DEFINE(PACA_SIZE, sizeof(struct paca_struct));
	OFFSET(PACAPACAINDEX, paca_struct, paca_index);
	OFFSET(PACAPROCSTART, paca_struct, cpu_start);
	OFFSET(PACAKSAVE, paca_struct, kstack);
	OFFSET(PACACURRENT, paca_struct, __current);
	OFFSET(PACASAVEDMSR, paca_struct, saved_msr);
	OFFSET(PACASTABRR, paca_struct, stab_rr);
	OFFSET(PACAR1, paca_struct, saved_r1);
	OFFSET(PACATOC, paca_struct, kernel_toc);
	OFFSET(PACAKBASE, paca_struct, kernelbase);
	OFFSET(PACAKMSR, paca_struct, kernel_msr);
	OFFSET(PACASOFTIRQEN, paca_struct, soft_enabled);
	OFFSET(PACAIRQHAPPENED, paca_struct, irq_happened);
#ifdef CONFIG_PPC_BOOK3S
	OFFSET(PACACONTEXTID, paca_struct, mm_ctx_id);
#ifdef CONFIG_PPC_MM_SLICES
	OFFSET(PACALOWSLICESPSIZE, paca_struct, mm_ctx_low_slices_psize);
	OFFSET(PACAHIGHSLICEPSIZE, paca_struct, mm_ctx_high_slices_psize);
	DEFINE(PACA_ADDR_LIMIT, offsetof(struct paca_struct, addr_limit));
	DEFINE(MMUPSIZEDEFSIZE, sizeof(struct mmu_psize_def));
#endif /* CONFIG_PPC_MM_SLICES */
#endif
#ifdef CONFIG_PPC_BOOK3E
	OFFSET(PACAPGD, paca_struct, pgd);
	OFFSET(PACA_KERNELPGD, paca_struct, kernel_pgd);
	OFFSET(PACA_EXGEN, paca_struct, exgen);
	OFFSET(PACA_EXTLB, paca_struct, extlb);
	OFFSET(PACA_EXMC, paca_struct, exmc);
	OFFSET(PACA_EXCRIT, paca_struct, excrit);
	OFFSET(PACA_EXDBG, paca_struct, exdbg);
	OFFSET(PACA_MC_STACK, paca_struct, mc_kstack);
	OFFSET(PACA_CRIT_STACK, paca_struct, crit_kstack);
	OFFSET(PACA_DBG_STACK, paca_struct, dbg_kstack);
	OFFSET(PACA_TCD_PTR, paca_struct, tcd_ptr);
	OFFSET(TCD_ESEL_NEXT, tlb_core_data, esel_next);
	OFFSET(TCD_ESEL_MAX, tlb_core_data, esel_max);
	OFFSET(TCD_ESEL_FIRST, tlb_core_data, esel_first);
#endif /* CONFIG_PPC_BOOK3E */
#ifdef CONFIG_PPC_STD_MMU_64
	OFFSET(PACASLBCACHE, paca_struct, slb_cache);
	OFFSET(PACASLBCACHEPTR, paca_struct, slb_cache_ptr);
	OFFSET(PACAVMALLOCSLLP, paca_struct, vmalloc_sllp);
#ifdef CONFIG_PPC_MM_SLICES
	OFFSET(MMUPSIZESLLP, mmu_psize_def, sllp);
#else
	OFFSET(PACACONTEXTSLLP, paca_struct, mm_ctx_sllp);
#endif /* CONFIG_PPC_MM_SLICES */
	OFFSET(PACA_EXGEN, paca_struct, exgen);
	OFFSET(PACA_EXMC, paca_struct, exmc);
	OFFSET(PACA_EXSLB, paca_struct, exslb);
	OFFSET(PACA_EXNMI, paca_struct, exnmi);
	OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr);
	OFFSET(PACA_SLBSHADOWPTR, paca_struct, slb_shadow_ptr);
	OFFSET(SLBSHADOW_STACKVSID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].vsid);
	OFFSET(SLBSHADOW_STACKESID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].esid);
	OFFSET(SLBSHADOW_SAVEAREA, slb_shadow, save_area);
	OFFSET(LPPACA_PMCINUSE, lppaca, pmcregs_in_use);
	OFFSET(LPPACA_DTLIDX, lppaca, dtl_idx);
	OFFSET(LPPACA_YIELDCOUNT, lppaca, yield_count);
	OFFSET(PACA_DTL_RIDX, paca_struct, dtl_ridx);
#endif /* CONFIG_PPC_STD_MMU_64 */
	OFFSET(PACAEMERGSP, paca_struct, emergency_sp);
#ifdef CONFIG_PPC_BOOK3S_64
	OFFSET(PACAMCEMERGSP, paca_struct, mc_emergency_sp);
	OFFSET(PACA_NMI_EMERG_SP, paca_struct, nmi_emergency_sp);
	OFFSET(PACA_IN_MCE, paca_struct, in_mce);
	OFFSET(PACA_IN_NMI, paca_struct, in_nmi);
#endif
	OFFSET(PACAHWCPUID, paca_struct, hw_cpu_id);
	OFFSET(PACAKEXECSTATE, paca_struct, kexec_state);
	OFFSET(PACA_DSCR_DEFAULT, paca_struct, dscr_default);
	OFFSET(ACCOUNT_STARTTIME, paca_struct, accounting.starttime);
	OFFSET(ACCOUNT_STARTTIME_USER, paca_struct, accounting.starttime_user);
	OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime);
	OFFSET(ACCOUNT_SYSTEM_TIME, paca_struct, accounting.stime);
	OFFSET(PACA_TRAP_SAVE, paca_struct, trap_save);
	OFFSET(PACA_NAPSTATELOST, paca_struct, nap_state_lost);
	OFFSET(PACA_SPRG_VDSO, paca_struct, sprg_vdso);
#else /* CONFIG_PPC64 */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
	OFFSET(ACCOUNT_STARTTIME, thread_info, accounting.starttime);
	OFFSET(ACCOUNT_STARTTIME_USER, thread_info, accounting.starttime_user);
powerpc updates for 4.11 part 2
Highlights include:
- An update of the disassembly code used by xmon to the latest versions in
binutils. We've received permission from all the authors of the relevant
binutils changes to relicense their changes to the relevant files from GPLv3
to GPLv2, for inclusion in Linux. Thanks to Peter Bergner for doing the leg
work to get permission from everyone.
- Addition of the "architected" Power9 CPU table entry, allowing us to boot
in Power9 architected mode under a hypervisor.
- Updates to the Power9 PMU code.
- Implementation of clear_bit_unlock_is_negative_byte() to optimise
unlock_page().
- Freescale updates from Scott: "Highlights include 8xx breakpoints and perf,
t1042rdb display support, and board updates."
Thanks to:
Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Balbir Singh, Douglas Miller,
Frédéric Weisbecker, Gavin Shan, Madhavan Srinivasan, Michael Roth, Nathan
Fontenot, Naveen N. Rao, Nicholas Piggin, Peter Bergner, Paul E. McKenney,
Rashmica Gupta, Russell Currey, Sahil Mehta, Stewart Smith.
Merge tag 'powerpc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"Highlights include:
- an update of the disassembly code used by xmon to the latest
versions in binutils. We've received permission from all the
authors of the relevant binutils changes to relicense their changes
to the relevant files from GPLv3 to GPLv2, for inclusion in Linux.
Thanks to Peter Bergner for doing the leg work to get permission
from everyone.
- addition of the "architected" Power9 CPU table entry, allowing us
to boot in Power9 architected mode under a hypervisor.
- updates to the Power9 PMU code.
- implementation of clear_bit_unlock_is_negative_byte() to optimise
unlock_page().
- Freescale updates from Scott: "Highlights include 8xx breakpoints
and perf, t1042rdb display support, and board updates."
Thanks to:
Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Balbir Singh, Douglas
Miller, Frédéric Weisbecker, Gavin Shan, Madhavan Srinivasan,
Michael Roth, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Peter
Bergner, Paul E. McKenney, Rashmica Gupta, Russell Currey, Sahil
Mehta, Stewart Smith"
* tag 'powerpc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits)
powerpc: Remove leftover cputime_to_nsecs call causing build error
powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU
powerpc/optprobes: Fix TOC handling in optprobes trampoline
powerpc/pseries: Advertise Hot Plug Event support to firmware
cxl: fix nested locking hang during EEH hotplug
powerpc/xmon: Dump memory in CPU endian format
powerpc/pseries: Revert 'Auto-online hotplugged memory'
powerpc/powernv: Make PCI non-optional
powerpc/64: Implement clear_bit_unlock_is_negative_byte()
powerpc/powernv: Remove unused variable in pnv_pci_sriov_disable()
powerpc/kernel: Remove error message in pcibios_setup_phb_resources()
powerpc/mm: Fix typo in set_pte_at()
pci/hotplug/pnv-php: Disable MSI and PCI device properly
pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts
pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot()
powerpc: Add POWER9 architected mode to cputable
powerpc/perf: use is_kernel_addr macro in perf_get_misc_flags()
powerpc/perf: Avoid FAB_*_MATCH checks for power9
powerpc/perf: Add restrictions to PMC5 in power9 DD1
powerpc/perf: Use Instruction Counter value
...
2017-03-01 21:10:16 +03:00
OFFSET ( ACCOUNT_USER_TIME , thread_info , accounting . utime ) ;
OFFSET ( ACCOUNT_SYSTEM_TIME , thread_info , accounting . stime ) ;
2016-05-17 09:33:46 +03:00
# endif
2005-10-26 11:05:24 +04:00
# endif /* CONFIG_PPC64 */
2005-09-28 18:35:31 +04:00
/* RTAS */
2017-02-15 13:41:20 +03:00
OFFSET ( RTASBASE , rtas_t , base ) ;
OFFSET ( RTASENTRY , rtas_t , entry ) ;
2005-09-28 18:35:31 +04:00
2005-09-26 10:04:21 +04:00
/* Interrupt register frame */
2008-04-24 00:33:49 +04:00
DEFINE ( INT_FRAME_SIZE , STACK_INT_FRAME_SIZE ) ;
2005-09-26 10:04:21 +04:00
DEFINE ( SWITCH_FRAME_SIZE , STACK_FRAME_OVERHEAD + sizeof ( struct pt_regs ) ) ;
2010-04-16 02:11:55 +04:00
# ifdef CONFIG_PPC64
2005-09-28 18:35:31 +04:00
/* Create extra stack space for SRR0 and SRR1 when calling prom/rtas. */
DEFINE ( PROM_FRAME_SIZE , STACK_FRAME_OVERHEAD + sizeof ( struct pt_regs ) + 16 ) ;
DEFINE ( RTAS_FRAME_SIZE , STACK_FRAME_OVERHEAD + sizeof ( struct pt_regs ) + 16 ) ;
# endif /* CONFIG_PPC64 */
2016-06-02 07:29:47 +03:00
STACK_PT_REGS_OFFSET ( GPR0 , gpr [ 0 ] ) ;
STACK_PT_REGS_OFFSET ( GPR1 , gpr [ 1 ] ) ;
STACK_PT_REGS_OFFSET ( GPR2 , gpr [ 2 ] ) ;
STACK_PT_REGS_OFFSET ( GPR3 , gpr [ 3 ] ) ;
STACK_PT_REGS_OFFSET ( GPR4 , gpr [ 4 ] ) ;
STACK_PT_REGS_OFFSET ( GPR5 , gpr [ 5 ] ) ;
STACK_PT_REGS_OFFSET ( GPR6 , gpr [ 6 ] ) ;
STACK_PT_REGS_OFFSET ( GPR7 , gpr [ 7 ] ) ;
STACK_PT_REGS_OFFSET ( GPR8 , gpr [ 8 ] ) ;
STACK_PT_REGS_OFFSET ( GPR9 , gpr [ 9 ] ) ;
STACK_PT_REGS_OFFSET ( GPR10 , gpr [ 10 ] ) ;
STACK_PT_REGS_OFFSET ( GPR11 , gpr [ 11 ] ) ;
STACK_PT_REGS_OFFSET ( GPR12 , gpr [ 12 ] ) ;
STACK_PT_REGS_OFFSET ( GPR13 , gpr [ 13 ] ) ;
2005-09-28 18:35:31 +04:00
# ifndef CONFIG_PPC64
2016-06-02 07:29:47 +03:00
STACK_PT_REGS_OFFSET ( GPR14 , gpr [ 14 ] ) ;
2005-09-28 18:35:31 +04:00
# endif /* CONFIG_PPC64 */
2005-09-26 10:04:21 +04:00
/*
* Note: these symbols include _ because they overlap with special
* register names
*/
2016-06-02 07:29:47 +03:00
STACK_PT_REGS_OFFSET ( _NIP , nip ) ;
STACK_PT_REGS_OFFSET ( _MSR , msr ) ;
STACK_PT_REGS_OFFSET ( _CTR , ctr ) ;
STACK_PT_REGS_OFFSET ( _LINK , link ) ;
STACK_PT_REGS_OFFSET ( _CCR , ccr ) ;
STACK_PT_REGS_OFFSET ( _XER , xer ) ;
STACK_PT_REGS_OFFSET ( _DAR , dar ) ;
STACK_PT_REGS_OFFSET ( _DSISR , dsisr ) ;
STACK_PT_REGS_OFFSET ( ORIG_GPR3 , orig_gpr3 ) ;
STACK_PT_REGS_OFFSET ( RESULT , result ) ;
STACK_PT_REGS_OFFSET ( _TRAP , trap ) ;
2005-09-28 18:35:31 +04:00
# ifndef CONFIG_PPC64
/*
* The PowerPC 400-class & Book-E processors have neither the DAR
* nor the DSISR SPRs. Hence, we overload them to hold the similar
* DEAR and ESR SPRs for such processors. For critical interrupts
* we use them to hold SRR0 and SRR1.
2005-09-26 10:04:21 +04:00
*/
2016-06-02 07:29:47 +03:00
STACK_PT_REGS_OFFSET ( _DEAR , dar ) ;
STACK_PT_REGS_OFFSET ( _ESR , dsisr ) ;
2005-09-28 18:35:31 +04:00
# else /* CONFIG_PPC64 */
2016-06-02 07:29:47 +03:00
STACK_PT_REGS_OFFSET ( SOFTE , softe ) ;
2005-09-28 18:35:31 +04:00
/* These _only_ to be used with {PROM,RTAS}_FRAME_SIZE!!! */
DEFINE ( _SRR0 , STACK_FRAME_OVERHEAD + sizeof ( struct pt_regs ) ) ;
DEFINE ( _SRR1 , STACK_FRAME_OVERHEAD + sizeof ( struct pt_regs ) + 8 ) ;
# endif /* CONFIG_PPC64 */
2009-07-28 05:59:34 +04:00
# if defined(CONFIG_PPC32)
2008-04-30 14:23:21 +04:00
# if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
DEFINE ( EXC_LVL_SIZE , STACK_EXC_LVL_FRAME_SIZE ) ;
DEFINE ( MAS0 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , mas0 ) ) ;
/* we overload MMUCR for 44x on MAS0 since they are mutually exclusive */
DEFINE ( MMUCR , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , mas0 ) ) ;
DEFINE ( MAS1 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , mas1 ) ) ;
DEFINE ( MAS2 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , mas2 ) ) ;
DEFINE ( MAS3 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , mas3 ) ) ;
DEFINE ( MAS6 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , mas6 ) ) ;
DEFINE ( MAS7 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , mas7 ) ) ;
DEFINE ( _SRR0 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , srr0 ) ) ;
DEFINE ( _SRR1 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , srr1 ) ) ;
DEFINE ( _CSRR0 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , csrr0 ) ) ;
DEFINE ( _CSRR1 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , csrr1 ) ) ;
DEFINE ( _DSRR0 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , dsrr0 ) ) ;
DEFINE ( _DSRR1 , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , dsrr1 ) ) ;
DEFINE ( SAVED_KSP_LIMIT , STACK_INT_FRAME_SIZE + offsetof ( struct exception_regs , saved_ksp_limit ) ) ;
# endif
2009-07-28 05:59:34 +04:00
# endif
2005-09-28 18:35:31 +04:00
# ifndef CONFIG_PPC64
2017-02-15 13:41:20 +03:00
OFFSET ( MM_PGD , mm_struct , pgd ) ;
2005-09-28 18:35:31 +04:00
# endif /* ! CONFIG_PPC64 */
2005-09-26 10:04:21 +04:00
/* About the CPU features table */
2017-02-15 13:41:20 +03:00
OFFSET ( CPU_SPEC_FEATURES , cpu_spec , cpu_features ) ;
OFFSET ( CPU_SPEC_SETUP , cpu_spec , cpu_setup ) ;
OFFSET ( CPU_SPEC_RESTORE , cpu_spec , cpu_restore ) ;
2005-09-26 10:04:21 +04:00
2017-02-15 13:41:20 +03:00
OFFSET ( pbe_address , pbe , address ) ;
OFFSET ( pbe_orig_address , pbe , orig_address ) ;
OFFSET ( pbe_next , pbe , next ) ;
2005-09-26 10:04:21 +04:00
2007-05-03 16:31:38 +04:00
# ifndef CONFIG_PPC64
2005-10-11 16:08:12 +04:00
DEFINE ( TASK_SIZE , TASK_SIZE ) ;
2005-09-28 18:35:31 +04:00
DEFINE ( NUM_USER_SEGMENTS , TASK_SIZE >> 28 ) ;
2005-11-11 13:15:21 +03:00
# endif /* ! CONFIG_PPC64 */
2005-09-26 10:04:21 +04:00
2005-11-11 13:15:21 +03:00
/* datapage offsets for use by vdso */
2017-02-15 13:41:20 +03:00
OFFSET ( CFG_TB_ORIG_STAMP , vdso_data , tb_orig_stamp ) ;
OFFSET ( CFG_TB_TICKS_PER_SEC , vdso_data , tb_ticks_per_sec ) ;
OFFSET ( CFG_TB_TO_XS , vdso_data , tb_to_xs ) ;
OFFSET ( CFG_TB_UPDATE_COUNT , vdso_data , tb_update_count ) ;
OFFSET ( CFG_TZ_MINUTEWEST , vdso_data , tz_minuteswest ) ;
OFFSET ( CFG_TZ_DSTTIME , vdso_data , tz_dsttime ) ;
OFFSET ( CFG_SYSCALL_MAP32 , vdso_data , syscall_map_32 ) ;
OFFSET ( WTOM_CLOCK_SEC , vdso_data , wtom_clock_sec ) ;
OFFSET ( WTOM_CLOCK_NSEC , vdso_data , wtom_clock_nsec ) ;
OFFSET ( STAMP_XTIME , vdso_data , stamp_xtime ) ;
OFFSET ( STAMP_SEC_FRAC , vdso_data , stamp_sec_fraction ) ;
OFFSET ( CFG_ICACHE_BLOCKSZ , vdso_data , icache_block_size ) ;
OFFSET ( CFG_DCACHE_BLOCKSZ , vdso_data , dcache_block_size ) ;
OFFSET ( CFG_ICACHE_LOGBLOCKSZ , vdso_data , icache_log_block_size ) ;
OFFSET ( CFG_DCACHE_LOGBLOCKSZ , vdso_data , dcache_log_block_size ) ;
2005-11-11 13:15:21 +03:00
# ifdef CONFIG_PPC64
2017-02-15 13:41:20 +03:00
OFFSET ( CFG_SYSCALL_MAP64 , vdso_data , syscall_map_64 ) ;
OFFSET ( TVAL64_TV_SEC , timeval , tv_sec ) ;
OFFSET ( TVAL64_TV_USEC , timeval , tv_usec ) ;
OFFSET ( TVAL32_TV_SEC , compat_timeval , tv_sec ) ;
OFFSET ( TVAL32_TV_USEC , compat_timeval , tv_usec ) ;
OFFSET ( TSPC64_TV_SEC , timespec , tv_sec ) ;
OFFSET ( TSPC64_TV_NSEC , timespec , tv_nsec ) ;
OFFSET ( TSPC32_TV_SEC , compat_timespec , tv_sec ) ;
OFFSET ( TSPC32_TV_NSEC , compat_timespec , tv_nsec ) ;
2005-11-11 13:15:21 +03:00
# else
2017-02-15 13:41:20 +03:00
OFFSET ( TVAL32_TV_SEC , timeval , tv_sec ) ;
OFFSET ( TVAL32_TV_USEC , timeval , tv_usec ) ;
OFFSET ( TSPC32_TV_SEC , timespec , tv_sec ) ;
OFFSET ( TSPC32_TV_NSEC , timespec , tv_nsec ) ;
2005-11-11 13:15:21 +03:00
# endif
/* timeval/timezone offsets for use by vdso */
2017-02-15 13:41:20 +03:00
OFFSET ( TZONE_TZ_MINWEST , timezone , tz_minuteswest ) ;
OFFSET ( TZONE_TZ_DSTTIME , timezone , tz_dsttime ) ;
2005-11-11 13:15:21 +03:00
/* Other bits used by the vdso */
DEFINE ( CLOCK_REALTIME , CLOCK_REALTIME ) ;
DEFINE ( CLOCK_MONOTONIC , CLOCK_MONOTONIC ) ;
DEFINE ( NSEC_PER_SEC , NSEC_PER_SEC ) ;
2008-02-08 01:24:52 +03:00
DEFINE ( CLOCK_REALTIME_RES , MONOTONIC_RES_NSEC ) ;
2005-11-11 13:15:21 +03:00
2007-01-01 21:45:34 +03:00
# ifdef CONFIG_BUG
DEFINE ( BUG_ENTRY_SIZE , sizeof ( struct bug_entry ) ) ;
# endif
2007-08-20 08:58:36 +04:00
2017-04-12 07:56:36 +03:00
# ifdef CONFIG_PPC_BOOK3S_64
DEFINE ( PGD_TABLE_SIZE , ( sizeof ( pgd_t ) << max ( RADIX_PGD_INDEX_SIZE , H_PGD_INDEX_SIZE ) ) ) ;
2016-04-29 16:25:49 +03:00
# else
2007-09-18 11:22:59 +04:00
DEFINE ( PGD_TABLE_SIZE , PGD_TABLE_SIZE ) ;
2016-04-29 16:25:49 +03:00
# endif
2008-09-24 20:01:24 +04:00
DEFINE ( PTE_SIZE , sizeof ( pte_t ) ) ;
2007-12-06 22:11:04 +03:00
2008-04-17 08:28:09 +04:00
# ifdef CONFIG_KVM
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_HOST_STACK , kvm_vcpu , arch . host_stack ) ;
OFFSET ( VCPU_HOST_PID , kvm_vcpu , arch . host_pid ) ;
OFFSET ( VCPU_GUEST_PID , kvm_vcpu , arch . pid ) ;
OFFSET ( VCPU_GPRS , kvm_vcpu , arch . gpr ) ;
OFFSET ( VCPU_VRSAVE , kvm_vcpu , arch . vrsave ) ;
OFFSET ( VCPU_FPRS , kvm_vcpu , arch . fp . fpr ) ;
KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 04:21:34 +04:00
# ifdef CONFIG_ALTIVEC
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_VRS , kvm_vcpu , arch . vr . vr ) ;
2011-06-29 04:21:34 +04:00
# endif
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_XER , kvm_vcpu , arch . xer ) ;
OFFSET ( VCPU_CTR , kvm_vcpu , arch . ctr ) ;
OFFSET ( VCPU_LR , kvm_vcpu , arch . lr ) ;
2014-04-22 14:26:58 +04:00
# ifdef CONFIG_PPC_BOOK3S
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_TAR , kvm_vcpu , arch . tar ) ;
2014-04-22 14:26:58 +04:00
# endif
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_CR , kvm_vcpu , arch . cr ) ;
OFFSET ( VCPU_PC , kvm_vcpu , arch . pc ) ;
2013-10-07 20:47:52 +04:00
# ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_MSR , kvm_vcpu , arch . shregs . msr ) ;
OFFSET ( VCPU_SRR0 , kvm_vcpu , arch . shregs . srr0 ) ;
OFFSET ( VCPU_SRR1 , kvm_vcpu , arch . shregs . srr1 ) ;
OFFSET ( VCPU_SPRG0 , kvm_vcpu , arch . shregs . sprg0 ) ;
OFFSET ( VCPU_SPRG1 , kvm_vcpu , arch . shregs . sprg1 ) ;
OFFSET ( VCPU_SPRG2 , kvm_vcpu , arch . shregs . sprg2 ) ;
OFFSET ( VCPU_SPRG3 , kvm_vcpu , arch . shregs . sprg3 ) ;
KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code. Currently these
times are accumulated per vcpu in 5 parts of the code:
* rm_entry - time taken from the start of kvmppc_hv_entry() until
just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
guest until we either re-enter the guest or decide to exit to the
host. This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
while other threads in the same vcore are active.
These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings". This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.
The overhead of the extra code amounts to about 30ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), which is about 25%. Since
production environments may not wish to incur this overhead, the new
code is conditional on a new config symbol,
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
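The accumulation described above keeps a running total, minimum and maximum for each activity. A sequence counter lets a reader, such as the debugfs code, detect a concurrent update. The following single-threaded C model is a sketch under assumptions. The struct mirrors the field names behind the TAS_* offsets below (seqcount, tb_total, tb_min, tb_max). The update order, deriving the count as seqcount / 2, and the omission of memory barriers are simplifications of this model, not the kernel's real-mode assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C mirror of an accumulator with the fields named by the
 * TAS_SEQCOUNT, TAS_TOTAL, TAS_MIN and TAS_MAX offsets. */
struct tb_accumulator_model {
    uint64_t seqcount;  /* odd while an update is in progress */
    uint64_t tb_total;  /* sum of all intervals, in timebase ticks */
    uint64_t tb_min;    /* shortest interval seen */
    uint64_t tb_max;    /* longest interval seen */
};

/* Record one interval of `delta` timebase ticks. A real implementation
 * needs write barriers between the seqcount bumps and the data stores;
 * this model assumes a single thread and omits them. */
static void tb_acc_add(struct tb_accumulator_model *acc, uint64_t delta)
{
    int first = (acc->seqcount == 0);

    acc->seqcount++;                    /* odd: readers must retry */
    acc->tb_total += delta;
    if (first || delta < acc->tb_min)
        acc->tb_min = delta;
    if (delta > acc->tb_max)
        acc->tb_max = delta;
    acc->seqcount++;                    /* even: snapshot is consistent */
}

/* Each completed update bumps seqcount twice, so in this model the
 * number of samples is half of it. */
static uint64_t tb_acc_count(const struct tb_accumulator_model *acc)
{
    assert(acc->seqcount % 2 == 0);     /* no update in progress */
    return acc->seqcount / 2;
}
```

A reader in this scheme would copy the struct, then re-check that seqcount is unchanged and even. It would retry otherwise, so it never reports a total that disagrees with the min/max.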
2015-03-28 06:21:02 +03:00
# endif
# ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_TB_RMENTRY , kvm_vcpu , arch . rm_entry ) ;
OFFSET ( VCPU_TB_RMINTR , kvm_vcpu , arch . rm_intr ) ;
OFFSET ( VCPU_TB_RMEXIT , kvm_vcpu , arch . rm_exit ) ;
OFFSET ( VCPU_TB_GUEST , kvm_vcpu , arch . guest_time ) ;
OFFSET ( VCPU_TB_CEDE , kvm_vcpu , arch . cede_time ) ;
OFFSET ( VCPU_CUR_ACTIVITY , kvm_vcpu , arch . cur_activity ) ;
OFFSET ( VCPU_ACTIVITY_START , kvm_vcpu , arch . cur_tb_start ) ;
OFFSET ( TAS_SEQCOUNT , kvmhv_tb_accumulator , seqcount ) ;
OFFSET ( TAS_TOTAL , kvmhv_tb_accumulator , tb_total ) ;
OFFSET ( TAS_MIN , kvmhv_tb_accumulator , tb_min ) ;
OFFSET ( TAS_MAX , kvmhv_tb_accumulator , tb_max ) ;
# endif
OFFSET ( VCPU_SHARED_SPRG3 , kvm_vcpu_arch_shared , sprg3 ) ;
OFFSET ( VCPU_SHARED_SPRG4 , kvm_vcpu_arch_shared , sprg4 ) ;
OFFSET ( VCPU_SHARED_SPRG5 , kvm_vcpu_arch_shared , sprg5 ) ;
OFFSET ( VCPU_SHARED_SPRG6 , kvm_vcpu_arch_shared , sprg6 ) ;
OFFSET ( VCPU_SHARED_SPRG7 , kvm_vcpu_arch_shared , sprg7 ) ;
OFFSET ( VCPU_SHADOW_PID , kvm_vcpu , arch . shadow_pid ) ;
OFFSET ( VCPU_SHADOW_PID1 , kvm_vcpu , arch . shadow_pid1 ) ;
OFFSET ( VCPU_SHARED , kvm_vcpu , arch . shared ) ;
OFFSET ( VCPU_SHARED_MSR , kvm_vcpu_arch_shared , msr ) ;
OFFSET ( VCPU_SHADOW_MSR , kvm_vcpu , arch . shadow_msr ) ;
2014-04-24 15:46:24 +04:00
# if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_SHAREDBE , kvm_vcpu , arch . shared_big_endian ) ;
2014-04-24 15:46:24 +04:00
# endif
2008-04-17 08:28:09 +04:00
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_SHARED_MAS0 , kvm_vcpu_arch_shared , mas0 ) ;
OFFSET ( VCPU_SHARED_MAS1 , kvm_vcpu_arch_shared , mas1 ) ;
OFFSET ( VCPU_SHARED_MAS2 , kvm_vcpu_arch_shared , mas2 ) ;
OFFSET ( VCPU_SHARED_MAS7_3 , kvm_vcpu_arch_shared , mas7_3 ) ;
OFFSET ( VCPU_SHARED_MAS4 , kvm_vcpu_arch_shared , mas4 ) ;
OFFSET ( VCPU_SHARED_MAS6 , kvm_vcpu_arch_shared , mas6 ) ;
KVM: PPC: Paravirtualize SPRG4-7, ESR, PIR, MASn
This allows additional registers to be accessed by the guest
in PR-mode KVM without trapping.
SPRG4-7 are readable from userspace. On booke, KVM will sync
these registers when it enters the guest, so that accesses from
guest userspace will work. The guest kernel, OTOH, must consistently
use either the real registers or the shared area between exits. This
also applies to the already-paravirted SPRG3.
On non-booke, it's not clear to what extent SPRG4-7 are supported
(they're not architected for book3s, but exist on at least some classic
chips). They are copied in the get/set regs ioctls, but I do not see any
non-booke emulation. I also do not see any syncing with real registers
(in PR-mode) including the user-readable SPRG3. This patch should not
make that situation any worse.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-09 04:23:30 +04:00
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_KVM , kvm_vcpu , kvm ) ;
OFFSET ( KVM_LPID , kvm , arch . lpid ) ;
2011-12-20 19:34:43 +04:00
2010-04-16 02:11:42 +04:00
/* book3s */
2013-10-07 20:47:52 +04:00
# ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
2017-02-15 13:41:20 +03:00
OFFSET ( KVM_TLB_SETS , kvm , arch . tlb_sets ) ;
OFFSET ( KVM_SDR1 , kvm , arch . sdr1 ) ;
OFFSET ( KVM_HOST_LPID , kvm , arch . host_lpid ) ;
OFFSET ( KVM_HOST_LPCR , kvm , arch . host_lpcr ) ;
OFFSET ( KVM_HOST_SDR1 , kvm , arch . host_sdr1 ) ;
OFFSET ( KVM_NEED_FLUSH , kvm , arch . need_tlb_flush . bits ) ;
OFFSET ( KVM_ENABLED_HCALLS , kvm , arch . enabled_hcalls ) ;
OFFSET ( KVM_VRMA_SLB_V , kvm , arch . vrma_slb_v ) ;
OFFSET ( KVM_RADIX , kvm , arch . radix ) ;
OFFSET ( VCPU_DSISR , kvm_vcpu , arch . shregs . dsisr ) ;
OFFSET ( VCPU_DAR , kvm_vcpu , arch . shregs . dar ) ;
OFFSET ( VCPU_VPA , kvm_vcpu , arch . vpa . pinned_addr ) ;
OFFSET ( VCPU_VPA_DIRTY , kvm_vcpu , arch . vpa . dirty ) ;
OFFSET ( VCPU_HEIR , kvm_vcpu , arch . emul_inst ) ;
OFFSET ( VCPU_CPU , kvm_vcpu , cpu ) ;
OFFSET ( VCPU_THREAD_CPU , kvm_vcpu , arch . thread_cpu ) ;
2011-06-29 04:21:34 +04:00
# endif
2010-04-16 02:11:42 +04:00
# ifdef CONFIG_PPC_BOOK3S
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_PURR , kvm_vcpu , arch . purr ) ;
OFFSET ( VCPU_SPURR , kvm_vcpu , arch . spurr ) ;
OFFSET ( VCPU_IC , kvm_vcpu , arch . ic ) ;
OFFSET ( VCPU_DSCR , kvm_vcpu , arch . dscr ) ;
OFFSET ( VCPU_AMR , kvm_vcpu , arch . amr ) ;
OFFSET ( VCPU_UAMOR , kvm_vcpu , arch . uamor ) ;
OFFSET ( VCPU_IAMR , kvm_vcpu , arch . iamr ) ;
OFFSET ( VCPU_CTRL , kvm_vcpu , arch . ctrl ) ;
OFFSET ( VCPU_DABR , kvm_vcpu , arch . dabr ) ;
OFFSET ( VCPU_DABRX , kvm_vcpu , arch . dabrx ) ;
OFFSET ( VCPU_DAWR , kvm_vcpu , arch . dawr ) ;
OFFSET ( VCPU_DAWRX , kvm_vcpu , arch . dawrx ) ;
OFFSET ( VCPU_CIABR , kvm_vcpu , arch . ciabr ) ;
OFFSET ( VCPU_HFLAGS , kvm_vcpu , arch . hflags ) ;
OFFSET ( VCPU_DEC , kvm_vcpu , arch . dec ) ;
OFFSET ( VCPU_DEC_EXPIRES , kvm_vcpu , arch . dec_expires ) ;
OFFSET ( VCPU_PENDING_EXC , kvm_vcpu , arch . pending_exceptions ) ;
OFFSET ( VCPU_CEDED , kvm_vcpu , arch . ceded ) ;
OFFSET ( VCPU_PRODDED , kvm_vcpu , arch . prodded ) ;
KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9
On POWER9, we no longer have the restriction that we had on POWER8
where all threads in a core have to be in the same partition, so
the CPU threads are now independent. However, we still want to be
able to run guests with a virtual SMT topology, if only to allow
migration of guests from POWER8 systems to POWER9.
A guest that has a virtual SMT mode greater than 1 will expect to
be able to use the doorbell facility; it will expect the msgsndp
and msgclrp instructions to work appropriately and to be able to read
sensible values from the TIR (thread identification register) and
DPDES (directed privileged doorbell exception status) special-purpose
registers. However, since each CPU thread is a separate sub-processor
in POWER9, these instructions and registers can only be used within
a single CPU thread.
In order for these instructions to appear to act correctly according
to the guest's virtual SMT mode, we have to trap and emulate them.
We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
register. The emulation is triggered by the hypervisor facility
unavailable interrupt that occurs when the guest uses them.
To cause a doorbell interrupt to occur within the guest, we set the
DPDES register to 1. If the guest has interrupts enabled, the CPU
will generate a doorbell interrupt and clear the DPDES register in
hardware. The DPDES hardware register for the guest is saved in the
vcpu->arch.vcore->dpdes field. Since this gets written by the guest
exit code, other VCPUs wishing to cause a doorbell interrupt don't
write that field directly, but instead set a vcpu->arch.doorbell_request
flag. This is consumed and set to 0 by the guest entry code, which
then sets DPDES to 1.
Emulating reads of the DPDES register is somewhat involved, because
it requires reading the doorbell pending interrupt status of all of the
VCPU threads in the virtual core, and if any of those VCPUs are
running, their doorbell status is only up-to-date in the hardware
DPDES registers of the CPUs where they are running. In order to get
a reasonable approximation of the current doorbell status, we send
those CPUs an IPI, causing an exit from the guest which will update
the vcpu->arch.vcore->dpdes field. We then use that value in
constructing the emulated DPDES register value.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
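The request/consume handshake for doorbells described above can be modelled in plain C. This sketch is an assumption, not the kernel's code. A struct stands in for the vcpu fields and an ordinary integer stands in for the hardware DPDES register. C11 atomics replace the real guest entry path, which is assembly that uses the VCPU_DBELL_REQ offset defined below. All names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant vcpu state. */
struct vcpu_dbell_model {
    atomic_uchar doorbell_request; /* set by other vcpus, cleared at entry */
    uint64_t dpdes;                /* models the hardware DPDES register */
};

/* Called by another vcpu: never write dpdes directly, just raise the flag. */
static void dbell_request(struct vcpu_dbell_model *v)
{
    atomic_store_explicit(&v->doorbell_request, 1, memory_order_release);
}

/* Called on guest entry: consume the flag (reset it to 0) and, if it was
 * set, load DPDES with 1 so the guest takes a doorbell interrupt. */
static bool dbell_guest_entry(struct vcpu_dbell_model *v)
{
    if (atomic_exchange_explicit(&v->doorbell_request, 0,
                                 memory_order_acq_rel)) {
        v->dpdes = 1;
        return true;
    }
    return false;
}
```

Consuming the flag with a single exchange means a request that arrives just after entry stays pending for the next entry rather than being lost.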
2017-05-16 09:41:20 +03:00
OFFSET(VCPU_DBELL_REQ, kvm_vcpu, arch.doorbell_request);
2017-02-15 13:41:20 +03:00
OFFSET(VCPU_MMCR, kvm_vcpu, arch.mmcr);
OFFSET(VCPU_PMC, kvm_vcpu, arch.pmc);
OFFSET(VCPU_SPMC, kvm_vcpu, arch.spmc);
OFFSET(VCPU_SIAR, kvm_vcpu, arch.siar);
OFFSET(VCPU_SDAR, kvm_vcpu, arch.sdar);
OFFSET(VCPU_SIER, kvm_vcpu, arch.sier);
OFFSET(VCPU_SLB, kvm_vcpu, arch.slb);
OFFSET(VCPU_SLB_MAX, kvm_vcpu, arch.slb_max);
OFFSET(VCPU_SLB_NR, kvm_vcpu, arch.slb_nr);
OFFSET(VCPU_FAULT_DSISR, kvm_vcpu, arch.fault_dsisr);
OFFSET(VCPU_FAULT_DAR, kvm_vcpu, arch.fault_dar);
OFFSET(VCPU_FAULT_GPA, kvm_vcpu, arch.fault_gpa);
OFFSET(VCPU_INTR_MSR, kvm_vcpu, arch.intr_msr);
OFFSET(VCPU_LAST_INST, kvm_vcpu, arch.last_inst);
OFFSET(VCPU_TRAP, kvm_vcpu, arch.trap);
OFFSET(VCPU_CFAR, kvm_vcpu, arch.cfar);
OFFSET(VCPU_PPR, kvm_vcpu, arch.ppr);
OFFSET(VCPU_FSCR, kvm_vcpu, arch.fscr);
OFFSET(VCPU_PSPB, kvm_vcpu, arch.pspb);
OFFSET(VCPU_EBBHR, kvm_vcpu, arch.ebbhr);
OFFSET(VCPU_EBBRR, kvm_vcpu, arch.ebbrr);
OFFSET(VCPU_BESCR, kvm_vcpu, arch.bescr);
OFFSET(VCPU_CSIGR, kvm_vcpu, arch.csigr);
OFFSET(VCPU_TACR, kvm_vcpu, arch.tacr);
OFFSET(VCPU_TCSCR, kvm_vcpu, arch.tcscr);
OFFSET(VCPU_ACOP, kvm_vcpu, arch.acop);
OFFSET(VCPU_WORT, kvm_vcpu, arch.wort);
OFFSET(VCPU_TID, kvm_vcpu, arch.tid);
OFFSET(VCPU_PSSCR, kvm_vcpu, arch.psscr);
2017-02-15 06:30:17 +03:00
OFFSET(VCPU_HFSCR, kvm_vcpu, arch.hfscr);
2017-02-15 13:41:20 +03:00
OFFSET(VCORE_ENTRY_EXIT, kvmppc_vcore, entry_exit_map);
OFFSET(VCORE_IN_GUEST, kvmppc_vcore, in_guest);
OFFSET(VCORE_NAPPING_THREADS, kvmppc_vcore, napping_threads);
OFFSET(VCORE_KVM, kvmppc_vcore, kvm);
OFFSET(VCORE_TB_OFFSET, kvmppc_vcore, tb_offset);
OFFSET(VCORE_LPCR, kvmppc_vcore, lpcr);
OFFSET(VCORE_PCR, kvmppc_vcore, pcr);
OFFSET(VCORE_DPDES, kvmppc_vcore, dpdes);
OFFSET(VCORE_VTB, kvmppc_vcore, vtb);
OFFSET(VCPU_SLB_E, kvmppc_slb, orige);
OFFSET(VCPU_SLB_V, kvmppc_slb, origv);
KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
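Real-mode assembly reaches this per-CPU scratch state through constant offsets from the PACA base. Those offsets are what the SVCPU_FIELD and HSTATE_FIELD macros in this file export. The userspace sketch below shows the idea. `offsetof` accepts a nested member designator such as `shadow_vcpu.gpr[3]`, so one constant covers both levels of nesting.

`struct paca_model`, `struct shadow_vcpu_model`, `SVCPU_OFFSET` and `print_define()` are hypothetical. Printing with `printf` only imitates the `#define` lines that the kernel's DEFINE macro produces via the compiler's assembly output.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified layouts (not the kernel's real structs). */
struct shadow_vcpu_model {
	uint32_t cr;
	uint64_t gpr[14];
};

struct paca_model {
	uint64_t kernel_toc;                  /* some unrelated per-CPU field */
	struct shadow_vcpu_model shadow_vcpu; /* nested scratch area */
};

/* Same shape as SVCPU_FIELD: offset of a member nested inside the PACA. */
#define SVCPU_OFFSET(f) offsetof(struct paca_model, shadow_vcpu.f)

/* Emit a "#define NAME value" line, imitating the header that the
 * asm-offsets build step generates for assembly code. */
void print_define(const char *name, size_t value)
{
	printf("#define %s %zu\n", name, value);
}
```

For example, `print_define("SVCPU_R3", SVCPU_OFFSET(gpr[3]));` prints a line of the form `#define SVCPU_R3 <offset>`. The exact number depends on the compiler's padding.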
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
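The 16MB requirement reduces to an alignment check on each guest memory region. Below is a minimal sketch. It assumes a region is described by a physical base and a size in bytes. `fits_16m_pages()`, `count_16m_pages()` and `HPAGE_16M_BYTES` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define HPAGE_16M_BYTES (UINT64_C(16) << 20) /* one 16 MB huge page */

/* True if [base, base + size) can be covered exactly by 16 MB pages. */
bool fits_16m_pages(uint64_t base, uint64_t size)
{
	return size != 0 &&
	       base % HPAGE_16M_BYTES == 0 &&
	       size % HPAGE_16M_BYTES == 0;
}

/* Number of 16 MB pages backing a region that passed fits_16m_pages(). */
uint64_t count_16m_pages(uint64_t size)
{
	return size / HPAGE_16M_BYTES;
}
```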
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 04:21:34 +04:00
DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
2014-01-08 14:25:32 +04:00
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
2017-02-15 13:41:20 +03:00
OFFSET(VCPU_TFHAR, kvm_vcpu, arch.tfhar);
OFFSET(VCPU_TFIAR, kvm_vcpu, arch.tfiar);
OFFSET(VCPU_TEXASR, kvm_vcpu, arch.texasr);
OFFSET(VCPU_GPR_TM, kvm_vcpu, arch.gpr_tm);
OFFSET(VCPU_FPRS_TM, kvm_vcpu, arch.fp_tm.fpr);
OFFSET(VCPU_VRS_TM, kvm_vcpu, arch.vr_tm.vr);
OFFSET(VCPU_VRSAVE_TM, kvm_vcpu, arch.vrsave_tm);
OFFSET(VCPU_CR_TM, kvm_vcpu, arch.cr_tm);
OFFSET(VCPU_XER_TM, kvm_vcpu, arch.xer_tm);
OFFSET(VCPU_LR_TM, kvm_vcpu, arch.lr_tm);
OFFSET(VCPU_CTR_TM, kvm_vcpu, arch.ctr_tm);
OFFSET(VCPU_AMR_TM, kvm_vcpu, arch.amr_tm);
OFFSET(VCPU_PPR_TM, kvm_vcpu, arch.ppr_tm);
OFFSET(VCPU_DSCR_TM, kvm_vcpu, arch.dscr_tm);
OFFSET(VCPU_TAR_TM, kvm_vcpu, arch.tar_tm);
2014-01-08 14:25:32 +04:00
#endif
2011-06-29 04:20:58 +04:00
#ifdef CONFIG_PPC_BOOK3S_64
2013-10-07 20:47:51 +04:00
#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
2017-02-15 13:41:20 +03:00
OFFSET(PACA_SVCPU, paca_struct, shadow_vcpu);
2011-06-29 04:20:58 +04:00
# define SVCPU_FIELD(x, f) DEFINE(x, offsetof(struct paca_struct, shadow_vcpu.f))
KVM: PPC: Add support for Book3S processors in hypervisor mode
2011-06-29 04:21:34 +04:00
#else
# define SVCPU_FIELD(x, f)
#endif
2011-06-29 04:20:58 +04:00
# define HSTATE_FIELD(x, f) DEFINE(x, offsetof(struct paca_struct, kvm_hstate.f))
#else /* 32-bit */
# define SVCPU_FIELD(x, f) DEFINE(x, offsetof(struct kvmppc_book3s_shadow_vcpu, f))
# define HSTATE_FIELD(x, f) DEFINE(x, offsetof(struct kvmppc_book3s_shadow_vcpu, hstate.f))
#endif
SVCPU_FIELD(SVCPU_CR, cr);
SVCPU_FIELD(SVCPU_XER, xer);
SVCPU_FIELD(SVCPU_CTR, ctr);
SVCPU_FIELD(SVCPU_LR, lr);
SVCPU_FIELD(SVCPU_PC, pc);
SVCPU_FIELD(SVCPU_R0, gpr[0]);
SVCPU_FIELD(SVCPU_R1, gpr[1]);
SVCPU_FIELD(SVCPU_R2, gpr[2]);
SVCPU_FIELD(SVCPU_R3, gpr[3]);
SVCPU_FIELD(SVCPU_R4, gpr[4]);
SVCPU_FIELD(SVCPU_R5, gpr[5]);
SVCPU_FIELD(SVCPU_R6, gpr[6]);
SVCPU_FIELD(SVCPU_R7, gpr[7]);
SVCPU_FIELD(SVCPU_R8, gpr[8]);
SVCPU_FIELD(SVCPU_R9, gpr[9]);
SVCPU_FIELD(SVCPU_R10, gpr[10]);
SVCPU_FIELD(SVCPU_R11, gpr[11]);
SVCPU_FIELD(SVCPU_R12, gpr[12]);
SVCPU_FIELD(SVCPU_R13, gpr[13]);
SVCPU_FIELD(SVCPU_FAULT_DSISR, fault_dsisr);
SVCPU_FIELD(SVCPU_FAULT_DAR, fault_dar);
SVCPU_FIELD(SVCPU_LAST_INST, last_inst);
SVCPU_FIELD(SVCPU_SHADOW_SRR1, shadow_srr1);
2010-04-16 02:11:44 +04:00
#ifdef CONFIG_PPC_BOOK3S_32
2011-06-29 04:20:58 +04:00
SVCPU_FIELD(SVCPU_SR, sr);
2010-04-16 02:11:44 +04:00
#endif
2011-06-29 04:20:58 +04:00
#ifdef CONFIG_PPC64
SVCPU_FIELD(SVCPU_SLB, slb);
SVCPU_FIELD(SVCPU_SLB_MAX, slb_max);
2014-04-29 18:48:44 +04:00
SVCPU_FIELD(SVCPU_SHADOW_FSCR, shadow_fscr);
2011-06-29 04:20:58 +04:00
#endif
HSTATE_FIELD(HSTATE_HOST_R1, host_r1);
HSTATE_FIELD(HSTATE_HOST_R2, host_r2);
KVM: PPC: Add support for Book3S processors in hypervisor mode
2011-06-29 04:21:34 +04:00
HSTATE_FIELD(HSTATE_HOST_MSR, host_msr);
2011-06-29 04:20:58 +04:00
HSTATE_FIELD(HSTATE_VMHANDLER, vmhandler);
HSTATE_FIELD(HSTATE_SCRATCH0, scratch0);
HSTATE_FIELD(HSTATE_SCRATCH1, scratch1);
2013-11-11 17:59:47 +04:00
HSTATE_FIELD(HSTATE_SCRATCH2, scratch2);
2011-06-29 04:20:58 +04:00
HSTATE_FIELD(HSTATE_IN_GUEST, in_guest);
2011-07-23 11:41:44 +04:00
HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5);
KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code
With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel. This is inefficient.
This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode. When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle. Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction. When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.
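The nap-or-exit decision can be modeled with a bitmask of napping threads, in the spirit of the `napping_threads` field this file exports as VCORE_NAPPING_THREADS. This is a hedged sketch, not the real-mode code in book3s_hv_rmhandlers.S. `struct vcore_nap_model`, `handle_cede()` and the one-bit-per-thread `active_mask` are assumptions.

```c
#include <stdatomic.h>

enum cede_action { CEDE_NAP, CEDE_EXIT_TO_KERNEL };

struct vcore_nap_model {
	unsigned int active_mask;    /* bit t set: thread t is running a vcpu */
	atomic_uint napping_threads; /* bit t set: thread t ceded and is napping */
};

/* Decide what thread `t` does when its vcpu executes H_CEDE. */
enum cede_action handle_cede(struct vcore_nap_model *vc, unsigned int t)
{
	unsigned int me = 1u << t;
	unsigned int others = vc->active_mask & ~me;
	unsigned int old = atomic_fetch_or(&vc->napping_threads, me);

	if ((old & others) == others) {
		/* All other vcpus are idle too: exit to the kernel instead of napping. */
		atomic_fetch_and(&vc->napping_threads, ~me);
		return CEDE_EXIT_TO_KERNEL;
	}
	/* Otherwise save the state lost in nap (GPRs, FPRs) and execute nap. */
	return CEDE_NAP;
}
```

Because the OR is atomic, whichever thread cedes last observes every other thread's bit. It therefore takes the exit path instead of napping.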
This has some other ramifications. First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle. This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running. In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.
This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.
Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.
Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.
Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-23 11:42:46 +04:00
HSTATE_FIELD(HSTATE_NAPPING, napping);
2011-06-29 04:20:58 +04:00
2013-10-07 20:47:52 +04:00
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
2012-03-06 01:42:25 +04:00
HSTATE_FIELD(HSTATE_HWTHREAD_REQ, hwthread_req);
HSTATE_FIELD(HSTATE_HWTHREAD_STATE, hwthread_state);
KVM: PPC: Add support for Book3S processors in hypervisor mode
2011-06-29 04:21:34 +04:00
HSTATE_FIELD(HSTATE_KVM_VCPU, kvm_vcpu);
KVM: PPC: Allow book3s_hv guests to use SMT processor modes
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.
This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.
To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.
When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
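The vcore numbering rule is plain integer division. The small sketch below uses POWER7's 4 threads per core in its checks. `vcore_of()` and `single_thread_vcpu_id()` are hypothetical helpers.

```c
/* vcore index for a vcpu: vcpu number divided by host threads per core. */
unsigned int vcore_of(unsigned int vcpu_id, unsigned int threads_per_core)
{
	return vcpu_id / threads_per_core;
}

/* vcpu number qemu could choose for guest CPU `n` so that every guest CPU
 * lands alone in its own vcore (single-threaded guest). */
unsigned int single_thread_vcpu_id(unsigned int n, unsigned int threads_per_core)
{
	return n * threads_per_core;
}
```

With 4 threads per core, vcpus 0 to 3 share vcore 0 and vcpus 4 to 7 share vcore 1. Choosing vcpu numbers 0, 4, 8 and so on places each vcpu in its own vcore.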
We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
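The scheduling policy above can be expressed as a predicate over vcpu states. This sketch uses hypothetical names. It also adds one assumption of its own: at least one vcpu must be runnable, since a vcore whose vcpus are all blocked has nothing to execute.

```c
#include <stdbool.h>
#include <stddef.h>

enum vcpu_state { VCPU_RUNNABLE, VCPU_BLOCKED, VCPU_BUSY_IN_HOST };

/* The vcore may run only if no vcpu is busy in the host and (assumption
 * of this sketch) at least one vcpu is runnable. */
bool vcore_can_run(const enum vcpu_state *st, size_t n)
{
	bool any_runnable = false;

	for (size_t i = 0; i < n; i++) {
		if (st[i] == VCPU_BUSY_IN_HOST)
			return false; /* let that vcpu finish in the kernel or qemu */
		if (st[i] == VCPU_RUNNABLE)
			any_runnable = true;
	}
	return any_runnable;
}
```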
When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).
It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
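One way to implement "do not enter once others have started exiting" is a single atomic word that holds both an entry count and an exit count, updated with compare-and-swap. The encoding below (entries in the low byte, exits counted from bit 8) is an assumption for illustration. It is not claimed to match `entry_exit_count` or the later `entry_exit_map` exported in this file.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: bits 0-7 count threads that entered the guest,
 * bits 8 and up count threads that have started to exit. */
#define ENTRY_MASK 0xffu
#define EXIT_ONE   0x100u

/* Try to enter the guest; refuse once any thread has begun exiting. */
bool try_enter(atomic_uint *entry_exit)
{
	unsigned int old = atomic_load(entry_exit);

	do {
		if (old >= EXIT_ONE)
			return false; /* an exit is in progress, stay out */
	} while (!atomic_compare_exchange_weak(entry_exit, &old, old + 1));
	return true;
}

/* Mark that this thread has started to exit the guest. */
void start_exit(atomic_uint *entry_exit)
{
	atomic_fetch_add(entry_exit, EXIT_ONE);
}

/* Number of threads that have entered the guest so far. */
unsigned int entered_count(atomic_uint *entry_exit)
{
	return atomic_load(entry_exit) & ENTRY_MASK;
}
```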
Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 04:23:08 +04:00
HSTATE_FIELD(HSTATE_KVM_VCORE, kvm_vcore);
HSTATE_FIELD(HSTATE_XICS_PHYS, xics_phys);
2017-04-05 10:54:56 +03:00
HSTATE_FIELD(HSTATE_XIVE_TIMA_PHYS, xive_tima_phys);
HSTATE_FIELD(HSTATE_XIVE_TIMA_VIRT, xive_tima_virt);
2013-04-18 00:30:50 +04:00
HSTATE_FIELD(HSTATE_SAVED_XIRR, saved_xirr);
HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
On a threaded processor such as POWER7, we group VCPUs into virtual
cores and arrange that the VCPUs in a virtual core run on the same
physical core. Currently we don't enforce any correspondence between
virtual thread numbers within a virtual core and physical thread
numbers. Physical threads are allocated starting at 0 on a first-come
first-served basis to runnable virtual threads (VCPUs).
POWER8 implements a new "msgsndp" instruction which guest kernels can
use to interrupt other threads in the same core or sub-core. Since
the instruction takes the destination physical thread ID as a parameter,
it becomes necessary to align the physical thread IDs with the virtual
thread IDs, that is, to make sure virtual thread N within a virtual
core always runs on physical thread N.
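The difference between the two placement policies shows up when you compute the physical thread assigned to each virtual thread. In this sketch:

* `runnable[t]` marks runnable virtual threads.
* -1 means "no physical thread".
* The old first-come policy is simplified to hand out threads in index order.
* All names are hypothetical.

```c
/* Old policy (simplified): runnable virtual threads take the lowest free
 * physical threads, here handed out in index order. -1 = not placed. */
void assign_first_come(const int *runnable, int n, int *ptid_of)
{
	int next = 0;

	for (int t = 0; t < n; t++)
		ptid_of[t] = runnable[t] ? next++ : -1;
}

/* Aligned policy: virtual thread t always runs on physical thread t. */
void assign_aligned(const int *runnable, int n, int *ptid_of)
{
	for (int t = 0; t < n; t++)
		ptid_of[t] = runnable[t] ? t : -1;
}
```

Suppose virtual threads 1 and 3 are runnable. The old policy places them on physical threads 0 and 1, so a guest msgsndp aimed at thread 3 would reach the wrong thread. The aligned policy places them on physical threads 1 and 3 and leaves thread 0 without a vcpu, which is the case the following paragraphs deal with.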
This means that it's possible that thread 0, which is where we call
__kvmppc_vcore_entry, may end up running a vcpu other than the
one whose task called kvmppc_run_core(), or it may end up running
no vcpu at all, if for example thread 0 of the virtual core is
currently executing in userspace. However, we do need thread 0
to be responsible for switching the MMU -- a previous version of
this patch that had other threads switching the MMU was found to
be responsible for occasional memory corruption and machine check
interrupts in the guest on POWER7 machines.
To accommodate this, we no longer pass the vcpu pointer to
__kvmppc_vcore_entry, but instead let the assembly code load it from
the PACA. Since the assembly code will need to know the kvm pointer
and the thread ID for threads which don't have a vcpu, we move the
thread ID into the PACA and we add a kvm pointer to the virtual core
structure.
In the case where thread 0 has no vcpu to run, it still calls into
kvmppc_hv_entry in order to do the MMU switch, and then naps until
either its vcpu is ready to run in the guest, or some other thread
needs to exit the guest. In the latter case, thread 0 jumps to the
code that switches the MMU back to the host. This control flow means
that now we switch the MMU before loading any guest vcpu state.
Similarly, on guest exit we now save all the guest vcpu state before
switching the MMU back to the host. This has required substantial
code movement, making the diff rather large.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-08 14:25:20 +04:00
HSTATE_FIELD(HSTATE_PTID, ptid);
2014-07-10 13:34:31 +04:00
HSTATE_FIELD(HSTATE_MMCR0, host_mmcr[0]);
HSTATE_FIELD(HSTATE_MMCR1, host_mmcr[1]);
HSTATE_FIELD(HSTATE_MMCRA, host_mmcr[2]);
HSTATE_FIELD(HSTATE_SIAR, host_mmcr[3]);
HSTATE_FIELD(HSTATE_SDAR, host_mmcr[4]);
HSTATE_FIELD(HSTATE_MMCR2, host_mmcr[5]);
HSTATE_FIELD(HSTATE_SIER, host_mmcr[6]);
HSTATE_FIELD(HSTATE_PMC1, host_pmc[0]);
HSTATE_FIELD(HSTATE_PMC2, host_pmc[1]);
HSTATE_FIELD(HSTATE_PMC3, host_pmc[2]);
HSTATE_FIELD(HSTATE_PMC4, host_pmc[3]);
HSTATE_FIELD(HSTATE_PMC5, host_pmc[4]);
HSTATE_FIELD(HSTATE_PMC6, host_pmc[5]);
KVM: PPC: Add support for Book3S processors in hypervisor mode
2011-06-29 04:21:34 +04:00
HSTATE_FIELD ( HSTATE_PURR , host_purr ) ;
HSTATE_FIELD ( HSTATE_SPURR , host_spurr ) ;
HSTATE_FIELD ( HSTATE_DSCR , host_dscr ) ;
HSTATE_FIELD ( HSTATE_DABR , dabr ) ;
HSTATE_FIELD ( HSTATE_DECEXP , dec_expires ) ;
KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip. Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core. With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.
Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.
To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread. In addition we extend the core_info struct to
have information on each subcore. When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.
Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode. It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.
This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap. In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered. The default is 6.
With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch. These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.
It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest. In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path. In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-02 13:38:16 +03:00
HSTATE_FIELD ( HSTATE_SPLIT_MODE , kvm_split_mode ) ;
KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code
With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel. This is inefficient.
This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode. When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle. Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction. When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.
This has some other ramifications. First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle. This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running. In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.
This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.
Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.
Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.
Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-23 11:42:46 +04:00
DEFINE ( IPI_PRIORITY , IPI_PRIORITY ) ;
2017-02-15 13:41:20 +03:00
OFFSET ( KVM_SPLIT_RPR , kvm_split_mode , rpr ) ;
OFFSET ( KVM_SPLIT_PMMAR , kvm_split_mode , pmmar ) ;
OFFSET ( KVM_SPLIT_LDBAR , kvm_split_mode , ldbar ) ;
OFFSET ( KVM_SPLIT_DO_NAP , kvm_split_mode , do_nap ) ;
OFFSET ( KVM_SPLIT_NAPPED , kvm_split_mode , napped ) ;
2013-10-07 20:47:52 +04:00
# endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
2011-06-29 04:21:34 +04:00
2013-02-04 22:10:51 +04:00
# ifdef CONFIG_PPC_BOOK3S_64
HSTATE_FIELD ( HSTATE_CFAR , cfar ) ;
2013-09-20 08:52:39 +04:00
HSTATE_FIELD ( HSTATE_PPR , ppr ) ;
2014-04-29 18:48:44 +04:00
HSTATE_FIELD ( HSTATE_HOST_FSCR , host_fscr ) ;
2013-02-04 22:10:51 +04:00
# endif /* CONFIG_PPC_BOOK3S_64 */
2011-06-29 04:20:58 +04:00
# else /* CONFIG_PPC_BOOK3S */
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_CR , kvm_vcpu , arch . cr ) ;
OFFSET ( VCPU_XER , kvm_vcpu , arch . xer ) ;
OFFSET ( VCPU_LR , kvm_vcpu , arch . lr ) ;
OFFSET ( VCPU_CTR , kvm_vcpu , arch . ctr ) ;
OFFSET ( VCPU_PC , kvm_vcpu , arch . pc ) ;
OFFSET ( VCPU_SPRG9 , kvm_vcpu , arch . sprg9 ) ;
OFFSET ( VCPU_LAST_INST , kvm_vcpu , arch . last_inst ) ;
OFFSET ( VCPU_FAULT_DEAR , kvm_vcpu , arch . fault_dear ) ;
OFFSET ( VCPU_FAULT_ESR , kvm_vcpu , arch . fault_esr ) ;
OFFSET ( VCPU_CRIT_SAVE , kvm_vcpu , arch . crit_save ) ;
2010-04-16 02:11:42 +04:00
# endif /* CONFIG_PPC_BOOK3S */
2011-06-29 04:20:58 +04:00
# endif /* CONFIG_KVM */
2010-07-29 16:47:57 +04:00
# ifdef CONFIG_KVM_GUEST
2017-02-15 13:41:20 +03:00
OFFSET ( KVM_MAGIC_SCRATCH1 , kvm_vcpu_arch_shared , scratch1 ) ;
OFFSET ( KVM_MAGIC_SCRATCH2 , kvm_vcpu_arch_shared , scratch2 ) ;
OFFSET ( KVM_MAGIC_SCRATCH3 , kvm_vcpu_arch_shared , scratch3 ) ;
OFFSET ( KVM_MAGIC_INT , kvm_vcpu_arch_shared , int_pending ) ;
OFFSET ( KVM_MAGIC_MSR , kvm_vcpu_arch_shared , msr ) ;
OFFSET ( KVM_MAGIC_CRITICAL , kvm_vcpu_arch_shared , critical ) ;
OFFSET ( KVM_MAGIC_SR , kvm_vcpu_arch_shared , sr ) ;
2010-07-29 16:47:57 +04:00
# endif
2008-12-11 04:55:41 +03:00
# ifdef CONFIG_44x
DEFINE ( PGD_T_LOG2 , PGD_T_LOG2 ) ;
DEFINE ( PTE_T_LOG2 , PTE_T_LOG2 ) ;
# endif
2009-10-17 03:48:40 +04:00
# ifdef CONFIG_PPC_FSL_BOOK3E
2010-05-13 23:38:21 +04:00
DEFINE ( TLBCAM_SIZE , sizeof ( struct tlbcam ) ) ;
2017-02-15 13:41:20 +03:00
OFFSET ( TLBCAM_MAS0 , tlbcam , MAS0 ) ;
OFFSET ( TLBCAM_MAS1 , tlbcam , MAS1 ) ;
OFFSET ( TLBCAM_MAS2 , tlbcam , MAS2 ) ;
OFFSET ( TLBCAM_MAS3 , tlbcam , MAS3 ) ;
OFFSET ( TLBCAM_MAS7 , tlbcam , MAS7 ) ;
2010-05-13 23:38:21 +04:00
# endif
2008-04-17 08:28:09 +04:00
2011-06-15 03:34:31 +04:00
# if defined(CONFIG_KVM) && defined(CONFIG_SPE)
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_EVR , kvm_vcpu , arch . evr [ 0 ] ) ;
OFFSET ( VCPU_ACC , kvm_vcpu , arch . acc ) ;
OFFSET ( VCPU_SPEFSCR , kvm_vcpu , arch . spefscr ) ;
OFFSET ( VCPU_HOST_SPEFSCR , kvm_vcpu , arch . host_spefscr ) ;
2011-06-15 03:34:31 +04:00
# endif
2011-12-20 19:34:43 +04:00
# ifdef CONFIG_KVM_BOOKE_HV
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_HOST_MAS4 , kvm_vcpu , arch . host_mas4 ) ;
OFFSET ( VCPU_HOST_MAS6 , kvm_vcpu , arch . host_mas6 ) ;
2011-12-20 19:34:43 +04:00
# endif
2017-04-05 10:54:56 +03:00
# ifdef CONFIG_KVM_XICS
DEFINE ( VCPU_XIVE_SAVED_STATE , offsetof ( struct kvm_vcpu ,
arch . xive_saved_state ) ) ;
DEFINE ( VCPU_XIVE_CAM_WORD , offsetof ( struct kvm_vcpu ,
arch . xive_cam_word ) ) ;
DEFINE ( VCPU_XIVE_PUSHED , offsetof ( struct kvm_vcpu , arch . xive_pushed ) ) ;
# endif
2008-12-03 00:51:57 +03:00
# ifdef CONFIG_KVM_EXIT_TIMING
2017-02-15 13:41:20 +03:00
OFFSET ( VCPU_TIMING_EXIT_TBU , kvm_vcpu , arch . timing_exit . tv32 . tbu ) ;
OFFSET ( VCPU_TIMING_EXIT_TBL , kvm_vcpu , arch . timing_exit . tv32 . tbl ) ;
OFFSET ( VCPU_TIMING_LAST_ENTER_TBU , kvm_vcpu , arch . timing_last_enter . tv32 . tbu ) ;
OFFSET ( VCPU_TIMING_LAST_ENTER_TBL , kvm_vcpu , arch . timing_last_enter . tv32 . tbl ) ;
2008-12-03 00:51:57 +03:00
# endif
2014-12-09 21:56:52 +03:00
# ifdef CONFIG_PPC_POWERNV
2017-02-15 13:41:20 +03:00
OFFSET ( PACA_CORE_IDLE_STATE_PTR , paca_struct , core_idle_state_ptr ) ;
OFFSET ( PACA_THREAD_IDLE_STATE , paca_struct , thread_idle_state ) ;
OFFSET ( PACA_THREAD_MASK , paca_struct , thread_mask ) ;
OFFSET ( PACA_SUBCORE_SIBLING_MASK , paca_struct , subcore_sibling_mask ) ;
2017-03-22 18:04:17 +03:00
OFFSET ( PACA_SIBLING_PACA_PTRS , paca_struct , thread_sibling_pacas ) ;
2014-12-09 21:56:52 +03:00
# endif
2015-03-28 06:21:12 +03:00
DEFINE ( PPC_DBELL_SERVER , PPC_DBELL_SERVER ) ;
powerpc/8xx: Fix vaddr for IMMR early remap
Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
648K rodata, 508K init, 290K bss, 6644K reserved)
Kernel virtual memory layout:
* 0xfffdf000..0xfffff000 : fixmap
* 0xfde00000..0xfe000000 : consistent mem
* 0xfddf6000..0xfde00000 : early ioremap
* 0xc9000000..0xfddf6000 : vmalloc & ioremap
SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Today, IMMR is mapped 1:1 at startup.
Mapping IMMR 1:1 is wrong because it may overlap with another
area. On most mpc8xx boards this is harmless, as IMMR is set to 0xff000000,
but on the EP88xC board, for instance, IMMR is at 0xfa200000, which
overlaps with the vmalloc & ioremap area.
This patch remaps IMMR at a fixed virtual address taken from the fixmap,
regardless of the value of IMMR.
The IMMR area is 256 kbytes (CPM at offset 0, security engine
at offset 128k), so a single 512k page is enough.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-05-17 10:02:43 +03:00
# ifdef CONFIG_PPC_8xx
2016-07-09 11:22:39 +03:00
DEFINE ( VIRT_IMMR_BASE , ( u64 ) __fix_to_virt ( FIX_IMMR_BASE ) ) ;
2016-05-17 10:02:43 +03:00
# endif
2005-09-26 10:04:21 +04:00
return 0 ;
}