IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.
However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.
Clean the callback up, and modify toggle_interruptibility to
match the comment above the call. As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction. However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).
In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment). So there is no need to
call guest_state_valid in the emulation loop.
Emulation performance test results indicate 1650-2600 cycles
for common instructions, versus 2300-3200 before this patch on
a Sandy Bridge Xeon.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit 575203 the MCE subsystem in the Linux kernel for AMD sets bit 18
in MSR_K7_HWCR. Running such a kernel as a guest in KVM on an AMD host results
in a GPE injected into the guest because kvm_set_msr_common returns 1. This
patch fixes this by masking bit 18 from the MSR value desired by the guest.
Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We encountered a scenario in which after an INIT is delivered, a pending
interrupt is delivered, although it was sent before the INIT. As the SDM
states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT as
KVM does. This also means that pending interrupts should be cleared. This
patch clears upon reset (and INIT) the pending interrupts; and at the same
occassion clears the pending exceptions, since they may cause a similar issue.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have noticed that qemu-kvm hangs early in the BIOS when runnning nested
under some versions of VMware ESXi.
The problem we believe is because KVM assumes that the platform preserves
the 'G' but for any segment register. The SVM specification itemizes the
segment attribute bits that are observed by the CPU, but the (G)ranularity bit
is not one of the bits itemized, for any segment. Though current AMD CPUs keep
track of the (G)ranularity bit for all segment registers other than CS, the
specification does not require it. VMware's virtual CPU may not track the
(G)ranularity bit for any segment register.
Since kvm already synthesizes the (G)ranularity bit for the CS segment. It
should do so for all segments. The patch below does that, and helps get rid of
the hangs. Patch applies on top of Linus' tree.
Signed-off-by: Jim Mattson <jmattson@vmware.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Document the MIPS specific parts of the KVM API, including:
- The layout of the kvm_regs structure.
- The interrupt number passed to KVM_INTERRUPT.
- The registers supported by the KVM_{GET,SET}_ONE_REG interface, and
the encoding of those register ids.
- That KVM_INTERRUPT and KVM_GET_REG_LIST are supported on MIPS.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some of the MIPS registers that can be accessed with the
KVM_{GET,SET}_ONE_REG interface have fairly long names, so widen the
Register column of the table in the KVM_SET_ONE_REG documentation to
allow them to fit.
Tabs in the table are replaced with spaces at the same time for
consistency.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_SET_SIGNAL_MASK is implemented in generic code and isn't x86
specific, so document it as being applicable for all architectures.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In two cases lapic.c does not use the apic_debug macro correctly. This patch
fixes them.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I've observed kvmclock being marked as unstable on a modern
single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.
The culprit was failure in TSC matching because of overflow of
kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
in a single synchronization cycle.
Turns out that qemu does multiple TSC writes during init, below is the
evidence of that (qemu-2.0.0):
The first one:
0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]
The second one:
0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
#0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
#1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
#2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
#3 kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
#4 cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
#5 cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
#6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390
The third one:
0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
#0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
#1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
#2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
#3 kvm_cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
#4 cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
#5 cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
#6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482
The fix is to count each vCPU only once when matched, so that
nr_vcpus_matched_tsc holds the size of the matched set. This is
achieved by reusing generation counters. Every vCPU with
this_tsc_generation == cur_tsc_generation is in the matched set. The
match set is cleared by setting cur_tsc_generation to a value which no
other vCPU is set to (by incrementing it).
I needed to bump up the counter size form u8 to u64 to ensure it never
overflows. Otherwise in cases TSC is not written the same number of
times on each vCPU the counter could overflow and incorrectly indicate
some vCPUs as being in the matched set. This scenario seems unlikely
but I'm not sure if it can be disregarded.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Obtaining the port number from DX is bogus as a) there are immediate
port accesses and b) user space may have changed the register content
while processing the PIO access. Forward the correct value from the
instruction emulator instead.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The access size of an in/ins is reported in dst_bytes, and that of
out/outs in src_bytes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
First, kvm_read_guest returns 0 on success. And then we need to take the
access size into account when testing the bitmap: intercept if any of
bits corresponding to the access is set.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CLTS only changes TS which is not monitored by selected CR0
interception. So skip any attempt to translate WRITE_CR0 to
CR0_SEL_WRITE for this instruction.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A struct member variable is set to the same value more than once
This was found using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It's impossible to fall into the error handling of the TLB index after
being masked by (KVM_MIPS_GUEST_TLB_SIZE - 1). Remove the dead code.
Reported-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The commpage is allocated using kzalloc(), so there's no need of cleaning
the memory of the kvm_mips_commpage struct and its internal mips_coproc.
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since all the files are in arch/mips/kvm/, there's no need of the prefixes
"kvm_" and "kvm_mips_".
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The keyword volatile for idx in the TLB functions is unnecessary.
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We import the CPL via SS.DPL since ae9fedc793. However, we fail to
export it this way so far. This caused spurious guest crashes, e.g. of
Linux when accessing the vmport from guest user space which triggered
register saving/restoring to/from host user space.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
sie.h was missing in arch/s390/include/uapi/asm/Kbuild and therefore missed
the "make headers_check" target.
If added it reveals that also arch/s390/include/asm/sigp.h would become uapi.
This is something we certainly do not want. So remove that dependency as well.
The header file was merged with ceae283bb2e0176c "KVM: s390: add sie exit
reasons tables", therefore we never had a kernel release with this commit and
can still change anything.
Acked-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in
64-bit mode. The current implementation is broken since it does not use the
register operands correctly, and always uses 64-bit for reads and writes.
Moreover, write to memory in vmwrite only considers long-mode, so it ignores
cs.l. This patch fixes this behavior.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On 32-bit mode only bits [31:0] of the CR should be used for setting the CR
value. Otherwise, the host may incorrectly assume the value is invalid if bits
[63:32] are not zero. Moreover, the CR is currently being read twice when CR8
is used. Last, nested mov-cr exiting is modified to handle the CR value
correctly as well.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, the hypercall handling routine only considers LME as an indication
to whether the guest uses 32/64-bit mode. This is incosistent with hyperv
hypercalls handling and against the common sense of considering cs.l as well.
This patch uses is_64_bit_mode instead of is_long_mode for that matter. In
addition, the result is masked in respect to the guest execution mode. Last, it
changes kvm_hv_hypercall to use is_64_bit_mode as well to simplify the code.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and
otherwise injects a #GP exception. This exception should only be injected only
if running in long-mode.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Many real CPUs get this wrong as well, but ours is totally off: bits 9:1
define the highest index value.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow L1 to "leak" its debug controls into L2, i.e. permit cleared
VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS. This requires to manually
transfer the state of DR7 and IA32_DEBUGCTLMSR from L1 into L2 as both
run on different VMCS.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We already have this control enabled by exposing a broken
MSR_IA32_VMX_PROCBASED_CTLS value. This will properly advertise our
capability once the value is fixed by clearing the right bits in
MSR_IA32_VMX_TRUE_PROCBASED_CTLS. We also have to ensure to test the
right value on L2 entry.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We already implemented them but failed to advertise them. Currently they
all return the identical values to the capability MSRs they are
augmenting. So there is no change in exposed features yet.
Drop related comments at this chance that are partially incorrect and
redundant anyway.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The spec says those controls are at bit position 2 - makes 4 as value.
The impact of this mistake is effectively zero as we only use them to
ensure that these features are set at position 2 (or, previously, 1) in
MSR_IA32_VMX_{EXIT,ENTRY}_CTLS - which is and will be always true
according to the spec.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On long-mode the current NOP (0x90) emulation still writes back to RAX. As a
result, EAX is zero-extended and the high 32-bits of RAX are cleared.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Even if the condition of cmov is not satisfied, bits[63:32] should be cleared.
This is clearly stated in Intel's CMOVcc documentation. The solution is to
reassign the destination onto itself if the condition is unsatisfied. For that
matter the original destination value needs to be read.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Return unhandlable error on inter-privilege level ret instruction. This is
since the current emulation does not check the privilege level correctly when
loading the CS, and does not pop RSP/SS as needed.
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The emulator does not emulate the xadd instruction correctly if the two
operands are the same. In this (unlikely) situation the result should be the
sum of X and X (2X) when it is currently X. The solution is to first perform
writeback to the source, before writing to the destination. The only
instruction which should be affected is xadd, as the other instructions that
perform writeback to the source use the extended accumlator (e.g., RAX:RDX).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current emulation of bit operations ignores the offset from the destination
on 64-bit target memory operands. This patch fixes this behavior.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
use mm.h definition
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We did not do that when interruptibility was added to the emulator,
because at the time pop to segment was not implemented. Now it is,
add it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In 64-bit mode, when the destination is a register, the assignment is done
according to the operand size. Otherwise (memory operand or no 64-bit mode), a
16-bit assignment is performed.
Currently, 16-bit assignment is always done to the destination.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
cmpxchg16b is currently unimplemented in the emulator. The least we can do is
return error upon the emulation of this instruction.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The rdpmc emulation checks that the counter (ECX) is not higher than 2, without
taking into considerations bits 30:31 role (e.g., bit 30 marks whether the
counter is fixed). The fix uses the pmu information for checking the validity
of the pmu counter.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the operand-size prefix (0x66) is used in 64-bit mode, the emulator would
assume the destination operand is 64-bit, when it should be 32-bit.
Reminder: movnti does not support 16-bit operands and its default operand size
is 32-bit.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current implementation of cmpxchg does not update the flags correctly,
since the accumulator should be compared with the destination and not the other
way around. The current implementation does not update the flags correctly.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>