74 Commits

Author SHA1 Message Date
Paolo Bonzini
c4f71901d5 KVM/arm64 updates for Linux 5.13

Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.13

New features:

- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)

Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocations in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
  oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
2021-04-23 07:41:17 -04:00
Sean Christopherson
cd4c718352 KVM: arm64: Convert to the gfn-based MMU notifier callbacks
Move arm64 to the gfn-based MMU notifier APIs, which do the hva->gfn
lookup in common code.

No meaningful functional change intended, though the exact order of
operations is slightly different since the memslot lookups occur before
calling into arch code.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:07 -04:00
Sean Christopherson
501b918525 KVM: Move arm64's MMU notifier trace events to generic code
Move arm64's MMU notifier trace events into common code in preparation
for doing the hva->gfn lookup in common code.  The alternative would be
to trace the gfn instead of hva, but that's not obviously better and
could also be done in common code.  Tracing the notifiers is also quite
handy for debug regardless of architecture.

Remove a completely redundant tracepoint from PPC e500.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210326021957.1424875-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:56 -04:00
Marc Zyngier
3d63ef4d52 Merge branch 'kvm-arm64/memslot-fixes' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:35:40 +01:00
Gavin Shan
10ba2d17d2 KVM: arm64: Don't retrieve memory slot again in page fault handler
We needn't retrieve the memory slot again in user_mem_abort() because
the corresponding memory slot has been passed from the caller. This
would save some CPU cycles. For example, the time used to write 1GB
memory, which is backed by 2MB hugetlb pages and write-protected, is
dropped by 6.8% from 928ms to 864ms.
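
A minimal sketch of the pattern (illustrative only, not the commit's
diff; the surrounding variable names are assumptions):

   /* Before: a second lookup inside the fault handler */
   memslot = gfn_to_memslot(vcpu->kvm, fault_ipa >> PAGE_SHIFT);

   /* After: simply reuse the 'memslot' parameter that the caller
    * (kvm_handle_guest_abort()) already resolved and passed in. */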

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-4-gshan@redhat.com
2021-04-07 14:33:22 +01:00
Gavin Shan
c728fd4ce7 KVM: arm64: Use find_vma_intersection()
find_vma_intersection() already exists to look up an intersecting vma.
Use it where applicable to simplify the code.
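
An illustrative before/after (hva and reg_end are assumed names from the
memslot-preparation context, not quoted from the patch):

   struct vm_area_struct *vma;

   /* Before: open-coded intersection check on top of find_vma() */
   vma = find_vma(current->mm, hva);
   if (vma && vma->vm_start >= reg_end)
           vma = NULL;

   /* After: the existing helper expresses the same thing directly */
   vma = find_vma_intersection(current->mm, hva, reg_end);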

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-3-gshan@redhat.com
2021-04-07 14:33:22 +01:00
Gavin Shan
eab6214847 KVM: arm64: Hide kvm_mmu_wp_memory_region()
We needn't expose the function as it's only used by mmu.c since it
was introduced by commit c64735554c0a ("KVM: arm: Add initial dirty
page locking support").

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-2-gshan@redhat.com
2021-04-07 14:33:22 +01:00
Quentin Perret
cfb1a98de7 KVM: arm64: Use kvm_arch in kvm_s2_mmu
In order to make use of the stage 2 pgtable code for the host stage 2,
change kvm_s2_mmu to use a kvm_arch pointer in lieu of the kvm pointer,
as the host will have the former but not the latter.
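
A rough sketch of the shape of the change (field layout abridged; the
container_of() helper is an assumption about how the kvm pointer is
still recovered where a full struct kvm exists):

   struct kvm_s2_mmu {
           /* ... page table root, VMID, etc. ... */
           struct kvm_arch *arch;          /* was: struct kvm *kvm */
   };

   /* For guest MMUs the enclosing struct kvm remains reachable: */
   static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
   {
           return container_of(mmu->arch, struct kvm, arch);
   }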

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-21-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
834cd93deb KVM: arm64: Use kvm_arch for stage 2 pgtable
In order to make use of the stage 2 pgtable code for the host stage 2,
use struct kvm_arch in lieu of struct kvm as the host will have the
former but not the latter.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-20-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
bfa79a8054 KVM: arm64: Elevate hypervisor mappings creation at EL2
Previous commits have introduced infrastructure to enable the EL2 code
to manage its own stage 1 mappings. However, this was preliminary work,
and none of it is currently in use.

Put all of this together by elevating the mapping creation at EL2 when
memory protection is enabled. In this case, the host kernel running
at EL1 still creates _temporary_ EL2 mappings, only used while
initializing the hypervisor, but frees them right after.

As such, all calls to create_hyp_mappings() after kvm init has finished
turn into hypercalls, as the host now has no 'legal' way to modify the
hypervisor page tables directly.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-19-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
7aef0cbcdc KVM: arm64: Factor memory allocation out of pgtable.c
In preparation for enabling the creation of page-tables at EL2, factor
all memory allocation out of the page-table code, hence making it
re-usable with any compatible memory allocator.

No functional changes intended.
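
The callback table ends up looking roughly like this (abridged sketch;
see struct kvm_pgtable_mm_ops upstream for the full field list):

   struct kvm_pgtable_mm_ops {
           void            *(*zalloc_page)(void *arg);
           void            (*get_page)(void *addr);
           void            (*put_page)(void *addr);
           int             (*page_count)(void *addr);
           void            *(*phys_to_virt)(phys_addr_t phys);
           phys_addr_t     (*virt_to_phys)(void *addr);
           /* ... */
   };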

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-7-qperret@google.com
2021-03-19 12:01:20 +00:00
Marc Zyngier
262b003d05 KVM: arm64: Fix exclusive limit for IPA size
When registering a memslot, we check the size and location of that
memslot against the IPA size to ensure that we can provide guest
access to the whole of the memory.

Unfortunately, this check rejects memslots that end up at the exact
limit of the addressing capability for a given IPA size. For example,
it refuses the creation of a 2GB memslot at 0x80000000 with a 32bit
IPA space.

Fix it by relaxing the check to accept a memslot reaching the
limit of the IPA space.
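
Concretely, with a 32bit IPA space the limit is 4GB, so a 2GB slot
based at 2GB ends exactly on the boundary and should be accepted. A
hedged sketch of the relaxed check in kvm_arch_prepare_memory_region():

   /* Reject slots that go beyond the IPA range, but accept one that
    * ends exactly at the limit (illustrative; '>' used to be '>='). */
   if ((memslot->base_gfn + memslot->npages) >
       (kvm_phys_size(kvm) >> PAGE_SHIFT))
           return -EFAULT;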

Fixes: c3058d5da222 ("arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20210311100016.3830038-3-maz@kernel.org
2021-03-12 15:43:22 +00:00
Yanan Wang
509552e65a KVM: arm64: Mark the page dirty only if the fault is handled successfully
We currently set the pfn dirty and mark the page dirty before calling the
fault handlers in user_mem_abort(), so we might end up with spurious dirty
pages if the update of permissions or of the mapping fails. Let's move
these two operations after the fault handlers, so they are only performed
when the fault has been handled successfully.

When the map handler returns -EAGAIN, we want the vcpu to re-enter the
guest directly instead of exiting back to userspace, so adjust the return
value at the end of the function.
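
A sketch of the resulting ordering (simplified; map_fault_ipa() is a
hypothetical wrapper standing in for the relax_perms/map calls):

   ret = map_fault_ipa(vcpu, fault_ipa, pfn, prot);

   /* Only account the page as dirty once the fault was actually handled */
   if (!ret && writable) {
           kvm_set_pfn_dirty(pfn);
           mark_page_dirty(kvm, gfn);
   }

   /* -EAGAIN from the map handler means "re-enter the guest", not an error */
   return ret != -EAGAIN ? ret : 0;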

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210114121350.123684-4-wangyanan55@huawei.com
2021-01-25 16:30:20 +00:00
Linus Torvalds
6a447b0e31 ARM:

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Much x86 work was pushed out to 5.12, but ARM more than made up for it.

  ARM:
   - PSCI relay at EL2 when "protected KVM" is enabled
   - New exception injection code
   - Simplification of AArch32 system register handling
   - Fix PMU accesses when no PMU is enabled
   - Expose CSV3 on non-Meltdown hosts
   - Cache hierarchy discovery fixes
   - PV steal-time cleanups
   - Allow function pointers at EL2
   - Various host EL2 entry cleanups
   - Simplification of the EL2 vector allocation

  s390:
   - memcg accounting for s390 specific parts of kvm and gmap
   - selftest for diag318
   - new kvm_stat for when async_pf falls back to sync

  x86:
   - Tracepoints for the new pagetable code from 5.10
   - Catch VFIO and KVM irqfd events before userspace
   - Reporting dirty pages to userspace with a ring buffer
   - SEV-ES host support
   - Nested VMX support for wait-for-SIPI activity state
   - New feature flag (AVX512 FP16)
   - New system ioctl to report Hyper-V-compatible paravirtualization features

  Generic:
   - Selftest improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: SVM: fix 32-bit compilation
  KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
  KVM: SVM: Provide support to launch and run an SEV-ES guest
  KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  KVM: SVM: Provide support for SEV-ES vCPU loading
  KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  KVM: SVM: Update ASID allocation to support SEV-ES guests
  KVM: SVM: Set the encryption mask for the SVM host save area
  KVM: SVM: Add NMI support for an SEV-ES guest
  KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  KVM: SVM: Do not report support for SMM for an SEV-ES guest
  KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  KVM: SVM: Support string IO operations for an SEV-ES guest
  KVM: SVM: Support MMIO for an SEV-ES guest
  KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  KVM: SVM: Create trace events for VMGEXIT processing
  ...
2020-12-20 10:44:05 -08:00
Yanan Wang
7d894834a3 KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
When we get an FSC_PERM fault, we currently use only (logging_active &&
writable) to decide whether to call kvm_pgtable_stage2_map(). There are
two more cases we should consider.

(1) After logging_active is switched back from true to false: if we get an
FSC_PERM fault with write_fault and a hugepage adjustment is needed, we
should merge the tables back into a block entry. This case is currently
missed and we keep calling kvm_pgtable_stage2_relax_perms(), which leads
to an endless loop and a guest panic due to a soft lockup.

(2) We use (FSC_PERM && logging_active && writable) to decide when to
collapse a block entry into a table by calling kvm_pgtable_stage2_map().
But sometimes we only need to relax permissions when writing to a page
rather than a block; in that case kvm_pgtable_stage2_relax_perms() is
sufficient.

The ISS field bits[1:0] in the ESR_EL2 register indicate the stage 2
lookup level at which a D-abort or I-abort occurred. By comparing the
granule of the fault lookup level with vma_pagesize, we can strictly
distinguish when to call kvm_pgtable_stage2_relax_perms() and when to call
kvm_pgtable_stage2_map(), covering both cases above.
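
A sketch of the resulting decision (simplified from user_mem_abort();
variable set-up elided):

   /* ESR_EL2 ISS bits[1:0]: the lookup level the abort was taken at */
   fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
   fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);

   if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
           /* Same granule as the faulting entry: relaxing permissions suffices */
           ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
   else
           /* Otherwise (re)map, e.g. collapse tables back into a block */
           ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
                                        __pfn_to_phys(pfn), prot, memcache);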

Suggested-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201201201034.116760-4-wangyanan55@huawei.com
2020-12-02 09:53:29 +00:00
Marc Zyngier
37da329ed6 Merge branch 'kvm-arm64/el2-pc' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-27 11:33:10 +00:00
Marc Zyngier
4f6b838c37 Linux 5.10-rc1

Merge tag 'v5.10-rc1' into kvmarm-master/next

Linux 5.10-rc1

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-12 21:20:43 +00:00
Marc Zyngier
cdb5e02ed1 KVM: arm64: Make kvm_skip_instr() and co private to HYP
In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 08:34:24 +00:00
Marc Zyngier
6ddbc281e2 KVM: arm64: Move kvm_vcpu_trap_il_is32bit into kvm_skip_instr32()
There is no need to feed the result of kvm_vcpu_trap_il_is32bit()
to kvm_skip_instr(), as only AArch32 has a variable length ISA, and
this helper can equally be called from kvm_skip_instr32(), reducing
the complexity at all the call sites.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 08:34:24 +00:00
Paolo Bonzini
ff2bb93f53 KVM/arm64 fixes for v5.10, take #2

Merge tag 'kvmarm-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for v5.10, take #2

- Fix compilation error when PMD and PUD are folded
- Fix regression of the RAZ behaviour of ID_AA64ZFR0_EL1
2020-11-08 04:15:53 -05:00
Gavin Shan
faf000397e KVM: arm64: Fix build error in user_mem_abort()
The PUD and PMD are folded into PGD when the following options are
enabled. In that case, PUD_SHIFT is equal to PMD_SHIFT and we fail
to build with the indicated errors:

   CONFIG_ARM64_VA_BITS_42=y
   CONFIG_ARM64_PAGE_SHIFT=16
   CONFIG_PGTABLE_LEVELS=3

   arch/arm64/kvm/mmu.c: In function ‘user_mem_abort’:
   arch/arm64/kvm/mmu.c:798:2: error: duplicate case value
     case PMD_SHIFT:
     ^~~~
   arch/arm64/kvm/mmu.c:791:2: note: previously used here
     case PUD_SHIFT:
     ^~~~

Fix the issue by skipping the PUD hugepage check when PUD and PMD are
folded into PGD.
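
The guard looks roughly like this (sketch; the other vma_shift cases
are elided):

   switch (vma_shift) {
   #ifndef __PAGETABLE_PMD_FOLDED  /* PUD_SHIFT == PMD_SHIFT when folded */
   case PUD_SHIFT:
           if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
                   break;
           fallthrough;
   #endif
   case PMD_SHIFT:
           /* ... */
           break;
   }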

Fixes: 2f40c46021bbb ("KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes")
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201103003009.32955-1-gshan@redhat.com
2020-11-06 16:00:29 +00:00
Paolo Bonzini
699116c45e KVM/arm64 fixes for 5.10, take #1

Merge tag 'kvmarm-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.10, take #1

- Force PTE mapping on device pages provided via VFIO
- Fix detection of cacheable mapping at S2
- Fallback to PMD/PTE mappings for composite huge pages
- Fix accounting of Stage-2 PGD allocation
- Fix AArch32 handling of some of the debug registers
- Simplify host HYP entry
- Fix stray pointer conversion on nVHE TLB invalidation
- Fix initialization of the nVHE code
- Simplify handling of capabilities exposed to HYP
- Nuke VCPUs caught using a forbidden AArch32 EL0
2020-10-30 13:25:09 -04:00
Santosh Shukla
91a2c34b7d KVM: arm64: Force PTE mapping on fault resulting in a device mapping
VFIO allows a device driver to resolve a fault by mapping an MMIO
range. This can subsequently result in user_mem_abort() trying to
compute a huge mapping based on the MMIO pfn, which is a sure recipe
for things to go wrong.

Instead, force a PTE mapping when the pfn faulted in has a device
mapping.
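
A minimal sketch of the resulting behaviour in user_mem_abort():

   /* Never try to build a block mapping around a device pfn */
   if (kvm_is_device_pfn(pfn)) {
           device = true;
           force_pte = true;       /* map at PAGE_SIZE granularity only */
   }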

Fixes: 6d674e28f642 ("KVM: arm/arm64: Properly handle faulting of device mappings")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Santosh Shukla <sashukla@nvidia.com>
[maz: rewritten commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1603711447-11998-2-git-send-email-sashukla@nvidia.com
2020-10-29 20:41:04 +00:00
Gavin Shan
2f40c46021 KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
Although huge pages can be created out of multiple contiguous PMDs
or PTEs, the corresponding sizes are not supported at Stage-2 yet.

Instead of failing the mapping, fall back to the nearest supported
mapping size (CONT_PMD to PMD and CONT_PTE to PTE respectively).
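
A sketch of the fallback (simplified; upstream folds this into the
vma_shift switch in user_mem_abort()):

   switch (vma_shift) {
   case CONT_PMD_SHIFT:
           vma_shift = PMD_SHIFT;          /* contiguous PMDs -> one PMD block */
           break;
   case CONT_PTE_SHIFT:
           vma_shift = PAGE_SHIFT;         /* contiguous PTEs -> a single page */
           break;
   }
   vma_pagesize = 1UL << vma_shift;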

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
[maz: rewritten commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201025230626.18501-1-gshan@redhat.com
2020-10-29 20:39:46 +00:00
Paolo Bonzini
1b21c8db0e KVM/arm64 updates for Linux 5.10

Merge tag 'kvmarm-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.10

- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
2020-10-20 08:14:25 -04:00
Will Deacon
ffd1b63a58 KVM: arm64: Ensure user_mem_abort() return value is initialised
If a change in the MMU notifier sequence number forces user_mem_abort()
to return early when attempting to handle a stage-2 fault, we return
uninitialised stack to kvm_handle_guest_abort(), which could potentially
result in the injection of an external abort into the guest or a spurious
return to userspace. Neither or these are what we want to do.

Initialise 'ret' to 0 in user_mem_abort() so that bailing due to a
change in the MMU notrifier sequence number is treated as though the
fault was handled.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20200930102442.16142-1-will@kernel.org
2020-10-02 09:25:25 +01:00
Paolo Bonzini
b73815a18d KVM/arm64 fixes for 5.9, take #2

Merge tag 'kvmarm-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 5.9, take #2

- Fix handling of S1 Page Table Walk permission fault at S2
  on instruction fetch
- Cleanup kvm_vcpu_dabt_iswrite()
2020-09-20 17:31:07 -04:00
Marc Zyngier
c4ad98e4b7 KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch
KVM currently assumes that an instruction abort can never be a write.
This is in general true, except when the abort is triggered by
a S1PTW on instruction fetch that tries to update the S1 page tables
(to set AF, for example).

This can happen if the page tables have been paged out and brought
back in without seeing a direct write to them (they are thus marked
read only), and the fault handling code will make the PT executable(!)
instead of writable. The guest gets stuck forever.

In these conditions, the permission fault must be considered as
a write so that the Stage-1 update can take place. This is essentially
the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: KVM:
Take S1 walks into account when determining S2 write faults").

Update kvm_is_write_fault() to return true on IABT+S1PTW, and introduce
kvm_vcpu_trap_is_exec_fault(), which only returns true for an execution
fault that is not caused by an S1 page table walk. Additionally, rename
kvm_vcpu_dabt_iss1tw() to kvm_vcpu_abt_iss1tw(), as the above makes it
plain that it isn't specific to data aborts.
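
Roughly the resulting logic (sketch, not the verbatim diff):

   static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
   {
           if (kvm_vcpu_abt_iss1tw(vcpu))  /* S1 PTW always needs write access */
                   return true;

           if (kvm_vcpu_trap_is_iabt(vcpu))
                   return false;           /* plain instruction fetch */

           return kvm_vcpu_dabt_iswrite(vcpu);
   }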

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200915104218.1284701-2-maz@kernel.org
2020-09-18 18:01:48 +01:00
Xiaofei Tan
c9c0279cc0 KVM: arm64: Fix doc warnings in mmu code
Fix the following warnings, caused by a mismatch between function
parameters and comments.
arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'mmu' not described in '__unmap_stage2_range'
arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'may_block' not described in '__unmap_stage2_range'
arch/arm64/kvm/mmu.c:128: warning: Excess function parameter 'kvm' description in '__unmap_stage2_range'
arch/arm64/kvm/mmu.c:499: warning: Function parameter or member 'writable' not described in 'kvm_phys_addr_ioremap'
arch/arm64/kvm/mmu.c:538: warning: Function parameter or member 'mmu' not described in 'stage2_wp_range'
arch/arm64/kvm/mmu.c:538: warning: Excess function parameter 'kvm' description in 'stage2_wp_range'

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1600307269-50957-1-git-send-email-tanxiaofei@huawei.com
2020-09-18 16:18:51 +01:00
Alexandru Elisei
ada329e6b5 KVM: arm64: Do not flush memslot if FWB is supported
As a result of a KVM_SET_USER_MEMORY_REGION ioctl, KVM flushes the
dcache for the memslot being changed to ensure a consistent view of memory
between the host and the guest: the host runs with caches enabled, and
it is possible for the data written by the hypervisor to still be in the
caches, but the guest is running with stage 1 disabled, meaning data
accesses are to Device-nGnRnE memory, bypassing the caches entirely.

Flushing the dcache is not necessary when KVM enables FWB, because it
forces the guest to use cacheable memory accesses.

The current behaviour does not change, as the dcache flush helpers execute
the cache operation only if FWB is not enabled, but walking the stage 2
table is avoided.
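
A sketch of the short-circuit (illustrative):

   static void stage2_flush_memslot(struct kvm *kvm,
                                    struct kvm_memory_slot *memslot)
   {
           /* FWB forces cacheable guest accesses, so no dcache cleaning
            * (and hence no stage 2 walk) is needed. */
           if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
                   return;

           /* ... walk the stage 2 tables and clean the dcache ... */
   }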

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915170442.131635-1-alexandru.elisei@arm.com
2020-09-18 16:18:24 +01:00
Alexandru Elisei
523b3999e5 KVM: arm64: Try PMD block mappings if PUD mappings are not supported
When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
use the same block size to map the faulting IPA in stage 2. If stage 2
cannot use the same block mapping because the block size doesn't fit in the
memslot or the memslot is not properly aligned, user_mem_abort() will fall
back to a page mapping, regardless of the block size. We can do better for
PUD backed hugetlbfs by checking if a PMD block mapping is supported before
deciding to use a page.

vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
its value.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200910133351.118191-1-alexandru.elisei@arm.com
2020-09-18 16:10:15 +01:00
Paolo Bonzini
1b67fd086d KVM/arm64 fixes for Linux 5.9, take #1

Merge tag 'kvmarm-fixes-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for Linux 5.9, take #1

- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
  (dirty logging, for example)
- Fix tracing output of 64bit values
2020-09-11 13:12:11 -04:00
Will Deacon
74cfa7ea66 KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
'struct kvm_s2_mmu', so remove the unused 'pgd' field.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-21-will@kernel.org
2020-09-11 15:51:15 +01:00
Will Deacon
3f26ab58e3 KVM: arm64: Remove unused page-table code
Now that KVM is using the generic page-table code to manage the guest
stage-2 page-tables, we can remove a bunch of unused macros, #defines
and static inline functions from the old implementation.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-20-will@kernel.org
2020-09-11 15:51:15 +01:00
Will Deacon
063deeb1f2 KVM: arm64: Check the pgt instead of the pgd when modifying page-table
In preparation for removing the 'pgd' field of 'struct kvm_s2_mmu',
update the few remaining users to check the 'pgt' field instead.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-19-will@kernel.org
2020-09-11 15:51:15 +01:00
Will Deacon
6f745f1bb5 KVM: arm64: Convert user_mem_abort() to generic page-table API
Convert user_mem_abort() to call kvm_pgtable_stage2_relax_perms() when
handling a stage-2 permission fault and kvm_pgtable_stage2_map() when
handling a stage-2 translation fault, rather than walking the page-table
manually.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-18-will@kernel.org
2020-09-11 15:51:15 +01:00
Quentin Perret
8d5207bef6 KVM: arm64: Convert memslot cache-flushing code to generic page-table API
Convert stage2_flush_memslot() to call the kvm_pgtable_stage2_flush()
function of the generic page-table code instead of walking the page-table
directly.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200911132529.19844-16-will@kernel.org
2020-09-11 15:51:14 +01:00
Quentin Perret
cc38d61cac KVM: arm64: Convert write-protect operation to generic page-table API
Convert stage2_wp_range() to call the kvm_pgtable_stage2_wrprotect()
function of the generic page-table code instead of walking the page-table
directly.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200911132529.19844-14-will@kernel.org
2020-09-11 15:51:14 +01:00
Will Deacon
ee8efad799 KVM: arm64: Convert page-aging and access faults to generic page-table API
Convert the page-aging functions and access fault handler to use the
generic page-table code instead of walking the page-table directly.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-12-will@kernel.org
2020-09-11 15:51:14 +01:00
Will Deacon
52bae936f0 KVM: arm64: Convert unmap_stage2_range() to generic page-table API
Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
of walking the page-table directly.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-10-will@kernel.org
2020-09-11 15:51:13 +01:00
Will Deacon
e9edb17ae0 KVM: arm64: Convert kvm_set_spte_hva() to generic page-table API
Convert kvm_set_spte_hva() to use kvm_pgtable_stage2_map() instead
of stage2_set_pte().

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-9-will@kernel.org
2020-09-11 15:51:13 +01:00
Will Deacon
02bbd374ce KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API
Convert kvm_phys_addr_ioremap() to use kvm_pgtable_stage2_map() instead
of stage2_set_pte().

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-8-will@kernel.org
2020-09-11 15:51:13 +01:00
Will Deacon
71233d05f4 KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables
Introduce alloc() and free() functions to the generic page-table code
for guest stage-2 page-tables and plumb these into the existing KVM
page-table allocator. Subsequent patches will convert other operations
within the KVM allocator over to the generic code.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-6-will@kernel.org
2020-09-11 15:51:13 +01:00
Will Deacon
0f9d09b8e2 KVM: arm64: Use generic allocator for hyp stage-1 page-tables
Now that we have a shiny new page-table allocator, replace the hyp
page-table code with calls into the new API. This also allows us to
remove the extended idmap code, as we can now simply ensure that the
VA size is large enough to map everything we need.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-5-will@kernel.org
2020-09-11 15:51:13 +01:00
Will Deacon
9af3e08baa KVM: arm64: Remove kvm_mmu_free_memory_caches()
kvm_mmu_free_memory_caches() is only called by kvm_arch_vcpu_destroy(),
so inline the implementation and get rid of the extra function.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-2-will@kernel.org
2020-09-11 15:51:12 +01:00
Alexandru Elisei
7b75cd5128 KVM: arm64: Update page shift if stage 2 block mapping not supported
Commit 196f878a7ac2e ("KVM: arm/arm64: Signal SIGBUS when stage2 discovers
hwpoison memory") modifies user_mem_abort() to send a SIGBUS signal when
the fault IPA maps to a hwpoisoned page. Commit 1559b7583ff6 ("KVM:
arm/arm64: Re-check VMA on detecting a poisoned page") changed
kvm_send_hwpoison_signal() to use the page shift instead of the VMA because
at that point the code had already released the mmap lock, which means
userspace could have modified the VMA.

If userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
map the guest fault IPA using block mappings in stage 2. That is not always
possible, if, for example, userspace uses dirty page logging for the VM.
Update the page shift appropriately in those cases when we downgrade the
stage 2 entry from a block mapping to a page.

Fixes: 1559b7583ff6 ("KVM: arm/arm64: Re-check VMA on detecting a poisoned page")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20200901133357.52640-2-alexandru.elisei@arm.com
2020-09-04 10:53:48 +01:00
Marc Zyngier
3fb884ffe9 KVM: arm64: Do not try to map PUDs when they are folded into PMD
For the obscure cases where PMD and PUD are the same size
(64kB pages with 42bit VA, for example, which results in only
two levels of page tables), we can't map anything as a PUD,
because there is... erm... no PUD to speak of. Everything is
either a PMD or a PTE.

So let's only try and map a PUD when its size is different from
that of a PMD.
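
The guard amounts to something like this (illustrative sketch; the
install_* helpers are hypothetical):

   /* A "PUD" is only a distinct level when it is bigger than a PMD */
   if (vma_pagesize == PUD_SIZE && PUD_SIZE != PMD_SIZE)
           ret = install_pud_block(vcpu, fault_ipa, pfn);  /* hypothetical */
   else if (vma_pagesize == PMD_SIZE)
           ret = install_pmd_block(vcpu, fault_ipa, pfn);  /* hypothetical */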

Cc: stable@vger.kernel.org
Fixes: b8e0ba7c8bea ("KVM: arm64: Add support for creating PUD hugepages at stage 2")
Reported-by: Gavin Shan <gshan@redhat.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-04 10:52:49 +01:00
Will Deacon
b5331379bc KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.
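
A sketch of the gating (illustrative):

   bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;

   /* ... while unmapping, when crossing a PGD boundary ... */
   if (may_block && next != end)
           cond_resched_lock(&kvm->mmu_lock);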

Cc: <stable@vger.kernel.org>
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-3-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 18:06:43 -04:00
Will Deacon
fdfe7cbd58 KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.
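
The arch hook's signature becomes, roughly:

   /* flags carries mmu_notifier_range->flags,
    * e.g. MMU_NOTIFIER_RANGE_BLOCKABLE */
   int kvm_unmap_hva_range(struct kvm *kvm,
                           unsigned long start, unsigned long end,
                           unsigned flags);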

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 18:03:47 -04:00
Paolo Bonzini
0378daef0c KVM/arm64 updates for Linux 5.9:

Merge tag 'kvmarm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next-5.6

KVM/arm64 updates for Linux 5.9:

- Split the VHE and nVHE hypervisor code bases, build the EL2 code
  separately, allowing for the VHE code to now be built with instrumentation

- Level-based TLB invalidation support

- Restructure of the vcpu register storage to accommodate the NV code

- Pointer Authentication available for guests on nVHE hosts

- Simplification of the system register table parsing

- MMU cleanups and fixes

- A number of post-32bit cleanups and other fixes
2020-08-09 12:58:23 -04:00