211 Commits

Author SHA1 Message Date
Linus Torvalds
8fa590bf34 ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
   option to keep the good old dirty log around for pages that are
   dirtied by something other than a vcpu.
 
 * Switch to the relaxed parallel fault handling, using RCU to delay
   page table reclaim and giving better performance under load.
 
 * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
   which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d:
   "Fix a number of issues with MTE, such as races on the tags being
   initialised vs the PG_mte_tagged flag as well as the lack of support
   for VM_SHARED when KVM is involved.  Patches from Catalin Marinas and
   Peter Collingbourne").
 
 * Merge the pKVM shadow vcpu state tracking that allows the hypervisor
   to have its own view of a vcpu, keeping that state private.
 
 * Add support for the PMUv3p5 architecture revision, bringing support
   for 64bit counters on systems that support it, and fix the
   no-quite-compliant CHAIN-ed counter support for the machines that
   actually exist out there.
 
 * Fix a handful of minor issues around 52bit VA/PA support (64kB pages
   only) as a prefix of the oncoming support for 4kB and 16kB pages.
 
 * Pick a small set of documentation and spelling fixes, because no
   good merge window would be complete without those.
 
 s390:
 
 * Second batch of the lazy destroy patches
 
 * First batch of KVM changes for kernel virtual != physical address support
 
 * Removal of a unused function
 
 x86:
 
 * Allow compiling out SMM support
 
 * Cleanup and documentation of SMM state save area format
 
 * Preserve interrupt shadow in SMM state save area
 
 * Respond to generic signals during slow page faults
 
 * Fixes and optimizations for the non-executable huge page errata fix.
 
 * Reprogram all performance counters on PMU filter change
 
 * Cleanups to Hyper-V emulation and tests
 
 * Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
   running on top of a L1 Hyper-V hypervisor)
 
 * Advertise several new Intel features
 
 * x86 Xen-for-KVM:
 
 ** Allow the Xen runstate information to cross a page boundary
 
 ** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
 
 ** Add support for 32-bit guests in SCHEDOP_poll
 
 * Notable x86 fixes and cleanups:
 
 ** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
 
 ** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
    years back when eliminating unnecessary barriers when switching between
    vmcs01 and vmcs02.
 
 ** Clean up vmread_error_trampoline() to make it more obvious that params
    must be passed on the stack, even for x86-64.
 
 ** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
    of the current guest CPUID.
 
 ** Fudge around a race with TSC refinement that results in KVM incorrectly
    thinking a guest needs TSC scaling when running on a CPU with a
    constant TSC, but no hardware-enumerated TSC frequency.
 
 ** Advertise (on AMD) that the SMM_CTL MSR is not supported
 
 ** Remove unnecessary exports
 
 Generic:
 
 * Support for responding to signals during page faults; introduces
   new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
 
 Selftests:
 
 * Fix an inverted check in the access tracking perf test, and restore
   support for asserting that there aren't too many idle pages when
   running on bare metal.
 
 * Fix build errors that occur in certain setups (unsure exactly what is
   unique about the problematic setup) due to glibc overriding
   static_assert() to a variant that requires a custom message.
 
 * Introduce actual atomics for clear/set_bit() in selftests
 
 * Add support for pinning vCPUs in dirty_log_perf_test.
 
 * Rename the so called "perf_util" framework to "memstress".
 
 * Add a lightweight psuedo RNG for guest use, and use it to randomize
   the access pattern and write vs. read percentage in the memstress tests.
 
 * Add a common ucall implementation; code dedup and pre-work for running
   SEV (and beyond) guests in selftests.
 
 * Provide a common constructor and arch hook, which will eventually be
   used by x86 to automatically select the right hypercall (AMD vs. Intel).
 
 * A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
   breakpoints, stage-2 faults and access tracking.
 
 * x86-specific selftest changes:
 
 ** Clean up x86's page table management.
 
 ** Clean up and enhance the "smaller maxphyaddr" test, and add a related
    test to cover generic emulation failure.
 
 ** Clean up the nEPT support checks.
 
 ** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
 
 ** Fix an ordering issue in the AMX test introduced by recent conversions
    to use kvm_cpu_has(), and harden the code to guard against similar bugs
    in the future.  Anything that tiggers caching of KVM's supported CPUID,
    kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
    the caching occurs before the test opts in via prctl().
 
 Documentation:
 
 * Remove deleted ioctls from documentation
 
 * Clean up the docs for the x86 MSR filter.
 
 * Various fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
 KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
 mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
 yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
 Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
 MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
 =QAYX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Enable the per-vcpu dirty-ring tracking mechanism, together with an
     option to keep the good old dirty log around for pages that are
     dirtied by something other than a vcpu.

   - Switch to the relaxed parallel fault handling, using RCU to delay
     page table reclaim and giving better performance under load.

   - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
     option, which multi-process VMMs such as crosvm rely on (see merge
     commit 382b5b87a97d: "Fix a number of issues with MTE, such as
     races on the tags being initialised vs the PG_mte_tagged flag as
     well as the lack of support for VM_SHARED when KVM is involved.
     Patches from Catalin Marinas and Peter Collingbourne").

   - Merge the pKVM shadow vcpu state tracking that allows the
     hypervisor to have its own view of a vcpu, keeping that state
     private.

   - Add support for the PMUv3p5 architecture revision, bringing support
     for 64bit counters on systems that support it, and fix the
     no-quite-compliant CHAIN-ed counter support for the machines that
     actually exist out there.

   - Fix a handful of minor issues around 52bit VA/PA support (64kB
     pages only) as a prefix of the oncoming support for 4kB and 16kB
     pages.

   - Pick a small set of documentation and spelling fixes, because no
     good merge window would be complete without those.

  s390:

   - Second batch of the lazy destroy patches

   - First batch of KVM changes for kernel virtual != physical address
     support

   - Removal of a unused function

  x86:

   - Allow compiling out SMM support

   - Cleanup and documentation of SMM state save area format

   - Preserve interrupt shadow in SMM state save area

   - Respond to generic signals during slow page faults

   - Fixes and optimizations for the non-executable huge page errata
     fix.

   - Reprogram all performance counters on PMU filter change

   - Cleanups to Hyper-V emulation and tests

   - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
     guest running on top of a L1 Hyper-V hypervisor)

   - Advertise several new Intel features

   - x86 Xen-for-KVM:

      - Allow the Xen runstate information to cross a page boundary

      - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

      - Add support for 32-bit guests in SCHEDOP_poll

   - Notable x86 fixes and cleanups:

      - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

      - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
        a few years back when eliminating unnecessary barriers when
        switching between vmcs01 and vmcs02.

      - Clean up vmread_error_trampoline() to make it more obvious that
        params must be passed on the stack, even for x86-64.

      - Let userspace set all supported bits in MSR_IA32_FEAT_CTL
        irrespective of the current guest CPUID.

      - Fudge around a race with TSC refinement that results in KVM
        incorrectly thinking a guest needs TSC scaling when running on a
        CPU with a constant TSC, but no hardware-enumerated TSC
        frequency.

      - Advertise (on AMD) that the SMM_CTL MSR is not supported

      - Remove unnecessary exports

  Generic:

   - Support for responding to signals during page faults; introduces
     new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks

  Selftests:

   - Fix an inverted check in the access tracking perf test, and restore
     support for asserting that there aren't too many idle pages when
     running on bare metal.

   - Fix build errors that occur in certain setups (unsure exactly what
     is unique about the problematic setup) due to glibc overriding
     static_assert() to a variant that requires a custom message.

   - Introduce actual atomics for clear/set_bit() in selftests

   - Add support for pinning vCPUs in dirty_log_perf_test.

   - Rename the so called "perf_util" framework to "memstress".

   - Add a lightweight psuedo RNG for guest use, and use it to randomize
     the access pattern and write vs. read percentage in the memstress
     tests.

   - Add a common ucall implementation; code dedup and pre-work for
     running SEV (and beyond) guests in selftests.

   - Provide a common constructor and arch hook, which will eventually
     be used by x86 to automatically select the right hypercall (AMD vs.
     Intel).

   - A bunch of added/enabled/fixed selftests for ARM64, covering
     memslots, breakpoints, stage-2 faults and access tracking.

   - x86-specific selftest changes:

      - Clean up x86's page table management.

      - Clean up and enhance the "smaller maxphyaddr" test, and add a
        related test to cover generic emulation failure.

      - Clean up the nEPT support checks.

      - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.

      - Fix an ordering issue in the AMX test introduced by recent
        conversions to use kvm_cpu_has(), and harden the code to guard
        against similar bugs in the future. Anything that tiggers
        caching of KVM's supported CPUID, kvm_cpu_has() in this case,
        effectively hides opt-in XSAVE features if the caching occurs
        before the test opts in via prctl().

  Documentation:

   - Remove deleted ioctls from documentation

   - Clean up the docs for the x86 MSR filter.

   - Various fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
  KVM: x86: Add proper ReST tables for userspace MSR exits/flags
  KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
  KVM: arm64: selftests: Align VA space allocator with TTBR0
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: x86: Advertise that the SMM_CTL MSR is not supported
  KVM: x86: remove unnecessary exports
  KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
  tools: KVM: selftests: Convert clear/set_bit() to actual atomics
  tools: Drop "atomic_" prefix from atomic test_and_set_bit()
  tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
  KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
  perf tools: Use dedicated non-atomic clear/set bit helpers
  tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
  KVM: arm64: selftests: Enable single-step without a "full" ucall()
  KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
  KVM: Remove stale comment about KVM_REQ_UNHALT
  KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
  KVM: Reference to kvm_userspace_memory_region in doc and comments
  KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
  ...
2022-12-15 11:12:21 -08:00
Will Deacon
9d84ad425d Merge branch 'for-next/trivial' into for-next/core
* for-next/trivial:
  arm64: alternatives: add __init/__initconst to some functions/variables
  arm64/asm: Remove unused assembler DAIF save/restore macros
  arm64/kpti: Move DAIF masking to C code
  Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)"
  arm64/mm: Drop unused restore_ttbr1
  arm64: alternatives: make apply_alternatives_vdso() static
  arm64/mm: Drop idmap_pg_end[] declaration
  arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)
  arm64: make is_ttbrX_addr() noinstr-safe
  arm64/signal: Document our convention for choosing magic numbers
  arm64: atomics: lse: remove stale dependency on JUMP_LABEL
  arm64: paravirt: remove conduit check in has_pv_steal_clock
  arm64: entry: Fix typo
  arm64/booting: Add missing colon to FA64 entry
  arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPS
  arm64/asm: Remove unused enable_da macro
2022-12-06 11:33:29 +00:00
Mark Brown
9e75e74b07 arm64/signal: Document our convention for choosing magic numbers
Szabolcs Nagy has pointed out that most of our signal frame magic numbers
are chosen to be meaningful ASCII when dumped to aid manual parsing. This
seems sensible since it might help someone parsing things out, let's
document it so people implementing new signal contexts are aware of it and
are more likely to follow it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221031192450.826159-1-broonie@kernel.org
[will: Fixed typo and tweaked wording]
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-14 15:08:15 +00:00
Gavin Shan
9cb1096f85 KVM: arm64: Enable ring-based dirty memory tracking
Enable ring-based dirty memory tracking on ARM64:

  - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL.

  - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP.

  - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page
    offset.

  - Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to
    keep the site of saving vgic/its tables out of the no-running-vcpu
    radar.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com
2022-11-10 13:11:58 +00:00
Mark Brown
d12aada8df arm64/hwcap: Add support for SVE 2.1
FEAT_SVE2p1 introduces a number of new SVE instructions. Since there is no
new architectural state added kernel support is simply a new hwcap which
lets userspace know that the feature is supported.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-6-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-09 17:54:53 +00:00
Mark Brown
939e4649d4 arm64/hwcap: Add support for FEAT_RPRFM
FEAT_RPRFM adds a new range prefetch hint within the existing PRFM space
for range prefetch hinting. Add a new hwcap to allow userspace to discover
support for the new instruction.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-09 17:54:53 +00:00
Mark Brown
95aa6860d6 arm64/hwcap: Add support for FEAT_CSSC
FEAT_CSSC adds a number of new instructions usable to optimise common short
sequences of instructions, add a hwcap indicating that the feature is
available and can be used by userspace.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-09 17:54:53 +00:00
Catalin Marinas
b23ec74cbd Merge branches 'for-next/doc', 'for-next/sve', 'for-next/sysreg', 'for-next/gettimeofday', 'for-next/stacktrace', 'for-next/atomics', 'for-next/el1-exceptions', 'for-next/a510-erratum-2658417', 'for-next/defconfig', 'for-next/tpidr2_el0' and 'for-next/ftrace', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
  arm64/sve: Add Perf extensions documentation
  perf: arm64: Add SVE vector granule register to user regs
  MAINTAINERS: add maintainers for Alibaba' T-Head PMU driver
  drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC
  docs: perf: Add description for Alibaba's T-Head PMU driver

* for-next/doc:
  : Documentation/arm64 updates
  arm64/sve: Document our actual ABI for clearing registers on syscall

* for-next/sve:
  : SVE updates
  arm64/sysreg: Add hwcap for SVE EBF16

* for-next/sysreg: (35 commits)
  : arm64 system registers generation (more conversions)
  arm64/sysreg: Fix a few missed conversions
  arm64/sysreg: Convert ID_AA64AFRn_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64DFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64FDR0_EL1 to automatic generation
  arm64/sysreg: Use feature numbering for PMU and SPE revisions
  arm64/sysreg: Add _EL1 into ID_AA64DFR0_EL1 definition names
  arm64/sysreg: Align field names in ID_AA64DFR0_EL1 with architecture
  arm64/sysreg: Add defintion for ALLINT
  arm64/sysreg: Convert SCXTNUM_EL1 to automatic generation
  arm64/sysreg: Convert TIPDR_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64PFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64PFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64MMFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64MMFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64MMFR0_EL1 to automatic generation
  arm64/sysreg: Convert HCRX_EL2 to automatic generation
  arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 SME enumeration
  arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 BTI enumeration
  arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 fractional version fields
  arm64/sysreg: Standardise naming for MTE feature enumeration
  ...

* for-next/gettimeofday:
  : Use self-synchronising counter access in gettimeofday() (if FEAT_ECV)
  arm64: vdso: use SYS_CNTVCTSS_EL0 for gettimeofday
  arm64: alternative: patch alternatives in the vDSO
  arm64: module: move find_section to header

* for-next/stacktrace:
  : arm64 stacktrace cleanups and improvements
  arm64: stacktrace: track hyp stacks in unwinder's address space
  arm64: stacktrace: track all stack boundaries explicitly
  arm64: stacktrace: remove stack type from fp translator
  arm64: stacktrace: rework stack boundary discovery
  arm64: stacktrace: add stackinfo_on_stack() helper
  arm64: stacktrace: move SDEI stack helpers to stacktrace code
  arm64: stacktrace: rename unwind_next_common() -> unwind_next_frame_record()
  arm64: stacktrace: simplify unwind_next_common()
  arm64: stacktrace: fix kerneldoc comments

* for-next/atomics:
  : arm64 atomics improvements
  arm64: atomic: always inline the assembly
  arm64: atomics: remove LL/SC trampolines

* for-next/el1-exceptions:
  : Improve the reporting of EL1 exceptions
  arm64: rework BTI exception handling
  arm64: rework FPAC exception handling
  arm64: consistently pass ESR_ELx to die()
  arm64: die(): pass 'err' as long
  arm64: report EL1 UNDEFs better

* for-next/a510-erratum-2658417:
  : Cortex-A510: 2658417: remove BF16 support due to incorrect result
  arm64: errata: remove BF16 HWCAP due to incorrect result on Cortex-A510
  arm64: cpufeature: Expose get_arm64_ftr_reg() outside cpufeature.c
  arm64: cpufeature: Force HWCAP to be based on the sysreg visible to user-space

* for-next/defconfig:
  : arm64 defconfig updates
  arm64: defconfig: Add Coresight as module
  arm64: Enable docker support in defconfig
  arm64: defconfig: Enable memory hotplug and hotremove config
  arm64: configs: Enable all PMUs provided by Arm

* for-next/tpidr2_el0:
  : arm64 ptrace() support for TPIDR2_EL0
  kselftest/arm64: Add coverage of TPIDR2_EL0 ptrace interface
  arm64/ptrace: Support access to TPIDR2_EL0
  arm64/ptrace: Document extension of NT_ARM_TLS to cover TPIDR2_EL0
  kselftest/arm64: Add test coverage for NT_ARM_TLS

* for-next/ftrace:
  : arm64 ftraces updates/fixes
  arm64: ftrace: fix module PLTs with mcount
  arm64: module: Remove unused plt_entry_is_initialized()
  arm64: module: Make plt_equals_entry() static
2022-09-30 09:17:57 +01:00
Will Deacon
aa3e49b606 arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
An arm64 'allmodconfig' build fails with GCC due to use of a C++-style
comment for the new SVE vector granule 'enum perf_event_arm_regs' entry:

  | /usr/include/asm/perf_regs.h:42:26: error: C++ style comments are not allowed in ISO C90

Use good ol' /* */ comment syntax to keep things rosey.

Link: https://lore.kernel.org/r/632cceb2.170a0220.599ec.0a3a@mx.google.com
Fixes: cbb0c02caf4b ("perf: arm64: Add SVE vector granule register to user regs")
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 22:15:33 +01:00
James Clark
cbb0c02caf perf: arm64: Add SVE vector granule register to user regs
Dwarf based unwinding in a function that pushes SVE registers onto
the stack requires the unwinder to know the length of the SVE register
to calculate the stack offsets correctly. This was added to the Arm
specific Dwarf spec as the VG pseudo register[1].

Add the vector length at position 46 if it's requested by userspace and
SVE is supported. If it's not supported then fail to open the event.

The vector length must be on each sample because it can be changed
at runtime via a prctl or ptrace call. Also by adding it as a register
rather than a separate attribute, minimal changes will be required in an
unwinder that already indexes into the register list.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20220901132658.1024635-2-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 15:06:02 +01:00
Mark Brown
81ff692ad9 arm64/sysreg: Add hwcap for SVE EBF16
SVE has a separate identification register indicating support for BFloat16
operations. Add a hwcap identifying support for EBF16 in this register,
mirroring what we did for the non-SVE case.

While there is currently an architectural requirement for BF16 support to
be the same in SVE and non-SVE contexts there are separate identification
registers this separate hwcap helps avoid issues if that requirement were
to be relaxed in the future, we have already chosen to have a separate
capability for base BF16 support.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154815.832347-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-06 18:53:52 +01:00
Paolo Bonzini
959d6c4ae2 KVM/arm64 fixes for 6.0, take #1
- Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK
 
 - Tidy-up handling of AArch32 on asymmetric systems
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmL+RsgPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDHrUP/3IYZ0LnYUZBImSU/YTPL5yYzdSVAuMNcdRQ
 EgvLQKwP+JSrmd7B7wZ4MhY1LheKjpNmmuSqTRsZOHb/yBmnh3+ao5n2gqusYQeJ
 PCuLYjeF7ZU5fGIrPAW6BW0BFlmYMVbTrC6SEMhZsisBhna44jrrWgkBz9mOsXE/
 YcDWv8kP15lisuQzMvnYxmZobbVgSJ3KgQY4/Dp6vyKMR8ULujCxziFV5R4RD0xP
 Ay8wnxtMUymx9P6sZsd6Vwi5h1MUXOOoI4He7+8ejIfoMOManIMOIq4PDQhINwQv
 tGysDmQavftSbUkXJ1VB+8cJ/9KufzwKxFoc5WqGk1y14QulyBNyb/XR3UtORe1n
 bitINTTkqibHY6fdQJA7z1sD0jaEAh/xNwO1Gq0BS40o4XVQDv2BjdQir9TEdlZO
 tsZKVaFpN3UZe681ru12No8YzQDhpuLH65gDHDjLaftH99WKsrSwMZLoEjqZTlM/
 vH/9acd4UB+9zMGTpN2tJ//2cq6g3JoUC7jJIQB1oGStHX0/7AxKMlabR2xHmt9E
 4CmJND9RLK6+yEelagxOAYMQfnCdj6pW/3bhvAmsZWh0t3fNxCeBBXFr2I5os+E9
 hV0FYx4PG9GtorMSqudCDsP83SDIxCluNZ5iM8t1suSn3dFhk5bDFChS86XGuqQe
 XxHQ6JTF
 =r5iq
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.0, take #1

- Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK

- Tidy-up handling of AArch32 on asymmetric systems
2022-08-19 05:43:53 -04:00
Yang Yingliang
ae3b1da954 KVM: arm64: Fix compile error due to sign extension
Use GENMASK() to generate the masks of device type and device id,
fixing compilation errors due to the sign extension when using
older versions of GCC (such as is 7.5):

In function ‘kvm_vm_ioctl_set_device_addr.isra.38’,
    inlined from ‘kvm_arch_vm_ioctl’ at arch/arm64/kvm/arm.c:1454:10:
././include/linux/compiler_types.h:354:38: error: call to ‘__compiletime_assert_599’ \
declared with attribute error: FIELD_GET: mask is not constant
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)

Fixes: 9f968c9266aa ("KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address setting")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
[maz: tidy up commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220810013435.1525363-1-yangyingliang@huawei.com
2022-08-10 09:49:16 +01:00
Mark Brown
a6a468f50d arm64/hwcap: Support FEAT_EBF16
The v9.2 feature FEAT_EBF16 provides support for an extended BFloat16 mode.
Allow userspace to discover system support for this feature by adding a
hwcap for it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220707103632.12745-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-20 11:18:11 +01:00
Mark Brown
d3e4a9d308 arm64/hwcap: Document allocation of upper bits of AT_HWCAP
The top two bits of AT_HWCAP are reserved for use by glibc and the rest of
the top 32 bits are being kept unallocated for potential use by glibc.
Document this in the header.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Link: https://lore.kernel.org/r/20220707103632.12745-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-20 11:18:11 +01:00
Linus Torvalds
bf9095424d S390:
* ultravisor communication device driver
 
 * fix TEID on terminating storage key ops
 
 RISC-V:
 
 * Added Sv57x4 support for G-stage page table
 
 * Added range based local HFENCE functions
 
 * Added remote HFENCE functions based on VCPU requests
 
 * Added ISA extension registers in ONE_REG interface
 
 * Updated KVM RISC-V maintainers entry to cover selftests support
 
 ARM:
 
 * Add support for the ARMv8.6 WFxT extension
 
 * Guard pages for the EL2 stacks
 
 * Trap and emulate AArch32 ID registers to hide unsupported features
 
 * Ability to select and save/restore the set of hypercalls exposed
   to the guest
 
 * Support for PSCI-initiated suspend in collaboration with userspace
 
 * GICv3 register-based LPI invalidation support
 
 * Move host PMU event merging into the vcpu data structure
 
 * GICv3 ITS save/restore fixes
 
 * The usual set of small-scale cleanups and fixes
 
 x86:
 
 * New ioctls to get/set TSC frequency for a whole VM
 
 * Allow userspace to opt out of hypercall patching
 
 * Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
 
 AMD SEV improvements:
 
 * Add KVM_EXIT_SHUTDOWN metadata for SEV-ES
 
 * V_TSC_AUX support
 
 Nested virtualization improvements for AMD:
 
 * Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
   nested vGIF)
 
 * Allow AVIC to co-exist with a nested guest running
 
 * Fixes for LBR virtualizations when a nested guest is running,
   and nested LBR virtualization support
 
 * PAUSE filtering for nested hypervisors
 
 Guest support:
 
 * Decoupling of vcpu_is_preempted from PV spinlocks
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKN9M4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNLeAf+KizAlQwxEehHHeNyTkZuKyMawrD6
 zsqAENR6i1TxiXe7fDfPFbO2NR0ZulQopHbD9mwnHJ+nNw0J4UT7g3ii1IAVcXPu
 rQNRGMVWiu54jt+lep8/gDg0JvPGKVVKLhxUaU1kdWT9PhIOC6lwpP3vmeWkUfRi
 PFL/TMT0M8Nfryi0zHB0tXeqg41BiXfqO8wMySfBAHUbpv8D53D2eXQL6YlMM0pL
 2quB1HxHnpueE5vj3WEPQ3PCdy1M2MTfCDBJAbZGG78Ljx45FxSGoQcmiBpPnhJr
 C6UGP4ZDWpml5YULUoA70k5ylCbP+vI61U4vUtzEiOjHugpPV5wFKtx5nw==
 =ozWx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - ultravisor communication device driver

   - fix TEID on terminating storage key ops

  RISC-V:

   - Added Sv57x4 support for G-stage page table

   - Added range based local HFENCE functions

   - Added remote HFENCE functions based on VCPU requests

   - Added ISA extension registers in ONE_REG interface

   - Updated KVM RISC-V maintainers entry to cover selftests support

  ARM:

   - Add support for the ARMv8.6 WFxT extension

   - Guard pages for the EL2 stacks

   - Trap and emulate AArch32 ID registers to hide unsupported features

   - Ability to select and save/restore the set of hypercalls exposed to
     the guest

   - Support for PSCI-initiated suspend in collaboration with userspace

   - GICv3 register-based LPI invalidation support

   - Move host PMU event merging into the vcpu data structure

   - GICv3 ITS save/restore fixes

   - The usual set of small-scale cleanups and fixes

  x86:

   - New ioctls to get/set TSC frequency for a whole VM

   - Allow userspace to opt out of hypercall patching

   - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

  AMD SEV improvements:

   - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES

   - V_TSC_AUX support

  Nested virtualization improvements for AMD:

   - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
     nested vGIF)

   - Allow AVIC to co-exist with a nested guest running

   - Fixes for LBR virtualizations when a nested guest is running, and
     nested LBR virtualization support

   - PAUSE filtering for nested hypervisors

  Guest support:

   - Decoupling of vcpu_is_preempted from PV spinlocks"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits)
  KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
  KVM: selftests: x86: Sync the new name of the test case to .gitignore
  Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
  x86, kvm: use correct GFP flags for preemption disabled
  KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer
  x86/kvm: Alloc dummy async #PF token outside of raw spinlock
  KVM: x86: avoid calling x86 emulator without a decoded instruction
  KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
  x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
  s390/uv_uapi: depend on CONFIG_S390
  KVM: selftests: x86: Fix test failure on arch lbr capable platforms
  KVM: LAPIC: Trace LAPIC timer expiration on every vmentry
  KVM: s390: selftest: Test suppression indication on key prot exception
  KVM: s390: Don't indicate suppression on dirtying, failing memop
  selftests: drivers/s390x: Add uvdevice tests
  drivers/s390/char: Add Ultravisor io device
  MAINTAINERS: Update KVM RISC-V entry to cover selftests support
  RISC-V: KVM: Introduce ISA extension register
  RISC-V: KVM: Cleanup stale TLB entries when host CPU changes
  RISC-V: KVM: Add remote HFENCE functions based on VCPU requests
  ...
2022-05-26 14:20:14 -07:00
Catalin Marinas
0616ea3f1b Merge branch 'for-next/esr-elx-64-bit' into for-next/core
* for-next/esr-elx-64-bit:
  : Treat ESR_ELx as a 64-bit register.
  KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
  KVM: arm64: Treat ESR_EL2 as a 64-bit register
  arm64: Treat ESR_ELx as a 64-bit register
  arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall
  arm64: Make ESR_ELx_xVC_IMM_MASK compatible with assembly
2022-05-20 18:51:54 +01:00
Marc Zyngier
0586e28aaa Merge branch kvm-arm64/hcall-selection into kvmarm-master/next
* kvm-arm64/hcall-selection:
  : .
  : Introduce a new set of virtual sysregs for userspace to
  : select the hypercalls it wants to see exposed to the guest.
  :
  : Patches courtesy of Raghavendra and Oliver.
  : .
  KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
  KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
  Documentation: Fix index.rst after psci.rst renaming
  selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list
  selftests: KVM: aarch64: Introduce hypercall ABI test
  selftests: KVM: Create helper for making SMCCC calls
  selftests: KVM: Rename psci_cpu_on_test to psci_test
  tools: Import ARM SMCCC definitions
  Docs: KVM: Add doc for the bitmap firmware registers
  Docs: KVM: Rename psci.rst to hypercalls.rst
  KVM: arm64: Add vendor hypervisor firmware register
  KVM: arm64: Add standard hypervisor firmware register
  KVM: arm64: Setup a framework for hypercall bitmap firmware registers
  KVM: arm64: Factor out firmware register handling from psci.c

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:47:03 +01:00
Marc Zyngier
2cde51f1e1 KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
These constants will change over time, and userspace has no
business knowing about them. Hide them behind __KERNEL__.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-15 11:36:24 +01:00
Marc Zyngier
b2c4caf331 Merge branch kvm-arm64/wfxt into kvmarm-master/next
* kvm-arm64/wfxt:
  : .
  : Add support for the WFET/WFIT instructions that provide the same
  : service as WFE/WFI, only with a timeout.
  : .
  KVM: arm64: Expose the WFXT feature to guests
  KVM: arm64: Offer early resume for non-blocking WFxT instructions
  KVM: arm64: Handle blocking WFIT instruction
  KVM: arm64: Introduce kvm_counter_compute_delta() helper
  KVM: arm64: Simplify kvm_cpu_has_pending_timer()
  arm64: Use WFxT for __delay() when possible
  arm64: Add wfet()/wfit() helpers
  arm64: Add HWCAP advertising FEAT_WFXT
  arm64: Add RV and RN fields for ESR_ELx_WFx_ISS
  arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 09:42:16 +01:00
Raghavendra Rao Ananta
b22216e1a6 KVM: arm64: Add vendor hypervisor firmware register
Introduce the firmware register to hold the vendor specific
hypervisor service calls (owner value 6) as a bitmap. The
bitmap represents the features that'll be enabled for the
guest, as configured by the user-space. Currently, this
includes support for KVM-vendor features along with
reading the UID, represented by bit-0, and Precision Time
Protocol (PTP), represented by bit-1.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-5-rananta@google.com
2022-05-03 21:30:19 +01:00
Raghavendra Rao Ananta
428fd6788d KVM: arm64: Add standard hypervisor firmware register
Introduce the firmware register to hold the standard hypervisor
service calls (owner value 5) as a bitmap. The bitmap represents
the features that'll be enabled for the guest, as configured by
the user-space. Currently, this includes support only for
Paravirtualized time, represented by bit-0.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-4-rananta@google.com
2022-05-03 21:30:19 +01:00
Raghavendra Rao Ananta
05714cab7d KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM regularly introduces new hypercall services to the guests without
any consent from the userspace. This means, the guests can observe
hypercall services in and out as they migrate across various host
kernel versions. This could be a major problem if the guest
discovered a hypercall, started using it, and after getting migrated
to an older kernel realizes that it's no longer available. Depending
on how the guest handles the change, there's a potential chance that
the guest would just panic.

As a result, there's a need for the userspace to elect the services
that it wishes the guest to discover. It can elect these services
based on the kernels spread across its (migration) fleet. To remedy
this, extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
for all the hypercall services available.

These firmware registers are categorized based on the service call
owners, but unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.

During the VM initialization, the registers are set to upper-limit of
the features supported by the corresponding registers. It's expected
that the VMMs discover the features provided by each register via
GET_ONE_REG, and write back the desired values using SET_ONE_REG.
KVM allows this modification only until the VM has started.

Some of the standard features are not mapped to any bits of the
registers. But since they can recreate the original problem of
making it available without userspace's consent, they need to
be explicitly added to the case-list in
kvm_hvc_call_default_allowed(). Any function-id that's not enabled
via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
be returned as SMCCC_RET_NOT_SUPPORTED to the guest.

Older userspace code can simply ignore the feature and the
hypercall services will be exposed unconditionally to the guests,
thus ensuring backward compatibility.

In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for ARM True Random Number Generator (TRNG) service, with bit-0 of the
register representing mandatory features of v1.0. Other services are
momentarily added in the upcoming patches.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: reduced the scope of some helpers, tidy-up bitmap max values,
 dropped error-only fast path]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
2022-05-03 21:30:19 +01:00
Alexandru Elisei
18f3976fdb KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
When userspace is debugging a VM, the kvm_debug_exit_arch part of the
kvm_run struct contains arm64 specific debug information: the ESR_EL2
value, encoded in the field "hsr", and the address of the instruction
that caused the exception, encoded in the field "far".

Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
kvm_debug_exit_arch.hsr cannot be changed because that would change the
memory layout of the struct on big endian machines:

Current layout:			| Layout with "hsr" extended to 64 bits:
				|
offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)

which breaks existing code.

The padding is inserted by the compiler because the "far" field must be
aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
page 18), and the struct itself must be aligned to 8 bytes (the struct must
be aligned to the maximum alignment of its fields - aapcs64, page 18),
which means that "hsr" must be aligned to 8 bytes as it is the first field
in the struct.

To avoid changing the struct size and layout for the existing fields, add a
new field, "hsr_high", which replaces the existing padding. "hsr_high" will
be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
both on big and little endian machine, becomes:

offset 0: ESR_EL2[31:0]  (hsr)
offset 4: ESR_EL2[61:32] (hsr_high)
offset 8: FAR_EL2[61:0]  (far)

The padding that the compiler inserts for the current struct layout is
unitialized. To prevent an updated userspace running on an old kernel
mistaking the padding for a valid "hsr_high" value, add a new flag,
KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that
"hsr_high" holds a valid ESR_EL2[61:32] value.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Mark Brown
776b4a1cf3 arm64/sme: Add ptrace support for ZA
The ZA array can be read and written with the NT_ARM_ZA.  Similarly to
our interface for the SVE vector registers the regset consists of a
header with information on the current vector length followed by an
optional register data payload, represented as for signals as a series
of horizontal vectors from 0 to VL/8 in the endianness independent
format used for vectors.

On get if ZA is enabled then register data will be provided, otherwise
it will be omitted.  On set if register data is provided then ZA is
enabled and initialized using the provided data, otherwise it is
disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-22-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:17 +01:00
Mark Brown
e12310a0d3 arm64/sme: Implement ptrace support for streaming mode SVE registers
The streaming mode SVE registers are represented using the same data
structures as for SVE but since the vector lengths supported and in use
may not be the same as SVE we represent them with a new type NT_ARM_SSVE.
Unfortunately we only have a single 16 bit reserved field available in
the header so there is no space to fit the current and maximum vector
length for both standard and streaming SVE mode without redefining the
structure in a way the creates a complicatd and fragile ABI. Since FFR
is not present in streaming mode it is read and written as zero.

Setting NT_ARM_SSVE registers will put the task into streaming mode,
similarly setting NT_ARM_SVE registers will exit it. Reads that do not
correspond to the current mode of the task will return the header with
no register data. For compatibility reasons on write setting no flag for
the register type will be interpreted as setting SVE registers, though
users can provide no register data as an alternative mechanism for doing
so.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-21-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:15 +01:00
Mark Brown
39782210eb arm64/sme: Implement ZA signal handling
Implement support for ZA in signal handling in a very similar way to how
we implement support for SVE registers, using a signal context structure
with optional register state after it. Where present this register state
stores the ZA matrix as a series of horizontal vectors numbered from 0 to
VL/8 in the endinanness independent format used for vectors.

As with SVE we do not allow changes in the vector length during signal
return but we do allow ZA to be enabled or disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-20-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:13 +01:00
Mark Brown
85ed24dad2 arm64/sme: Implement streaming SVE signal handling
When in streaming mode we have the same set of SVE registers as we do in
regular SVE mode with the exception of FFR and the use of the SME vector
length. Provide signal handling for these registers by taking one of the
reserved words in the SVE signal context as a flags field and defining a
flag which is set for streaming mode. When the flag is set the vector
length is set to the streaming mode vector length and we save and
restore streaming mode data. We support entering or leaving streaming
mode based on the value of the flag but do not support changing the
vector length, this is not currently supported SVE signal handling.

We could instead allocate a separate record in the signal frame for the
streaming mode SVE context but this inflates the size of the maximal signal
frame required and adds complication when validating signal frames from
userspace, especially given the current structure of the code.

Any implementation of support for streaming mode vectors in signals will
have some potential for causing issues for applications that attempt to
handle SVE vectors in signals, use streaming mode but do not understand
streaming mode in their signal handling code, it is hard to identify a
case that is clearly better than any other - they all have cases where
they could cause unexpected register corruption or faults.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-19-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:10 +01:00
Mark Brown
5e64b862c4 arm64/sme: Basic enumeration support
This patch introduces basic cpufeature support for discovering the presence
of the Scalable Matrix Extension.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-9-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:50:49 +01:00
Marc Zyngier
69bb02ebc3 arm64: Add HWCAP advertising FEAT_WFXT
In order to allow userspace to enjoy WFET, add a new HWCAP that
advertises it when available.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419182755.601427-9-maz@kernel.org
2022-04-20 13:24:44 +01:00
Linus Torvalds
1ebdbeb03e ARM:
- Proper emulation of the OSLock feature of the debug architecture
 
 - Scalibility improvements for the MMU lock when dirty logging is on
 
 - New VMID allocator, which will eventually help with SVA in VMs
 
 - Better support for PMUs in heterogenous systems
 
 - PSCI 1.1 support, enabling support for SYSTEM_RESET2
 
 - Implement CONFIG_DEBUG_LIST at EL2
 
 - Make CONFIG_ARM64_ERRATUM_2077057 default y
 
 - Reduce the overhead of VM exit when no interrupt is pending
 
 - Remove traces of 32bit ARM host support from the documentation
 
 - Updated vgic selftests
 
 - Various cleanups, doc updates and spelling fixes
 
 RISC-V:
 
 - Prevent KVM_COMPAT from being selected
 
 - Optimize __kvm_riscv_switch_to() implementation
 
 - RISC-V SBI v0.3 support
 
 s390:
 
 - memop selftest
 
 - fix SCK locking
 
 - adapter interruptions virtualization for secure guests
 
 - add Claudio Imbrenda as maintainer
 
 - first step to do proper storage key checking
 
 x86:
 
 - Continue switching kvm_x86_ops to static_call(); introduce
   static_call_cond() and __static_call_ret0 when applicable.
 
 - Cleanup unused arguments in several functions
 
 - Synthesize AMD 0x80000021 leaf
 
 - Fixes and optimization for Hyper-V sparse-bank hypercalls
 
 - Implement Hyper-V's enlightened MSR bitmap for nested SVM
 
 - Remove MMU auditing
 
 - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
   page tracking is enabled
 
 - Cleanup the implementation of the guest PGD cache
 
 - Preparation for the implementation of Intel IPI virtualization
 
 - Fix some segment descriptor checks in the emulator
 
 - Allow AMD AVIC support on systems with physical APIC ID above 255
 
 - Better API to disable virtualization quirks
 
 - Fixes and optimizations for the zapping of page tables:
 
   - Zap roots in two passes, avoiding RCU read-side critical sections
     that last too long for very large guests backed by 4 KiB SPTEs.
 
   - Zap invalid and defunct roots asynchronously via concurrency-managed
     work queue.
 
   - Allowing yielding when zapping TDP MMU roots in response to the root's
     last reference being put.
 
   - Batch more TLB flushes with an RCU trick.  Whoever frees the paging
     structure now holds RCU as a proxy for all vCPUs running in the guest,
     i.e. to prolongs the grace period on their behalf.  It then kicks the
     the vCPUs out of guest mode before doing rcu_read_unlock().
 
 Generic:
 
 - Introduce __vcalloc and use it for very large allocations that
   need memcg accounting
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
 wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
 pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
 dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
 RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
 BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
 =keox
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Proper emulation of the OSLock feature of the debug architecture

   - Scalibility improvements for the MMU lock when dirty logging is on

   - New VMID allocator, which will eventually help with SVA in VMs

   - Better support for PMUs in heterogenous systems

   - PSCI 1.1 support, enabling support for SYSTEM_RESET2

   - Implement CONFIG_DEBUG_LIST at EL2

   - Make CONFIG_ARM64_ERRATUM_2077057 default y

   - Reduce the overhead of VM exit when no interrupt is pending

   - Remove traces of 32bit ARM host support from the documentation

   - Updated vgic selftests

   - Various cleanups, doc updates and spelling fixes

  RISC-V:
   - Prevent KVM_COMPAT from being selected

   - Optimize __kvm_riscv_switch_to() implementation

   - RISC-V SBI v0.3 support

  s390:
   - memop selftest

   - fix SCK locking

   - adapter interruptions virtualization for secure guests

   - add Claudio Imbrenda as maintainer

   - first step to do proper storage key checking

  x86:
   - Continue switching kvm_x86_ops to static_call(); introduce
     static_call_cond() and __static_call_ret0 when applicable.

   - Cleanup unused arguments in several functions

   - Synthesize AMD 0x80000021 leaf

   - Fixes and optimization for Hyper-V sparse-bank hypercalls

   - Implement Hyper-V's enlightened MSR bitmap for nested SVM

   - Remove MMU auditing

   - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
     page tracking is enabled

   - Cleanup the implementation of the guest PGD cache

   - Preparation for the implementation of Intel IPI virtualization

   - Fix some segment descriptor checks in the emulator

   - Allow AMD AVIC support on systems with physical APIC ID above 255

   - Better API to disable virtualization quirks

   - Fixes and optimizations for the zapping of page tables:

      - Zap roots in two passes, avoiding RCU read-side critical
        sections that last too long for very large guests backed by 4
        KiB SPTEs.

      - Zap invalid and defunct roots asynchronously via
        concurrency-managed work queue.

      - Allowing yielding when zapping TDP MMU roots in response to the
        root's last reference being put.

      - Batch more TLB flushes with an RCU trick. Whoever frees the
        paging structure now holds RCU as a proxy for all vCPUs running
        in the guest, i.e. to prolongs the grace period on their behalf.
        It then kicks the the vCPUs out of guest mode before doing
        rcu_read_unlock().

  Generic:
   - Introduce __vcalloc and use it for very large allocations that need
     memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  KVM: use kvcalloc for array allocations
  KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
  kvm: x86: Require const tsc for RT
  KVM: x86: synthesize CPUID leaf 0x80000021h if useful
  KVM: x86: add support for CPUID leaf 0x80000021
  KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
  Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
  kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
  KVM: arm64: fix typos in comments
  KVM: arm64: Generalise VM features into a set of flags
  KVM: s390: selftests: Add error memop tests
  KVM: s390: selftests: Add more copy memop tests
  KVM: s390: selftests: Add named stages for memop test
  KVM: s390: selftests: Add macro as abstraction for MEM_OP
  KVM: s390: selftests: Split memop tests
  KVM: s390x: fix SCK locking
  RISC-V: KVM: Implement SBI HSM suspend call
  RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
  RISC-V: Add SBI HSM suspend related defines
  RISC-V: KVM: Implement SBI v0.3 SRST extension
  ...
2022-03-24 11:58:57 -07:00
Will Deacon
641d804157 Merge branch 'for-next/spectre-bhb' into for-next/core
Merge in the latest Spectre mess to fix up conflicts with what was
already queued for 5.18 when the embargo finally lifted.

* for-next/spectre-bhb: (21 commits)
  arm64: Do not include __READ_ONCE() block in assembly files
  arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  arm64: Use the clearbhb instruction in mitigations
  KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
  arm64: Mitigate spectre style branch history side channels
  arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
  arm64: Add percpu vectors for EL1
  arm64: entry: Add macro for reading symbol addresses from the trampoline
  arm64: entry: Add vectors that have the bhb mitigation sequences
  arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
  arm64: entry: Allow the trampoline text to occupy multiple pages
  arm64: entry: Make the kpti trampoline's kpti sequence optional
  arm64: entry: Move trampoline macros out of ifdef'd section
  arm64: entry: Don't assume tramp_vectors is the start of the vectors
  arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
  arm64: entry: Move the trampoline data page before the text page
  arm64: entry: Free up another register on kpti's tramp_exit path
  arm64: entry: Make the trampoline cleanup optional
  KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
  arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
  ...
2022-03-14 19:08:31 +00:00
Mark Brown
d082a0255f arm64/mte: Add hwcap for asymmetric mode
Allow userspace to detect support for asymmetric mode by providing a hwcap
for it, using the official feature name FEAT_MTE3.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Tested-by: Branislav Rankov <branislav.rankov@arm.com>
Link: https://lore.kernel.org/r/20220216173224.2342152-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-25 14:41:05 +00:00
Marc Zyngier
1a48ce9264 Merge branch kvm-arm64/psci-1.1 into kvmarm-master/next
* kvm-arm64/psci-1.1:
  : .
  : Limited PSCI-1.1 support from Will Deacon:
  :
  : This small series exposes the PSCI SYSTEM_RESET2 call to guests, which
  : allows the propagation of a "reset_type" and a "cookie" back to the VMM.
  : Although Linux guests only ever pass 0 for the type ("SYSTEM_WARM_RESET"),
  : the vendor-defined range can be used by a bootloader to provide additional
  : information about the reset, such as an error code.
  : .
  KVM: arm64: Remove unneeded semicolons
  KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field
  KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest
  KVM: arm64: Bump guest PSCI version to 1.1

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-25 13:49:48 +00:00
James Morse
a5905d6af4 KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
KVM allows the guest to discover whether the ARCH_WORKAROUND SMCCC are
implemented, and to preserve that state during migration through its
firmware register interface.

Add the necessary boiler plate for SMCCC_ARCH_WORKAROUND_3.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2022-02-24 13:58:52 +00:00
Will Deacon
34739fd95f KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field
When handling reset and power-off PSCI calls from the guest, we
initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to
re-run the vCPU after issuing the call.

Unfortunately, this also means that the VMM cannot see which PSCI call
was issued and therefore cannot distinguish between PSCI SYSTEM_RESET
and SYSTEM_RESET2 calls, which is necessary in order to determine the
validity of the "reset_type" in X1.

Allocate bit 0 of the previously unused 'flags' field of the
system_event structure so that we can indicate the PSCI call used to
initiate the reset.

Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
2022-02-21 16:02:55 +00:00
Alexandru Elisei
583cda1b0e KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
device ioctl. If the VCPU is scheduled on a physical CPU which has a
different PMU, the perf events needed to emulate a guest PMU won't be
scheduled in and the guest performance counters will stop counting. Treat
it as an userspace error and refuse to run the VCPU in this situation.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
2022-02-08 17:51:22 +00:00
Alexandru Elisei
6ee7fca2a4 KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
When KVM creates an event and there are more than one PMUs present on the
system, perf_init_event() will go through the list of available PMUs and
will choose the first one that can create the event. The order of the PMUs
in this list depends on the probe order, which can change under various
circumstances, for example if the order of the PMU nodes change in the DTB
or if asynchronous driver probing is enabled on the kernel command line
(with the driver_async_probe=armv8-pmu option).

Another consequence of this approach is that on heteregeneous systems all
virtual machines that KVM creates will use the same PMU. This might cause
unexpected behaviour for userspace: when a VCPU is executing on the
physical CPU that uses this default PMU, PMU events in the guest work
correctly; but when the same VCPU executes on another CPU, PMU events in
the guest will suddenly stop counting.

Fortunately, perf core allows user to specify on which PMU to create an
event by using the perf_event_attr->type field, which is used by
perf_init_event() as an index in the radix tree of available PMUs.

Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
attribute to allow userspace to specify the arm_pmu that KVM will use when
creating events for that VCPU. KVM will make no attempt to run the VCPU on
the physical CPUs that share the PMU, leaving it up to userspace to manage
the VCPU threads' affinity accordingly.

To ensure that KVM doesn't expose an asymmetric system to the guest, the
PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run,
the PMU cannot be changed in order to avoid changing the list of available
events for a VCPU, or to change the semantics of existing events.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
2022-02-08 17:51:21 +00:00
Joey Gouly
1175011a7d arm64: cpufeature: add HWCAP for FEAT_RPRES
Add a new HWCAP to detect the Increased precision of Reciprocal Estimate
and Reciprocal Square Root Estimate feature (FEAT_RPRES), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64ISAR2_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-4-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-13 18:53:00 +00:00
Joey Gouly
5c13f042e7 arm64: cpufeature: add HWCAP for FEAT_AFP
Add a new HWCAP to detect the Alternate Floating-point Behaviour
feature (FEAT_AFP), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64MMFR1_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-2-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-13 18:52:59 +00:00
Marc Zyngier
fee29f008a arm64: Add HWCAP for self-synchronising virtual counter
Since userspace can make use of the CNTVSS_EL0 instruction, expose
it via a HWCAP.

Suggested-by: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-18-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-19 10:56:20 +01:00
Mike Rapoport
7bb7f2ac24 arch, mm: wire up memfd_secret system call where relevant
Wire up memfd_secret system call on architectures that define
ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.

Link: https://lkml.kernel.org/r/20210518072034.31572-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:21 -07:00
Steven Price
f0376edb1d KVM: arm64: Add ioctl to fetch/store tags in a guest
The VMM may not wish to have it's own mapping of guest memory mapped
with PROT_MTE because this causes problems if the VMM has tag checking
enabled (the guest controls the tags in physical RAM and it's unlikely
the tags are correct for the VMM).

Instead add a new ioctl which allows the VMM to easily read/write the
tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
while the VMM can still read/write the tags for the purpose of
migration.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-6-steven.price@arm.com
2021-06-22 14:08:06 +01:00
Will Deacon
8d14797b53 KVM: arm64: Move 'struct kvm_arch_memory_slot' out of uapi/
'struct kvm_arch_memory_slot' isn't part of the user ABI, so move it out
of the uapi/ headers in case we start using it in future and accidentally
back ourselves into a corner.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201118194402.2892-2-will@kernel.org
2020-11-27 18:59:05 +00:00
Linus Torvalds
f9a705ad1c ARM:
- New page table code for both hypervisor and guest stage-2
 - Introduction of a new EL2-private host context
 - Allow EL2 to have its own private per-CPU variables
 - Support of PMU event filtering
 - Complete rework of the Spectre mitigation
 
 PPC:
 - Fix for running nested guests with in-kernel IRQ chip
 - Fix race condition causing occasional host hard lockup
 - Minor cleanups and bugfixes
 
 x86:
 - allow trapping unknown MSRs to userspace
 - allow userspace to force #GP on specific MSRs
 - INVPCID support on AMD
 - nested AMD cleanup, on demand allocation of nested SVM state
 - hide PV MSRs and hypercalls for features not enabled in CPUID
 - new test for MSR_IA32_TSC writes from host and guest
 - cleanups: MMU, CPUID, shared MSRs
 - LAPIC latency optimizations ad bugfixes
 
 For x86, also included in this pull request is a new alternative and
 (in the future) more scalable implementation of extended page tables
 that does not need a reverse map from guest physical addresses to
 host physical addresses.  For now it is disabled by default because
 it is still lacking a few of the existing MMU's bells and whistles.
 However it is a very solid piece of work and it is already available
 for people to hammer on it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+S8dsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM40Af+M46NJmuS5rcwFfybvK/c42KT6svX
 Co1NrZDwzSQ2mMy3WQzH9qeLvb+nbY4sT3n5BPNPNsT+aIDPOTDt//qJ2/Ip9UUs
 tRNea0MAR96JWLE7MSeeRxnTaQIrw/AAZC0RXFzZvxcgytXwdqBExugw4im+b+dn
 Dcz8QxX1EkwT+4lTm5HC0hKZAuo4apnK1QkqCq4SdD2QVJ1YE6+z7pgj4wX7xitr
 STKD6q/Yt/0ndwqS0GSGbyg0jy6mE620SN6isFRkJYwqfwLJci6KnqvEK67EcNMu
 qeE017K+d93yIVC46/6TfVHzLR/D1FpQ8LZ16Yl6S13OuGIfAWBkQZtPRg==
 =AD6a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "For x86, there is a new alternative and (in the future) more scalable
  implementation of extended page tables that does not need a reverse
  map from guest physical addresses to host physical addresses.

  For now it is disabled by default because it is still lacking a few of
  the existing MMU's bells and whistles. However it is a very solid
  piece of work and it is already available for people to hammer on it.

  Other updates:

  ARM:
   - New page table code for both hypervisor and guest stage-2
   - Introduction of a new EL2-private host context
   - Allow EL2 to have its own private per-CPU variables
   - Support of PMU event filtering
   - Complete rework of the Spectre mitigation

  PPC:
   - Fix for running nested guests with in-kernel IRQ chip
   - Fix race condition causing occasional host hard lockup
   - Minor cleanups and bugfixes

  x86:
   - allow trapping unknown MSRs to userspace
   - allow userspace to force #GP on specific MSRs
   - INVPCID support on AMD
   - nested AMD cleanup, on demand allocation of nested SVM state
   - hide PV MSRs and hypercalls for features not enabled in CPUID
   - new test for MSR_IA32_TSC writes from host and guest
   - cleanups: MMU, CPUID, shared MSRs
   - LAPIC latency optimizations ad bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
  kvm: x86/mmu: NX largepage recovery for TDP MMU
  kvm: x86/mmu: Don't clear write flooding count for direct roots
  kvm: x86/mmu: Support MMIO in the TDP MMU
  kvm: x86/mmu: Support write protection for nesting in tdp MMU
  kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
  kvm: x86/mmu: Support dirty logging for the TDP MMU
  kvm: x86/mmu: Support changed pte notifier in tdp MMU
  kvm: x86/mmu: Add access tracking for tdp_mmu
  kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
  kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
  kvm: x86/mmu: Add TDP MMU PF handler
  kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
  kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
  KVM: Cache as_id in kvm_memory_slot
  kvm: x86/mmu: Add functions to handle changed TDP SPTEs
  kvm: x86/mmu: Allocate and free TDP MMU roots
  kvm: x86/mmu: Init / Uninit the TDP MMU
  kvm: x86/mmu: Introduce tdp_iter
  KVM: mmu: extract spte.h and spte.c
  KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
  ...
2020-10-23 11:17:56 -07:00
Will Deacon
baab853229 Merge branch 'for-next/mte' into for-next/core
Add userspace support for the Memory Tagging Extension introduced by
Armv8.5.

(Catalin Marinas and others)
* for-next/mte: (30 commits)
  arm64: mte: Fix typo in memory tagging ABI documentation
  arm64: mte: Add Memory Tagging Extension documentation
  arm64: mte: Kconfig entry
  arm64: mte: Save tags when hibernating
  arm64: mte: Enable swap of tagged pages
  mm: Add arch hooks for saving/restoring tags
  fs: Handle intra-page faults in copy_mount_options()
  arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset
  arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
  arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks
  arm64: mte: Restore the GCR_EL1 register after a suspend
  arm64: mte: Allow user control of the generated random tags via prctl()
  arm64: mte: Allow user control of the tag check mode via prctl()
  mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
  arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
  mm: Introduce arch_validate_flags()
  arm64: mte: Add PROT_MTE support to mmap() and mprotect()
  mm: Introduce arch_calc_vm_flag_bits()
  arm64: mte: Tags-aware aware memcmp_pages() implementation
  arm64: Avoid unnecessary clear_user_page() indirection
  ...
2020-10-02 12:16:11 +01:00
Marc Zyngier
14ef9d0492 Merge branch 'kvm-arm64/hyp-pcpu' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-30 14:05:35 +01:00
Marc Zyngier
29e8910a56 KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
Owing to the fact that the host kernel is always mitigated, we can
drastically simplify the WA2 handling by keeping the mitigation
state ON when entering the guest. This means the guest is either
unaffected or not mitigated.

This results in a nice simplification of the mitigation space,
and the removal of a lot of code that was never really used anyway.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Marc Zyngier
d7eec2360e KVM: arm64: Add PMU event filtering infrastructure
It can be desirable to expose a PMU to a guest, and yet not want the
guest to be able to count some of the implemented events (because this
would give information on shared resources, for example.

For this, let's extend the PMUv3 device API, and offer a way to setup a
bitmap of the allowed events (the default being no bitmap, and thus no
filtering).

Userspace can thus allow/deny ranges of event. The default policy
depends on the "polarity" of the first filter setup (default deny if the
filter allows events, and default allow if the filter denies events).
This allows to setup exactly what is allowed for a given guest.

Note that although the ioctl is per-vcpu, the map of allowed events is
global to the VM (it can be setup from any vcpu until the vcpu PMU is
initialized).

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-29 14:19:39 +01:00
Catalin Marinas
18ddbaa02b arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
Add support for bulk setting/getting of the MTE tags in a tracee's
address space at 'addr' in the ptrace() syscall prototype. 'data' points
to a struct iovec in the tracer's address space with iov_base
representing the address of a tracer's buffer of length iov_len. The
tags to be copied to/from the tracer's buffer are stored as one tag per
byte.

On successfully copying at least one tag, ptrace() returns 0 and updates
the tracer's iov_len with the number of tags copied. In case of error,
either -EIO or -EFAULT is returned, trying to follow the ptrace() man
page.

Note that the tag copying functions are not performance critical,
therefore they lack optimisations found in typical memory copy routines.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alan Hayward <Alan.Hayward@arm.com>
Cc: Luis Machado <luis.machado@linaro.org>
Cc: Omair Javaid <omair.javaid@linaro.org>
2020-09-04 12:46:07 +01:00