IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
confidential VMs
* fix "underline too short" in docs
* eliminate log spam from limited APIC timer periods
* disallow pre-faulting of memory before SEV-SNP VMs are initialized
* delay clearing and encrypting private memory until it is added to
guest page tables
* this change also enables another small cleanup: the checks in
SNP_LAUNCH_UPDATE that limit it to non-populated, private pages
can now be moved in the common kvm_gmem_populate() function
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmar0uEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMf9Af9EZ0k0HHltM+iUSqKW+hcfnyjRSlh
MI2m8ZFF4Ra4a/H2CYWbUZSZd6U2TGQoy0cz8vN12uiaaRFSXHAzkoy1zhJGYujq
ljCUx46Ovo6DDfA1ve9jPdHQNOKWy6Js8yheP+i58Pau1u9fWTewfvWnrwkMgnfD
lkrSfnWhw7aBy7jTSd8KflRU/IugP2/ApsIhrjZZ9sFGncAwPBbb8NL/u5tI/l6f
VDp1in5a5gk2PhVRVzvINUxNzhcyuQ0wC07N+B4H+3U0NLg4CwiTBJr/yz0OOWz6
ThA20/fLTrs5jc2f5APk1EjGT8pqeMJYydI2FdqafSfY0PcTZJtXvzgdSw==
=CwzF
-----END PGP SIGNATURE-----
Merge branch 'kvm-fixes' into HEAD
* fix latent bug in how usage of large pages is determined for
confidential VMs
* fix "underline too short" in docs
* eliminate log spam from limited APIC timer periods
* disallow pre-faulting of memory before SEV-SNP VMs are initialized
* delay clearing and encrypting private memory until it is added to
guest page tables
* this change also enables another small cleanup: the checks in
SNP_LAUNCH_UPDATE that limit it to non-populated, private pages
can now be moved in the common kvm_gmem_populate() function
KVM_PRE_FAULT_MEMORY for an SNP guest can race with
sev_gmem_post_populate() in bad ways. The following sequence for
instance can potentially trigger an RMP fault:
thread A, sev_gmem_post_populate: called
thread B, sev_gmem_prepare: places below 'pfn' in a private state in RMP
thread A, sev_gmem_post_populate: *vaddr = kmap_local_pfn(pfn + i);
thread A, sev_gmem_post_populate: copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE);
RMP #PF
Fix this by only allowing KVM_PRE_FAULT_MEMORY to run after a guest's
initial private memory contents have been finalized via
KVM_SEV_SNP_LAUNCH_FINISH.
Beyond fixing this issue, it just sort of makes sense to enforce this,
since the KVM_PRE_FAULT_MEMORY documentation states:
"KVM maps memory as if the vCPU generated a stage-2 read page fault"
which sort of implies we should be acting on the same guest state that a
vCPU would see post-launch after the initial guest memory is all set up.
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix "WARNING: Title underline too short" by extending title line to the
proper length.
Signed-off-by: Chang Yu <marcus.yu.56@gmail.com>
Message-ID: <ZqB3lofbzMQh5Q-5@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Support for preemption
- i386 Rust support
- Huge cleanup by Benjamin Berg
- UBSAN support
- Removal of dead code
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmahIpkWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wW4PD/wN03iDGNTPhegGgXTJTSwA8Gwk
i5JTEmhc84ifE9/bJpru8w4mcLMiWLWFIpF4bGcqfKLp67tTi3jn9Vk7ivaYkn2G
S875GqjdyqMVMfhJX+1qTxM6q/J5B7XGUpt1Zrot3AY1ANxnlwYscWX8jNvwmf+5
eCK9+xldkNWh1N67EjwsDgH6kkWyx3fcEe4E3gjXY0eSZtIwO/ZXYHSCSKznJOfu
iXo1Sx02w8TZp4tf/EwpWR1SMkPL23X8Of+rmiyI5udyLZixTnrFlclu8WUK4ZBO
ExYvOrzyYZ3E/mPFZf0E88h8xC3ETLsiHO3++JRAM1uDMp1+a6tPK7Bi6NTytemH
PIT++XRiORAbXu3aSTjpFDAhTHIMZ925eJMvQAtVhtAAwbkjSNh9NbusbMiucPNm
vvtYrEqYjPJpx+HRxy8kUywe/+jFLYofSDn6YrNRM+3HaM44YgkvbD6AOEMxWq19
YWkflmkDADez6eti03bAbiVuBB1v+Vnuz15ofrx45IUubb3uGVJYwEqQA5u8bAVr
H4NeIWDRpXOuYLgSyxRLFFVYhe6eAWbAXeSWBFxcGNDY6OBpMqr7kgV1mBOZtooK
8aBgZ0YcyiTpmiEevskkNWSBnqUMKIdztKkD7Db9HfCgd9yy7Vvfl+iLJTIqFJ5m
JxpvTy3it53ghQj40A==
=ybqq
-----END PGP SIGNATURE-----
Merge tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Support for preemption
- i386 Rust support
- Huge cleanup by Benjamin Berg
- UBSAN support
- Removal of dead code
* tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits)
um: vector: always reset vp->opened
um: vector: remove vp->lock
um: register power-off handler
um: line: always fill *error_out in setup_one_line()
um: remove pcap driver from documentation
um: Enable preemption in UML
um: refactor TLB update handling
um: simplify and consolidate TLB updates
um: remove force_flush_all from fork_handler
um: Do not flush MM in flush_thread
um: Delay flushing syscalls until the thread is restarted
um: remove copy_context_skas0
um: remove LDT support
um: compress memory related stub syscalls while adding them
um: Rework syscall handling
um: Add generic stub_syscall6 function
um: Create signal stack memory assignment in stub_data
um: Remove stub-data.h include from common-offsets.h
um: time-travel: fix signal blocking race/hang
um: time-travel: remove time_exit()
...
* Initial infrastructure for shadow stage-2 MMUs, as part of nested
virtualization enablement
* Support for userspace changes to the guest CTR_EL0 value, enabling
(in part) migration of VMs between heterogenous hardware
* Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
the protocol
* FPSIMD/SVE support for nested, including merged trap configuration
and exception routing
* New command-line parameter to control the WFx trap behavior under KVM
* Introduce kCFI hardening in the EL2 hypervisor
* Fixes + cleanups for handling presence/absence of FEAT_TCRX
* Miscellaneous fixes + documentation updates
LoongArch:
* Add paravirt steal time support.
* Add support for KVM_DIRTY_LOG_INITIALLY_SET.
* Add perf kvm-stat support for loongarch.
RISC-V:
* Redirect AMO load/store access fault traps to guest
* perf kvm stat support
* Use guest files for IMSIC virtualization, when available
ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA
extensions is coming through the RISC-V tree.
s390:
* Assortment of tiny fixes which are not time critical
x86:
* Fixes for Xen emulation.
* Add a global struct to consolidate tracking of host values, e.g. EFER
* Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
bus frequency, because TDX.
* Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
* Clean up KVM's handling of vendor specific emulation to consistently act on
"compatible with Intel/AMD", versus checking for a specific vendor.
* Drop MTRR virtualization, and instead always honor guest PAT on CPUs
that support self-snoop.
* Update to the newfangled Intel CPU FMS infrastructure.
* Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
'0' and writes from userspace are ignored.
* Misc cleanups
x86 - MMU:
* Small cleanups, renames and refactoring extracted from the upcoming
Intel TDX support.
* Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
hold leafs SPTEs.
* Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
page splitting, to avoid stalling vCPUs when splitting huge pages.
* Bug the VM instead of simply warning if KVM tries to split a SPTE that is
non-present or not-huge. KVM is guaranteed to end up in a broken state
because the callers fully expect a valid SPTE, it's all but dangerous
to let more MMU changes happen afterwards.
x86 - AMD:
* Make per-CPU save_area allocations NUMA-aware.
* Force sev_es_host_save_area() to be inlined to avoid calling into an
instrumentable function from noinstr code.
* Base support for running SEV-SNP guests. API-wise, this includes
a new KVM_X86_SNP_VM type, encrypting/measure the initial image into
guest memory, and finalizing it before launching it. Internally,
there are some gmem/mmu hooks needed to prepare gmem-allocated pages
before mapping them into guest private memory ranges.
This includes basic support for attestation guest requests, enough to
say that KVM supports the GHCB 2.0 specification.
There is no support yet for loading into the firmware those signing
keys to be used for attestation requests, and therefore no need yet
for the host to provide certificate data for those keys. To support
fetching certificate data from userspace, a new KVM exit type will be
needed to handle fetching the certificate from userspace. An attempt to
define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle
this was introduced in v1 of this patchset, but is still being discussed
by community, so for now this patchset only implements a stub version
of SNP Extended Guest Requests that does not provide certificate data.
x86 - Intel:
* Remove an unnecessary EPT TLB flush when enabling hardware.
* Fix a series of bugs that cause KVM to fail to detect nested pending posted
interrupts as valid wake eents for a vCPU executing HLT in L2 (with
HLT-exiting disable by L1).
* KVM: x86: Suppress MMIO that is triggered during task switch emulation
Explicitly suppress userspace emulated MMIO exits that are triggered when
emulating a task switch as KVM doesn't support userspace MMIO during
complex (multi-step) emulation. Silently ignoring the exit request can
result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
userspace for some other reason prior to purging mmio_needed.
See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if
emulator detects exception") for more details on KVM's limitations with
respect to emulated MMIO during complex emulator flows.
Generic:
* Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE,
because the special casing needed by these pages is not due to just
unmovability (and in fact they are only unmovable because the CPU cannot
access them).
* New ioctl to populate the KVM page tables in advance, which is useful to
mitigate KVM page faults during guest boot or after live migration.
The code will also be used by TDX, but (probably) not through the ioctl.
* Enable halt poll shrinking by default, as Intel found it to be a clear win.
* Setup empty IRQ routing when creating a VM to avoid having to synchronize
SRCU when creating a split IRQCHIP on x86.
* Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
that arch code can use for hooking both sched_in() and sched_out().
* Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace detect bugs.
* Mark a vCPU as preempted if and only if it's scheduled out while in the
KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
memory when retrieving guest state during live migration blackout.
Selftests:
* Remove dead code in the memslot modification stress test.
* Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.
* Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
log for tests that create lots of VMs.
* Make the PMU counters test less flaky when counting LLC cache misses by
doing CLFLUSH{OPT} in every loop iteration.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG
+sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv
8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3
OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR
KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S
iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ==
=zepX
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Initial infrastructure for shadow stage-2 MMUs, as part of nested
virtualization enablement
- Support for userspace changes to the guest CTR_EL0 value, enabling
(in part) migration of VMs between heterogenous hardware
- Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
of the protocol
- FPSIMD/SVE support for nested, including merged trap configuration
and exception routing
- New command-line parameter to control the WFx trap behavior under
KVM
- Introduce kCFI hardening in the EL2 hypervisor
- Fixes + cleanups for handling presence/absence of FEAT_TCRX
- Miscellaneous fixes + documentation updates
LoongArch:
- Add paravirt steal time support
- Add support for KVM_DIRTY_LOG_INITIALLY_SET
- Add perf kvm-stat support for loongarch
RISC-V:
- Redirect AMO load/store access fault traps to guest
- perf kvm stat support
- Use guest files for IMSIC virtualization, when available
s390:
- Assortment of tiny fixes which are not time critical
x86:
- Fixes for Xen emulation
- Add a global struct to consolidate tracking of host values, e.g.
EFER
- Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
effective APIC bus frequency, because TDX
- Print the name of the APICv/AVIC inhibits in the relevant
tracepoint
- Clean up KVM's handling of vendor specific emulation to
consistently act on "compatible with Intel/AMD", versus checking
for a specific vendor
- Drop MTRR virtualization, and instead always honor guest PAT on
CPUs that support self-snoop
- Update to the newfangled Intel CPU FMS infrastructure
- Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
it reads '0' and writes from userspace are ignored
- Misc cleanups
x86 - MMU:
- Small cleanups, renames and refactoring extracted from the upcoming
Intel TDX support
- Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
that can't hold leafs SPTEs
- Unconditionally drop mmu_lock when allocating TDP MMU page tables
for eager page splitting, to avoid stalling vCPUs when splitting
huge pages
- Bug the VM instead of simply warning if KVM tries to split a SPTE
that is non-present or not-huge. KVM is guaranteed to end up in a
broken state because the callers fully expect a valid SPTE, it's
all but dangerous to let more MMU changes happen afterwards
x86 - AMD:
- Make per-CPU save_area allocations NUMA-aware
- Force sev_es_host_save_area() to be inlined to avoid calling into
an instrumentable function from noinstr code
- Base support for running SEV-SNP guests. API-wise, this includes a
new KVM_X86_SNP_VM type, encrypting/measure the initial image into
guest memory, and finalizing it before launching it. Internally,
there are some gmem/mmu hooks needed to prepare gmem-allocated
pages before mapping them into guest private memory ranges
This includes basic support for attestation guest requests, enough
to say that KVM supports the GHCB 2.0 specification
There is no support yet for loading into the firmware those signing
keys to be used for attestation requests, and therefore no need yet
for the host to provide certificate data for those keys.
To support fetching certificate data from userspace, a new KVM exit
type will be needed to handle fetching the certificate from
userspace.
An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
exit type to handle this was introduced in v1 of this patchset, but
is still being discussed by community, so for now this patchset
only implements a stub version of SNP Extended Guest Requests that
does not provide certificate data
x86 - Intel:
- Remove an unnecessary EPT TLB flush when enabling hardware
- Fix a series of bugs that cause KVM to fail to detect nested
pending posted interrupts as valid wake eents for a vCPU executing
HLT in L2 (with HLT-exiting disable by L1)
- KVM: x86: Suppress MMIO that is triggered during task switch
emulation
Explicitly suppress userspace emulated MMIO exits that are
triggered when emulating a task switch as KVM doesn't support
userspace MMIO during complex (multi-step) emulation
Silently ignoring the exit request can result in the
WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
for some other reason prior to purging mmio_needed
See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write
exits if emulator detects exception") for more details on KVM's
limitations with respect to emulated MMIO during complex emulator
flows
Generic:
- Rename the AS_UNMOVABLE flag that was introduced for KVM to
AS_INACCESSIBLE, because the special casing needed by these pages
is not due to just unmovability (and in fact they are only
unmovable because the CPU cannot access them)
- New ioctl to populate the KVM page tables in advance, which is
useful to mitigate KVM page faults during guest boot or after live
migration. The code will also be used by TDX, but (probably) not
through the ioctl
- Enable halt poll shrinking by default, as Intel found it to be a
clear win
- Setup empty IRQ routing when creating a VM to avoid having to
synchronize SRCU when creating a split IRQCHIP on x86
- Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
a flag that arch code can use for hooking both sched_in() and
sched_out()
- Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace
detect bugs
- Mark a vCPU as preempted if and only if it's scheduled out while in
the KVM_RUN loop, e.g. to avoid marking it preempted and thus
writing guest memory when retrieving guest state during live
migration blackout
Selftests:
- Remove dead code in the memslot modification stress test
- Treat "branch instructions retired" as supported on all AMD Family
17h+ CPUs
- Print the guest pseudo-RNG seed only when it changes, to avoid
spamming the log for tests that create lots of VMs
- Make the PMU counters test less flaky when counting LLC cache
misses by doing CLFLUSH{OPT} in every loop iteration"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
crypto: ccp: Add the SNP_VLEK_LOAD command
KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
KVM: x86: Replace static_call_cond() with static_call()
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
x86/sev: Move sev_guest.h into common SEV header
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
KVM: x86: Suppress MMIO that is triggered during task switch emulation
KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
KVM: Document KVM_PRE_FAULT_MEMORY ioctl
mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
perf kvm: Add kvm-stat for loongarch64
LoongArch: KVM: Add PV steal time support in guest side
...
- Remove support for 40x CPUs & platforms.
- Add support to the 64-bit BPF JIT for cpu v4 instructions.
- Fix PCI hotplug driver crash on powernv.
- Fix doorbell emulation for KVM on PAPR guests (nestedv2).
- Fix KVM nested guest handling of some less used SPRs.
- Online NUMA nodes with no CPU/memory if they have a PCI device attached.
- Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels.
- Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE.
Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King,
Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani,
Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski,
Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm),
Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy
Pearson, Uwe Kleine-König, Vaibhav Jain.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmaaUNITHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDA+D/4o7OZ+SY0plTlMKSy3hW/SRXVj/byA
CCKdizNY+3Rf/+K7KhuLOUPXhZOemLPE0xfKS3ND4mIEKCswzzXqmi6kjPH0qd8q
qUhkHbt/LNpNJzZOYYw+usaklMTMdZtAl/jD9WEvGwgu2EYHgrujRIq04kEI1b0e
OPiRnXOZcfevRBepQmYZKHvFlCRRa5vvsQcvLfY64yFqD0AsKTHgIi/48Dn33pb2
hqHYyV1tZA3uT86Z1TgF1OG83VOSDsgc19Sb2xn14O9aJJ7lD2TOgVa4P4FfBlXA
TXYYGQwK31ymGVWGcGfebVdC1ECeTem9n28vlk5I0NO9xNgPok/Ov4DAiZ+u1G0E
3CXRDx9Uz2yPcGBJI2dpxfp2iw83Ad2DtBzAdukMD36xnC7xfrQz+W9SQfbcPJ8e
I5SMAstWuLNgrX7YkjAOnXh1N41kht/mdV6KHdcMxPc7jOtAD65gUOZcgwYLeXlT
Av17Ax0PMbiQ1BpFe2KNr/0T9Ba5k5rN7oDSKncDAq4uX8LcZKHj4bSHT9KroT1C
q+GERspoCYp2VDMO742Jm7KTmQDHsS5y4Q+iSdOR8cQBXF613FaryDxSoJZhg2pf
C2zIVED13RGcjIFcWlv73iA6QpBsphM+WWFz7mjULyJhxFQwm6BYt+Wy6jFu84oH
sOgvPH8YyaK2uA==
=eHVd
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Remove support for 40x CPUs & platforms
- Add support to the 64-bit BPF JIT for cpu v4 instructions
- Fix PCI hotplug driver crash on powernv
- Fix doorbell emulation for KVM on PAPR guests (nestedv2)
- Fix KVM nested guest handling of some less used SPRs
- Online NUMA nodes with no CPU/memory if they have a PCI device
attached
- Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels
- Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE
Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian
King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra,
Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna
Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler,
Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat,
Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and
Vaibhav Jain.
* tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits)
Documentation/powerpc: Mention 40x is removed
powerpc: Remove 40x leftovers
macintosh/therm_windtunnel: fix module unload.
powerpc: Check only single values are passed to CPU/MMU feature checks
powerpc/xmon: Fix disassembly CPU feature checks
powerpc: Drop clang workaround for builtin constant checks
powerpc64/bpf: jit support for signed division and modulo
powerpc64/bpf: jit support for sign extended mov
powerpc64/bpf: jit support for sign extended load
powerpc64/bpf: jit support for unconditional byte swap
powerpc64/bpf: jit support for 32bit offset jmp instruction
powerpc/pci: Hotplug driver bridge support
pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC
powerpc: add missing MODULE_DESCRIPTION() macros
macintosh/mac_hid: add MODULE_DESCRIPTION()
KVM: PPC: add missing MODULE_DESCRIPTION() macros
powerpc/kexec: Use of_property_read_reg()
powerpc/64s/radix/kfence: map __kfence_pool at page granularity
powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API
...
When requesting an attestation report a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).
For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface so to allow the loading of VLEK keys
into the system.
See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501085210.2213060-21-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VM Service Module (SVSM).
When running over a SVSM, different services can run at different
protection levels, apart from the guest OS but still within the
secure SNP environment. They can provide services to the guest, like
a vTPM, for example.
This series adds the required facilities to interface with such a SVSM
module.
- The usual fixlets, refactoring and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaWQuoACgkQEsHwGGHe
VUrmEw/+KqM5DK5cfpue3gn0RfH6OYUoFxOdYhGkG53qUMc3c3ka5zPVqLoHPkzp
WPXha0Z5pVdrcD9mKtVUW9RIuLjInCM/mnoNc3tIUL+09xxemAjyG1+O+4kodiU7
sZ5+HuKUM2ihoC4Rrm+ApRrZfH4+WcgQNvFky77iObWVBo4yIscS7Pet/MYFvuuz
zNaGp2SGGExDeoX/pMQNI3S9FKYD26HR17AUI3DHpS0teUl2npVi4xDjFVYZh0dQ
yAhTKbSX3Q6ekDDkvAQUbxvWTJw9qoIsvLO9dvZdx6SSWmzF9IbuECpQKGQwYcp+
pVtcHb+3MwfB+nh5/fHyssRTOZp1UuI5GcmLHIQhmhQwCqPgzDH6te4Ud1ovkxOu
3GoBre7KydnQIyv12I+56/ZxyPbjHWmn8Fg106nAwGTdGbBJhfcVYfPmPvwpI4ib
nXpjypvM8FkLzLAzDK6GE9QiXqJJlxOn7t66JiH/FkXR4gnY3eI8JLMfnm5blAb+
97LC7oyeqtstWth9/4tpCILgPR2tirrMQGjUXttgt+2VMzqnEamnFozsKvR95xok
4j6ulKglZjdpn0ixHb2vAzAcOJvD7NP147jtCmXH7M6/f9H1Lih3MKdxX98MVhWB
wSp16udXHzu5lF45J0BJG8uejSgBI2y51jc92HLX7kRULOGyaEo=
=u15r
-----END PGP SIGNATURE-----
Merge tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Add support for running the kernel in a SEV-SNP guest, over a Secure
VM Service Module (SVSM).
When running over a SVSM, different services can run at different
protection levels, apart from the guest OS but still within the
secure SNP environment. They can provide services to the guest, like
a vTPM, for example.
This series adds the required facilities to interface with such a
SVSM module.
- The usual fixlets, refactoring and cleanups
[ And as always: "SEV" is AMD's "Secure Encrypted Virtualization".
I can't be the only one who gets all the newer x86 TLA's confused,
can I?
- Linus ]
* tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/ABI/configfs-tsm: Fix an unexpected indentation silly
x86/sev: Do RMP memory coverage check after max_pfn has been set
x86/sev: Move SEV compilation units
virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch
x86/sev: Allow non-VMPL0 execution when an SVSM is present
x86/sev: Extend the config-fs attestation support for an SVSM
x86/sev: Take advantage of configfs visibility support in TSM
fs/configfs: Add a callback to determine attribute visibility
sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated
virt: sev-guest: Choose the VMPCK key based on executing VMPL
x86/sev: Provide guest VMPL level to userspace
x86/sev: Provide SVSM discovery support
x86/sev: Use the SVSM to create a vCPU when not in VMPL0
x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0
x86/sev: Use kernel provided SVSM Calling Areas
x86/sev: Check for the presence of an SVSM in the SNP secrets page
x86/irqflags: Provide native versions of the local_irq_save()/restore()
Remove support for virtualizing MTRRs on Intel CPUs, along with a nasty CR0.CD
hack, and instead always honor guest PAT on CPUs that support self-snoop.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmaRuwAACgkQOlYIJqCj
N/32Gg/+Nnnz6TCRno2vursPJme7gvtLdqSxjazAj3u2ZO8IApGYWMyfVpS+ymC9
Wdpj6gRe2ukSxgTsUI2CYoy5V2NxDaA9YgdTPZUVQvqwujVrqZCJ7L393iPYYnC9
No3LXZ+SOYRmomiCzknjC6GOlT2hAZHzQsyaXDlEYok7NAA2L6XybbLonEdA4RYi
V1mS62W5PaA4tUesuxkJjPujXo1nXRWD/aXOruJWjPESdSFSALlx7reFAf2Nwn7K
Uw8yZqhq6vWAZSph0Nz8OrZOS/kULKA3q2zl1B/qJJ0ToAt2VdXS6abXky52RExf
KvP+jBAWMO5kHbIqaMRtCHjbIkbhH8RdUIYNJQEUQ5DdydM5+/RDa+KprmLPcmUn
qvJq+3uyH0MEENtneGegs8uxR+sn6fT32cGMIw790yIywddh562+IJ4Z+C3BuYJi
yszD71odqKT8+knUd2CaZjE9UZyoQNDfj2OCCTzzZOC/6TuJWCh9CYQ1csssHbQR
KcvZCKE6ht8tWwi+2HWj0laOdg1reX2kV869k3xH4uCwEaFIj2Wk+/Bw/lg2Tn5h
5uTnQ01dx5XhAV1klr6IY3VXJ/A8G8895wRfkZEelsA9Wj8qZvNgXhsoXReIUIrn
aR0ppsFcbqHzC50qE2JT4juTD1EPx95LL9zKT8pI9mGKwxCAxUM=
=yb10
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-mtrrs-6.11' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MTRR virtualization removal
Remove support for virtualizing MTRRs on Intel CPUs, along with a nasty CR0.CD
hack, and instead always honor guest PAT on CPUs that support self-snoop.
- Add a global struct to consolidate tracking of host values, e.g. EFER, and
move "shadow_phys_bits" into the structure as "maxphyaddr".
- Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
bus frequency, because TDX.
- Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
- Clean up KVM's handling of vendor specific emulation to consistently act on
"compatible with Intel/AMD", versus checking for a specific vendor.
- Misc cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmaRub0ACgkQOlYIJqCj
N/2LMxAArGzhcWZ6Qdo2aMRaMIPtSBJHmbEgEuHvHMumgsTZQzDcn9cxDi/hNSrc
l8ODOwAM2qNcq95YfwjU7F0ae3E+HRzGvKcBnmZWuQeCDp2HhVEoCphFu1sHst+t
XEJTL02b6OgyJUEU3h40mYk12eiq2S4FCnFYXPCqijwwuL6Y5KQvvTqek3c2/SDn
c+VneutYGax/S0GiiCkYh4wrwWh9g7qm0IX70ycBwJbW5qBFKgyglvHxvL8JLJC9
Nkkw/p2657wcOdraH+fOBuRy2dMwE5fv++1tOjWwB5WAAhSOJPZh0BGYvgA2yfN7
OE+k7APKUQd9Xxtud8H3LrTPoyMA4hz2sdDFyqrrWK9yjpBY7zXNyN50Fxi7VVsm
T8nTIiKAGyRbjotY+m7krXQPXjfZYhVqrJ/jtxESOZLZ93q2gSWU2p/ZXpUPVHnH
+YOBAI1owP3wepaYlrthtI4LQx9lF422dnmeSflztfKFGabRbQZxg3uHMCCxIaGc
lJ6CD546+D45f/uBXRDMqk//qFTqXhKUbDk9sutmU/C2oWufMwW0R8kOyItGPyvk
9PP1vd8vSsIHj+tpwg+i04jBqYDaAcPBOcTZaHm9SYYP+1e11Uu5Vjep37JL1bkA
xJWxnDZOCGcfKQi2jkh51HJ/dOAHXY1GQKMfyAoPQOSonYHvGVY=
=Cf2R
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.11
- Add a global struct to consolidate tracking of host values, e.g. EFER, and
move "shadow_phys_bits" into the structure as "maxphyaddr".
- Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
bus frequency, because TDX.
- Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
- Clean up KVM's handling of vendor specific emulation to consistently act on
"compatible with Intel/AMD", versus checking for a specific vendor.
- Misc cleanups
- Enable halt poll shrinking by default, as Intel found it to be a clear win.
- Setup empty IRQ routing when creating a VM to avoid having to synchronize
SRCU when creating a split IRQCHIP on x86.
- Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
that arch code can use for hooking both sched_in() and sched_out().
- Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace detect bugs.
- Mark a vCPU as preempted if and only if it's scheduled out while in the
KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
memory when retrieving guest state during live migration blackout.
- A few minor cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmaRuOYACgkQOlYIJqCj
N/1UnQ/8CI5Qfr+/0gzYgtWmtEMczGG+rMNpzD3XVqPjJjXcMcBiQnplnzUVLhha
vlPdYVK7vgmEt003XGzV55mik46LHL+DX/v4hI3HEdblfyCeNLW3fKEWVRB44qJe
o+YUQwSK42SORUp9oXuQINxhA//U9EnI7CQxlJ8w8wenv5IJKfIGr01DefmfGPAV
PKm9t6WLcNqvhZMEyy/zmzM3KVPCJL0NcwI97x6sHxFpQYIDtL0E/VexA4AFqMoT
QK7cSDC/2US41Zvem/r/GzM/ucdF6vb9suzZYBohwhxtVhwJe2CDeYQZvtNKJ1U7
GOHPaKL6nBWdZCm/yyWbbX2nstY1lHqxhN3JD0X8wqU5rNcwm2b8Vfyav0Ehc7H+
jVbDTshOx4YJmIgajoKjgM050rdBK59TdfVL+l+AAV5q/TlHocalYtvkEBdGmIDg
2td9UHSime6sp20vQfczUEz4bgrQsh4l2Fa/qU2jFwLievnBw0AvEaMximkSGMJe
b8XfjmdTjlOesWAejANKtQolfrq14+1wYw0zZZ8PA+uNVpKdoovmcqSOcaDC9bT8
GO/NFUvoG+lkcvJcIlo1SSl81SmGLosijwxWfGvFAqsgpR3/3l3dYp0QtztoCNJO
d3+HnjgYn5o5FwufuTD3eUOXH4AFjG108DH0o25XrIkb2Kymy0o=
=BalU
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEAD
KVM generic changes for 6.11
- Enable halt poll shrinking by default, as Intel found it to be a clear win.
- Setup empty IRQ routing when creating a VM to avoid having to synchronize
SRCU when creating a split IRQCHIP on x86.
- Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
that arch code can use for hooking both sched_in() and sched_out().
- Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace detect bugs.
- Mark a vCPU as preempted if and only if it's scheduled out while in the
KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
memory when retrieving guest state during live migration blackout.
- A few minor cleanups
- Initial infrastructure for shadow stage-2 MMUs, as part of nested
virtualization enablement
- Support for userspace changes to the guest CTR_EL0 value, enabling
(in part) migration of VMs between heterogenous hardware
- Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
the protocol
- FPSIMD/SVE support for nested, including merged trap configuration
and exception routing
- New command-line parameter to control the WFx trap behavior under KVM
- Introduce kCFI hardening in the EL2 hypervisor
- Fixes + cleanups for handling presence/absence of FEAT_TCRX
- Miscellaneous fixes + documentation updates
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZpTCAxccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFjChAQCWs9ucJag4USgvXpg5mo9sxzly
kBZZ1o49N/VLxs4cagEAtq3KVNQNQyGXelYH6gr20aI85j6VnZW5W5z+sy5TAgk=
=sSOt
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.11
- Initial infrastructure for shadow stage-2 MMUs, as part of nested
virtualization enablement
- Support for userspace changes to the guest CTR_EL0 value, enabling
(in part) migration of VMs between heterogenous hardware
- Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
the protocol
- FPSIMD/SVE support for nested, including merged trap configuration
and exception routing
- New command-line parameter to control the WFx trap behavior under KVM
- Introduce kCFI hardening in the EL2 hypervisor
- Fixes + cleanups for handling presence/absence of FEAT_TCRX
- Miscellaneous fixes + documentation updates
1. Add ParaVirt steal time support.
2. Add some VM migration enhancement.
3. Add perf kvm-stat support for loongarch.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmaOS6UWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImehejD/9pACGe3h3krXLcFVWXOFIu5Hpc
5kQLP0lSPJ/o5Xs8t/oPLrnDX70z90wXI1LOmltc7h32MSwFa2l8COQh+sN5eJBQ
PNyt7u7bMipp0yJS4Gl3LQQ5vklcGOSpQc/gbeXnVx8J/tz+Mo9YGGLIXVRXRM6W
Ri8D2VVFiwzQQYeTpPo1u1Ob8C6mA4KOppwvhscMTM3vj4NMbsinBzRnR0lG0Tdw
meFhxDPly1Ksxsbnj9UGO6UnEY0A2SLONs6MiO4y4DtoqoDlw/lbqFJuYo4vvbx1
pxtjyirD/PX/wjslQFWUOuU0hMfAodera+JupZ5BZWfcG8FltA4DQfDsm/U9RjK/
7gGNnr8Xk2/tp6+4AVV+HU2iTgRvq+mXCL72zSy2Y4r7ElBAANDfk4n+Zn/PWisn
U9wwV8Ue7tVB15BRpRsg77NzBidiCFEe/6flWYiX2y24ke71gwDJBGUy8hMdKt6t
4Cq8atsU0MvDAzfYMsK9JjskJp4UFq6wb1tXbbuADM4TDhnzlK6s6h3vM+pFlh/f
my7fDH8/2qsCWhBDM4pmsJskVp+I1GOk/80RjTQISwx7iHktJWvxNYTaisK2fvD5
Qs1IUWfNFbDX0Lr0QpN6j6X4rZkghR4R6XoFkd4nkicwi+UHVn3oK9GSqv24QJn9
7+Ev3dfRTUYLd6mC4Q==
=DpIK
-----END PGP SIGNATURE-----
Merge tag 'loongarch-kvm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.11
1. Add ParaVirt steal time support.
2. Add some VM migration enhancement.
3. Add perf kvm-stat support for loongarch.
- Rejecting memory region operations for ucontrol mode VMs
- Rewind the PSW on host intercepts for VSIE
- Remove unneeded include
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmaOeL8ACgkQ41TmuOI4
ufgL/RAAs5AKJIeV/iuxUDTyJAA0Jhu61xhFbO4u50K5Bxc+pr+DCBWvOGPMtI1D
dYUqUGN1Ii47tHJ4oztQzAQnX+PCCyxemOjuUinzzyhJKuwusJHYW57wXPBE8a4t
HI/6huTkL7zHJ1nml/S9YkTpdzVA0a6AOEWzV/+tinmjlDRrLEkKGGto2sNN+rym
+K+QLt8+RGHDWORCE1fry51sgv4liKna+V9kEgJBlO17jR1tWNIpdaozyZ7agLLT
Fi45nd4eqZqAqDixQ0avggPk/spfUxwqJackqqWPDjGGyQ9kgjOs4AuuMFD5naoF
UbueRjjYtpPRPw1XVfvblXLiDhNWveS3vF4D0Dg8+2TVwYXmg1Yhy/pAACNja+wJ
uZSTjqSU/soYgIWfVWo6mTnCNdxvgBZXpPA6t6feky4RZzsnSrDf8EIHvYUHcqRo
nNRfseWKhi0Kq0t1Cy2WBBdjJnupQLnTz8ft+RRGf9XU6mYaiJ+XO4dL3P9pNFGc
qHL6QkGG2EsRHXD4n3b+rB0qOu8BzmwcgqFZGlv9hAgnXFYDmgMYEm0CBhMqigDy
5d2CTwvGTqU2nxXxxPsYpNDAEam4WoC6aLJgephPfBqbHv4BCtUmwByNk9DBkFsv
czQp6XOlyimRAIniD0Mwl/aigEc9rB232WlU7zZzzg5qYlxHj+g=
=D4sp
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-6.11-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Assortment of tiny fixes which are not time critical:
- Rejecting memory region operations for ucontrol mode VMs
- Rewind the PSW on host intercepts for VSIE
- Remove unneeded include
Adds documentation of KVM_PRE_FAULT_MEMORY ioctl. [1]
It populates guest memory. It doesn't do extra operations on the
underlying technology-specific initialization [2]. For example,
CoCo-related operations won't be performed. Concretely for TDX, this API
won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific APIs
are required for such operations.
The key point is to adapt of vcpu ioctl instead of VM ioctl. First,
populating guest memory requires vcpu. If it is VM ioctl, we need to pick
one vcpu somehow. Secondly, vcpu ioctl allows each vcpu to invoke this
ioctl in parallel. It helps to scale regarding guest memory size, e.g.,
hundreds of GB.
[1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
[2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-ID: <9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This change rejects the KVM_SET_USER_MEMORY_REGION and
KVM_SET_USER_MEMORY_REGION2 ioctls when called on a ucontrol VM.
This is necessary since ucontrol VMs have kvm->arch.gmap set to 0 and
would thus result in a null pointer dereference further in.
Memory management needs to be performed in userspace and using the
ioctls KVM_S390_UCAS_MAP and KVM_S390_UCAS_UNMAP.
Also improve s390 specific documentation for KVM_SET_USER_MEMORY_REGION
and KVM_SET_USER_MEMORY_REGION2.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Fixes: 27e0393f15fc ("KVM: s390: ucontrol: per vcpu address spaces")
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20240624095902.29375-1-schlameuss@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[frankja@linux.ibm.com: commit message spelling fix, subject prefix fix]
Message-ID: <20240624095902.29375-1-schlameuss@linux.ibm.com>
In arch/arm64/include/uapi/asm/kvm.h, we have
#define KVM_VGIC_V2_CPU_SIZE 0x2000
So the CPU interface address space should cover 8 KByte not 4 KByte.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Link: https://lore.kernel.org/r/20240623164542.2999626-3-changyuanl@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The expression `irq_type[n]` may confuse readers to interpret `n`
as the bit position and think of CPU = 1 << 0, SPI = 1 << 1, and
PPI = 1 << 2.
Since arch/arm64/include/uapi/asm/kvm.h already has macro definitions
for the allowed values, this commit uses these symbols to clear up
the ambiguity.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Link: https://lore.kernel.org/r/20240623164542.2999626-2-changyuanl@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Currently, the sev-guest driver uses the vmpck-0 key by default. When an
SVSM is present, the kernel is running at a VMPL other than 0 and the
vmpck-0 key is no longer available. If a specific vmpck key has not be
requested by the user via the vmpck_id module parameter, choose the
vmpck key based on the active VMPL level.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/b88081c5d88263176849df8ea93e90a404619cab.1717600736.git.thomas.lendacky@amd.com
When a vCPU is interrupted by a signal while running a nested guest,
KVM will exit to userspace with L2 state. However, userspace has no
way to know whether it sees L1 or L2 state (besides calling
KVM_GET_STATS_FD, which does not have a stable ABI).
This causes multiple problems:
The simplest one is L2 state corruption when userspace marks the sregs
as dirty. See this mailing list thread [1] for a complete discussion.
Another problem is that if userspace decides to continue by emulating
instructions, it will unknowingly emulate with L2 state as if L1
doesn't exist, which can be considered a weird guest escape.
Introduce a new flag, KVM_RUN_X86_GUEST_MODE, in the kvm_run data
structure, which is set when the vCPU exited while running a nested
guest. Also introduce a new capability, KVM_CAP_X86_GUEST_MODE, to
advertise the functionality to userspace.
[1] https://lore.kernel.org/kvm/20240416123558.212040-1-julian.stecklina@cyberus-technology.de/T/#m280aadcb2e10ae02c191a7dc4ed4b711a74b1f55
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Link: https://lore.kernel.org/r/20240508132502.184428-1-julian.stecklina@cyberus-technology.de
Signed-off-by: Sean Christopherson <seanjc@google.com>
Improve the description for the KVM_CAP_X86_BUS_LOCK_EXIT capability to
fix a few typos and grammar issues, and to clarify the purpose of the
capability.
Signed-off-by: Carlos López <clopez@suse.de>
Link: https://lore.kernel.org/r/20240424105616.29596-1-clopez@suse.de
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
The patch adds a one-reg register identifier which can be used to
read and set the virtual HASHPKEYR for the guest during enter/exit
with KVM_REG_PPC_HASHPKEYR. The specific SPR KVM API documentation
too updated.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171759285547.1480.12374595786792346073.stgit@linux.ibm.com
The patch adds a one-reg register identifier which can be used to
read and set the virtual HASHKEYR for the guest during enter/exit
with KVM_REG_PPC_HASHKEYR. The specific SPR KVM API documentation
too updated.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171759283170.1480.12904332463112769129.stgit@linux.ibm.com
The patch adds a one-reg register identifier which can be used to
read and set the DEXCR for the guest during enter/exit with
KVM_REG_PPC_DEXCR. The specific SPR KVM API documentation
too updated.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171759279613.1480.12873911783530175699.stgit@linux.ibm.com
Drop KVM's emulation of CR0.CD=1 on Intel CPUs now that KVM no longer
honors guest MTRR memtypes, as forcing UC memory for VMs with
non-coherent DMA only makes sense if the guest is using something other
than PAT to configure the memtype for the DMA region.
Furthermore, KVM has forced WB memory for CR0.CD=1 since commit
fb279950ba02 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED"), and no known
VMM in existence disables KVM_X86_QUIRK_CD_NW_CLEARED, let alone does
so with non-coherent DMA.
Lastly, commit fb279950ba02 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") was
from the same author as commit b18d5431acc7 ("KVM: x86: fix CR0.CD
virtualization"), and followed by a mere month. I.e. forcing UC memory
was likely the result of code inspection or perhaps misdiagnosed failures,
and not the necessitate by a concrete use case.
Update KVM's documentation to note that KVM_X86_QUIRK_CD_NW_CLEARED is now
AMD-only, and to take an erratum for lack of CR0.CD virtualization on
Intel.
Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240309010929.1403984-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Remove KVM's support for virtualizing guest MTRR memtypes, as full MTRR
adds no value, negatively impacts guest performance, and is a maintenance
burden due to it's complexity and oddities.
KVM's approach to virtualizating MTRRs make no sense, at all. KVM *only*
honors guest MTRR memtypes if EPT is enabled *and* the guest has a device
that may perform non-coherent DMA access. From a hardware virtualization
perspective of guest MTRRs, there is _nothing_ special about EPT. Legacy
shadowing paging doesn't magically account for guest MTRRs, nor does NPT.
Unwinding and deciphering KVM's murky history, the MTRR virtualization
code appears to be the result of misdiagnosed issues when EPT + VT-d with
passthrough devices was enabled years and years ago. And importantly, the
underlying bugs that were fudged around by honoring guest MTRR memtypes
have since been fixed (though rather poorly in some cases).
The zapping GFNs logic in the MTRR virtualization code came from:
commit efdfe536d8c643391e19d5726b072f82964bfbdb
Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Date: Wed May 13 14:42:27 2015 +0800
KVM: MMU: fix MTRR update
Currently, whenever guest MTRR registers are changed
kvm_mmu_reset_context is called to switch to the new root shadow page
table, however, it's useless since:
1) the cache type is not cached into shadow page's attribute so that
the original root shadow page will be reused
2) the cache type is set on the last spte, that means we should sync
the last sptes when MTRR is changed
This patch fixs this issue by drop all the spte in the gfn range which
is being updated by MTRR
which was a fix for:
commit 0bed3b568b68e5835ef5da888a372b9beabf7544
Author: Sheng Yang <sheng@linux.intel.com>
AuthorDate: Thu Oct 9 16:01:54 2008 +0800
Commit: Avi Kivity <avi@redhat.com>
CommitDate: Wed Dec 31 16:51:44 2008 +0200
KVM: Improve MTRR structure
As well as reset mmu context when set MTRR.
which was part of a "MTRR/PAT support for EPT" series that also added:
+ if (mt_mask) {
+ mt_mask = get_memory_type(vcpu, gfn) <<
+ kvm_x86_ops->get_mt_mask_shift();
+ spte |= mt_mask;
+ }
where get_memory_type() was a truly gnarly helper to retrieve the guest
MTRR memtype for a given memtype. And *very* subtly, at the time of that
change, KVM *always* set VMX_EPT_IGMT_BIT,
kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
VMX_EPT_WRITABLE_MASK |
VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT |
VMX_EPT_IGMT_BIT);
which came in via:
commit 928d4bf747e9c290b690ff515d8f81e8ee226d97
Author: Sheng Yang <sheng@linux.intel.com>
AuthorDate: Thu Nov 6 14:55:45 2008 +0800
Commit: Avi Kivity <avi@redhat.com>
CommitDate: Tue Nov 11 21:00:37 2008 +0200
KVM: VMX: Set IGMT bit in EPT entry
There is a potential issue that, when guest using pagetable without vmexit when
EPT enabled, guest would use PAT/PCD/PWT bits to index PAT msr for it's memory,
which would be inconsistent with host side and would cause host MCE due to
inconsistent cache attribute.
The patch set IGMT bit in EPT entry to ignore guest PAT and use WB as default
memory type to protect host (notice that all memory mapped by KVM should be WB).
Note the CommitDates! The AuthorDates strongly suggests Sheng Yang added
the whole "ignoreIGMT things as a bug fix for issues that were detected
during EPT + VT-d + passthrough enabling, but it was applied earlier
because it was a generic fix.
Jumping back to 0bed3b568b68 ("KVM: Improve MTRR structure"), the other
relevant code, or rather lack thereof, is the handling of *host* MMIO.
That fix came in a bit later, but given the author and timing, it's safe
to say it was all part of the same EPT+VT-d enabling mess.
commit 2aaf69dcee864f4fb6402638dd2f263324ac839f
Author: Sheng Yang <sheng@linux.intel.com>
AuthorDate: Wed Jan 21 16:52:16 2009 +0800
Commit: Avi Kivity <avi@redhat.com>
CommitDate: Sun Feb 15 02:47:37 2009 +0200
KVM: MMU: Map device MMIO as UC in EPT
Software are not allow to access device MMIO using cacheable memory type, the
patch limit MMIO region with UC and WC(guest can select WC using PAT and
PCD/PWT).
In addition to the host MMIO and IGMT issues, KVM's MTRR virtualization
was obviously never tested on NPT until much later, which lends further
credence to the theory/argument that this was all the result of
misdiagnosed issues.
Discussion from the EPT+MTRR enabling thread[*] more or less confirms that
Sheng Yang was trying to resolve issues with passthrough MMIO.
* Sheng Yang
: Do you mean host(qemu) would access this memory and if we set it to guest
: MTRR, host access would be broken? We would cover this in our shadow MTRR
: patch, for we encountered this in video ram when doing some experiment with
: VGA assignment.
And in the same thread, there's also what appears to be confirmation of
Intel running into issues with Windows XP related to a guest device driver
mapping DMA with WC in the PAT.
* Avi Kavity
: Sheng Yang wrote:
: > Yes... But it's easy to do with assigned devices' mmio, but what if guest
: > specific some non-mmio memory's memory type? E.g. we have met one issue in
: > Xen, that a assigned-device's XP driver specific one memory region as buffer,
: > and modify the memory type then do DMA.
: >
: > Only map MMIO space can be first step, but I guess we can modify assigned
: > memory region memory type follow guest's?
: >
:
: With ept/npt, we can't, since the memory type is in the guest's
: pagetable entries, and these are not accessible.
[*] https://lore.kernel.org/all/1223539317-32379-1-git-send-email-sheng@linux.intel.com
So, for the most part, what likely happened is that 15 years ago, a few
engineers (a) fixed a #MC problem by ignoring guest PAT and (b) initially
"fixed" passthrough device MMIO by emulating *guest* MTRRs. Except for
the below case, everything since then has been a result of those two
intertwined changes.
The one exception, which is actually yet more confirmation of all of the
above, is the revert of Paolo's attempt at "full" virtualization of guest
MTRRs:
commit 606decd67049217684e3cb5a54104d51ddd4ef35
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu Oct 1 13:12:47 2015 +0200
Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages"
This reverts commit fd717f11015f673487ffc826e59b2bad69d20fe5.
It was reported to cause Machine Check Exceptions (bug 104091).
...
commit fd717f11015f673487ffc826e59b2bad69d20fe5
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Tue Jul 7 14:38:13 2015 +0200
KVM: x86: apply guest MTRR virtualization on host reserved pages
Currently guest MTRR is avoided if kvm_is_reserved_pfn returns true.
However, the guest could prefer a different page type than UC for
such pages. A good example is that pass-throughed VGA frame buffer is
not always UC as host expected.
This patch enables full use of virtual guest MTRRs.
I.e. Paolo tried to add back KVM's behavior before "Map device MMIO as UC
in EPT" and got the same result: machine checks, likely due to the guest
MTRRs not being trustworthy/sane at all times.
Note, Paolo also tried to enable MTRR virtualization on SVM+NPT, but that
too got reverted. Unfortunately, it doesn't appear that anyone ever found
a smoking gun, i.e. exactly why emulating guest MTRRs via NPT PAT caused
extremely slow boot times doesn't appear to have a definitive root cause.
commit fc07e76ac7ffa3afd621a1c3858a503386a14281
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu Oct 1 13:20:22 2015 +0200
Revert "KVM: SVM: use NPT page attributes"
This reverts commit 3c2e7f7de3240216042b61073803b61b9b3cfb22.
Initializing the mapping from MTRR to PAT values was reported to
fail nondeterministically, and it also caused extremely slow boot
(due to caching getting disabled---bug 103321) with assigned devices.
...
commit 3c2e7f7de3240216042b61073803b61b9b3cfb22
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Tue Jul 7 14:32:17 2015 +0200
KVM: SVM: use NPT page attributes
Right now, NPT page attributes are not used, and the final page
attribute depends solely on gPAT (which however is not synced
correctly), the guest MTRRs and the guest page attributes.
However, we can do better by mimicking what is done for VMX.
In the absence of PCI passthrough, the guest PAT can be ignored
and the page attributes can be just WB. If passthrough is being
used, instead, keep respecting the guest PAT, and emulate the guest
MTRRs through the PAT field of the nested page tables.
The only snag is that WP memory cannot be emulated correctly,
because Linux's default PAT setting only includes the other types.
In short, honoring guest MTRRs for VMX was initially a workaround of
sorts for KVM ignoring guest PAT *and* for KVM not forcing UC for host
MMIO. And while there *are* known cases where honoring guest MTRRs is
desirable, e.g. passthrough VGA frame buffers, the desired behavior in
that case is to get WC instead of UC, i.e. at this point it's for
performance, not correctness.
Furthermore, the complete absence of MTRR virtualization on NPT and
shadow paging proves that, while KVM theoretically can do better, it's
by no means necessary for correctnesss.
Lastly, since kernels mostly rely on firmware to do MTRR setup, and the
host typically provides guest firmware, honoring guest MTRRs is effectively
honoring *host* userspace memtypes, which is also backwards. I.e. it
would be far better for host userspace to communicate its desired memtype
directly to KVM (or perhaps indirectly via VMAs in the host kernel), not
through guest MTRRs.
Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240309010929.1403984-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.
Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.
The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.
In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."
The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.
The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
kernel expects a 25MHz timer frequency but gets timer interrupt more
frequently due to the 1GHz frequency used by KVM.
To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.
Reported-by: Vishal Annapurve <vannapurve@google.com>
Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/6748a4c12269e756f0c48680da8ccc5367c31ce7.1714081726.git.reinette.chatre@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Pull base x86 KVM support for running SEV-SNP guests from Michael Roth:
* add some basic infrastructure and introduces a new KVM_X86_SNP_VM
vm_type to handle differences versus the existing KVM_X86_SEV_VM and
KVM_X86_SEV_ES_VM types.
* implement the KVM API to handle the creation of a cryptographic
launch context, encrypt/measure the initial image into guest memory,
and finalize it before launching it.
* implement handling for various guest-generated events such as page
state changes, onlining of additional vCPUs, etc.
* implement the gmem/mmu hooks needed to prepare gmem-allocated pages
before mapping them into guest private memory ranges as well as
cleaning them up prior to returning them to the host for use as
normal memory. Because those cleanup hooks supplant certain
activities like issuing WBINVDs during KVM MMU invalidations, avoid
duplicating that work to avoid unecessary overhead.
This merge leaves out support support for attestation guest requests
and for loading the signing keys to be used for attestation requests.
Default halt_poll_ns_shrink value of 0 always resets polling interval
to 0 on an un-successful poll where vcpu wakeup is not received. This is
mostly to avoid pointless polling for more number of shorter intervals. But
disabled shrink assumes vcpu wakeup is less likely to be received in
subsequent shorter polling intervals. Another side effect of 0 shrink value
is that, even on a successful poll if total block time was greater than
current polling interval, the polling interval starts over from 0 instead
of shrinking by a factor.
Enabling shrink with value of 2 allows the polling interval to gradually
decrement in case of un-successful poll events as well. This gives a fair
chance for successful polling events in subsequent polling intervals rather
than resetting it to 0 and starting over from grow_start.
Below kvm stat log snippet shows interleaved growth and shrinking of
polling interval:
87162647182125: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
87162647637763: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
87162649627943: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 40000 (grow 20000)
87162650892407: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (shrink 40000)
87162651540378: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 40000 (grow 20000)
87162652276768: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (shrink 40000)
87162652515037: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 40000 (grow 20000)
87162653383787: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (shrink 40000)
87162653627670: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (shrink 20000)
87162653796321: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
87162656171645: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (shrink 20000)
87162661607487: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
Having both grow and shrink enabled creates a balance in polling interval
growth and shrink behavior. Tests show improved successful polling attempt
ratio which contribute to VM performance. Power penalty is quite negligible
as shrunk polling intervals create bursts of very short durations.
Performance assessment results show 3-6% improvements in CPU+GPU, Memory
and Storage Android VM workloads whereas 5-9% improvement in average FPS of
gaming VM workloads.
Power penalty is below 1% where host OS is either idle or running a
native workload having 2 VMs enabled. CPU/GPU intensive gaming workloads
as well do not show any increased power overhead with shrink enabled.
Co-developed-by: Rajendran Jaishankar <jaishankar.rajendran@intel.com>
Signed-off-by: Rajendran Jaishankar <jaishankar.rajendran@intel.com>
Signed-off-by: Parshuram Sangle <parshuram.sangle@intel.com>
Link: https://lore.kernel.org/r/20231102154628.2120-2-parshuram.sangle@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Current documentation does not describe how Linux handles the synthetic
interrupt controller (synic) that Hyper-V provides to guest VMs, nor how
VMBus or timer interrupts are handled. Add text describing the synic and
reorganize existing text to make this more clear.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20240511133818.19649-2-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240511133818.19649-2-mhklinux@outlook.com>
Update spelling from "VMbus" to "VMBus" to match Hyper-V product
documentation. Also correct typo: "SNP-SEV" should be "SEV-SNP".
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20240511133818.19649-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240511133818.19649-1-mhklinux@outlook.com>
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings via
prctl, notably NPHIE which controls hashst/hashchk for ROP protection.
- Install powerpc selftests in sub-directories. Note this changes the way
run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove events.
- Other small features, cleanups and fixes.
Thanks to: Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann,
Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe
Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner,
Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz,
Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong,
Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer,
Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas
Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta,
Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, Zhao Chenhui.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZHLtwTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCGdD/0cqQkYl6+E0/K68Y7jnAWF+l0LNFlm
/4jZ+zKXPiPhSdaQq4xo2ZjEooUPsm3c+AHidmrAtOMBULvv4pyciu61hrVu4Y2b
aAudkBMUc+i/Lfaz7fq1KnN4LDFVm7xZZ+i/ju9tOBLMpOZ3YZ+YoOGA6nqsshJF
XuB5h0T+H55he1wBpvyyrsUUyss53Mp3IsajxdwBOsUDDp0fSAg8SLEyhoiK3BsQ
EjEa6iEqJSBheqFEXPvqsMuqM3k51CHe/pCOMODjo7P+u/MNrClZUscZKXGB5xq9
Bu3SPxIYfRmU4XE53517faElEPmlxSBrjQGCD1EGEVXGsjn6r7TD6R5voow3SoUq
CLTy90KNNrS1cIqeomu6bJ/anzYrViqTdekImA7Vb+Ol8f+uT9l+l1D75eYOKPQ3
N0AHoa4rnWIb5kjCAjHaZ54O+B2q2tPlQqFUmt+BrvZyKS13zjE36stnArxP3MPC
Xw6y3huX3AkZiJ4mQYRiBn//xGOLwrRCd/EoTDnoe08yq0Hoor6qIm4uEy2Nu3Kf
0mBsEOxMsmQd6NEq43B/sFgVbbxKhAyxfZ9gHqxDQZcgoxXcMesyj/n4+jM5sRYK
zmavLlykM2Tjlh1evs8+e0mCEwDjDn2GRlqstJQTrmnGhbMKi3jvw9I7gGtZVqbS
kAflTXzsIXvxBA==
=GoCV
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings
via prctl, notably NPHIE which controls hashst/hashchk for ROP
protection.
- Install powerpc selftests in sub-directories. Note this changes the
way run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory
add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove
events.
- Other small features, cleanups and fixes.
Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
powerpc/fadump: Fix section mismatch warning
powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
powerpc/fadump: update documentation about bootargs_append
powerpc/fadump: pass additional parameters when fadump is active
powerpc/fadump: setup additional parameters for dump capture kernel
powerpc/pseries/fadump: add support for multiple boot memory regions
selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
KVM: PPC: Fix documentation for ppc mmu caps
KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
powerpc/code-patching: Use dedicated memory routines for patching
powerpc/code-patching: Test patch_instructions() during boot
powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
powerpc: Fix typos
powerpc/eeh: Fix spelling of the word "auxillary" and update comment
macintosh/ams: Fix unused variable warning
powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
...
Add a KVM_SEV_SNP_LAUNCH_FINISH command to finalize the cryptographic
launch digest which stores the measurement of the guest at launch time.
Also extend the existing SNP firmware data structures to support
disabling the use of Versioned Chip Endorsement Keys (VCEK) by guests as
part of this command.
While finalizing the launch flow, the code also issues the LAUNCH_UPDATE
SNP firmware commands to encrypt/measure the initial VMSA pages for each
configured vCPU, which requires setting the RMP entries for those pages
to private, so also add handling to clean up the RMP entries for these
pages whening freeing vCPUs during shutdown.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-8-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A key aspect of a launching an SNP guest is initializing it with a
known/measured payload which is then encrypted into guest memory as
pre-validated private pages and then measured into the cryptographic
launch context created with KVM_SEV_SNP_LAUNCH_START so that the guest
can attest itself after booting.
Since all private pages are provided by guest_memfd, make use of the
kvm_gmem_populate() interface to handle this. The general flow is that
guest_memfd will handle allocating the pages associated with the GPA
ranges being initialized by each particular call of
KVM_SEV_SNP_LAUNCH_UPDATE, copying data from userspace into those pages,
and then the post_populate callback will do the work of setting the
RMP entries for these pages to private and issuing the SNP firmware
calls to encrypt/measure them.
For more information see the SEV-SNP specification.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-7-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. Other commands can then at that point be
used to load/encrypt data into the guest's initial launch image.
For more information see the SEV-SNP specification.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-6-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Misc cleanups extracted from the "exit on missing userspace mapping" series,
which has been put on hold in anticipation of a "KVM Userfault" approach,
which should provide a superset of functionality.
- Remove kvm_make_all_cpus_request_except(), which got added to hack around an
AVIC bug, and then became dead code when a more robust fix came along.
- Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmY+oHQACgkQOlYIJqCj
N/3c/w//dmgqxFGpPoCvZ2+pVarrbpsMdfO5skaMF0EN1a0Rb0oJcVYj7z1zqsjQ
4DCCANxVrcEGVBZG5I8nhk1lDlGS7zOTOBBovgVDNj7wL9p/fzOhR6UlLKG5QMMn
0nWY9raC8ubcrtKgOm/qOtSgZrL9rEWh3QUK1FRPKaF12r1CLPmJIvVvpCm8t//f
YZrqpHj/JqXbc8V8toBHqvi3DaMIOA2gWRvjfwSWfCL+x7ZPYny3Q+nw9fl2fSR6
f6w1lB6VhyDudzscu4l7U4y5wI0LMmYhJ5p5tvQBB5qtbAJ7vpIUxxYh4CT/YdbH
WLQCIBr2wR0Mkl0g4FwNlnnt6a5Sa6V4nVKfzkl37L0Ucyu+SpP8t6YO4nb/dJmb
Sicx3qqeHC7N9Y9VVKzK3Kb33KVaBFawvzjIcc+GFXMDFZ35b33vWhYzTl3sJpLY
hjfGpYTB1zHSj6f7a9mW7d15E/lyfqMKCzewZWnko0hISM8Jm1LxU3PMFJa8TR2/
yB6IUDDJnEk6fSwUwaCluAJv3kfnhs/S3fMFw+5cYkcmgW7yaE+K9nJ3aEkx5l7x
9sXjAtc7zbAwEuJZ+5C1+CgwWGKsfLKtXbjqMYAIAYep5oa+UrJ4L77aZyTV1mSD
oRJs0LmNmachV5nxKFHAaijVc6vmZNhcD9ygbM5qeLGoGby+W8g=
=dgM4
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-generic-6.10' of https://github.com/kvm-x86/linux into HEAD
KVM cleanups for 6.10:
- Misc cleanups extracted from the "exit on missing userspace mapping" series,
which has been put on hold in anticipation of a "KVM Userfault" approach,
which should provide a superset of functionality.
- Remove kvm_make_all_cpus_request_except(), which got added to hack around an
AVIC bug, and then became dead code when a more robust fix came along.
- Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation.
- Move a lot of state that was previously stored on a per vcpu
basis into a per-CPU area, because it is only pertinent to the
host while the vcpu is loaded. This results in better state
tracking, and a smaller vcpu structure.
- Add full handling of the ERET/ERETAA/ERETAB instructions in
nested virtualisation. The last two instructions also require
emulating part of the pointer authentication extension.
As a result, the trap handling of pointer authentication has
been greattly simplified.
- Turn the global (and not very scalable) LPI translation cache
into a per-ITS, scalable cache, making non directly injected
LPIs much cheaper to make visible to the vcpu.
- A batch of pKVM patches, mostly fixes and cleanups, as the
upstreaming process seems to be resuming. Fingers crossed!
- Allocate PPIs and SGIs outside of the vcpu structure, allowing
for smaller EL2 mapping and some flexibility in implementing
more or less than 32 private IRQs.
- Purge stale mpidr_data if a vcpu is created after the MPIDR
map has been created.
- Preserve vcpu-specific ID registers across a vcpu reset.
- Various minor cleanups and improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmY/PT4ACgkQI9DQutE9
ekNwSA/7BTro0n5gP5/SfSFJeEedigpmHQJtHJk9og0LBzjXZTvYqKpI5J1HnpWE
AFsDf3aDRPaSCvI+S14LkkK+TmGtVEXUg8YGytQo08IcO2x6xBT/YjpkVOHy23kq
SGgNMPNUH2sycb7hTcz9Z/V0vBeYwFzYEAhmpvtROvmaRd8ZIyt+ofcclwUZZAQ2
SolOXR2d+ynCh8ZCOexqyZ67keikW1NXtW5aNWWFc6S6qhmcWdaWJGDcSyHauFac
+YuHjPETJYh7TNpwYTmKclRh1fk/CgA/e+r71Hlgdkg+DGCyVnEZBQxqMi6GTzNC
dzy3qhTtRT61SR54q55yMVIC3o6uRSkht+xNg1Nd+UghiqGKAtoYhvGjduodONW2
1Eas6O+vHipu98HgFnkJRPlnF1HR3VunPDwpzIWIZjK0fIXEfrWqCR3nHFaxShOR
dniTEPfELguxOtbl3jCZ+KHCIXueysczXFlqQjSDkg/P1l0jKBgpkZzMPY2mpP1y
TgjipfSL5gr1GPdbrmh4WznQtn5IYWduKIrdEmSBuru05OmBaCO4geXPUwL4coHd
O8TBnXYBTN/z3lORZMSOj9uK8hgU1UWmnOIkdJ4YBBAL8DSS+O+KtCRkHQP0ghl+
whl0q1SWTu4LtOQzN5CUrhq9Tge11erEt888VyJbBJmv8x6qJjE=
=CEfD
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 6.10
- Move a lot of state that was previously stored on a per vcpu
basis into a per-CPU area, because it is only pertinent to the
host while the vcpu is loaded. This results in better state
tracking, and a smaller vcpu structure.
- Add full handling of the ERET/ERETAA/ERETAB instructions in
nested virtualisation. The last two instructions also require
emulating part of the pointer authentication extension.
As a result, the trap handling of pointer authentication has
been greattly simplified.
- Turn the global (and not very scalable) LPI translation cache
into a per-ITS, scalable cache, making non directly injected
LPIs much cheaper to make visible to the vcpu.
- A batch of pKVM patches, mostly fixes and cleanups, as the
upstreaming process seems to be resuming. Fingers crossed!
- Allocate PPIs and SGIs outside of the vcpu structure, allowing
for smaller EL2 mapping and some flexibility in implementing
more or less than 32 private IRQs.
- Purge stale mpidr_data if a vcpu is created after the MPIDR
map has been created.
- Preserve vcpu-specific ID registers across a vcpu reset.
- Various minor cleanups and improvements.
The GHCB protocol version may be different from one guest to the next.
Add a field to track it for each KVM instance and extend KVM_SEV_INIT2
to allow it to be configured by userspace.
Now that all SEV-ES support for GHCB protocol version 2 is in place, go
ahead and default to it when creating SEV-ES guests through the new
KVM_SEV_INIT2 interface. Keep the older KVM_SEV_ES_INIT interface
restricted to GHCB protocol version 1.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501071048.2208265-5-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The documentation mentions KVM_CAP_PPC_RADIX_MMU, but the defines in the
kvm headers spell it KVM_CAP_PPC_MMU_RADIX. Similarly with
KVM_CAP_PPC_MMU_HASH_V3.
Fixes: c92701322711 ("KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230411061446.26324-1-joel@jms.id.au
The KVM_CREATE_GUEST_MEMFD ioctl returns a file descriptor, and is
documented as such in the description. However, the "Returns" field
in the documentation states that the ioctl returns 0 on success.
Update this to match the description.
Signed-off-by: Carlos López <clopez@suse.de>
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Link: https://lore.kernel.org/r/20240424103317.28522-1-clopez@suse.de
Signed-off-by: Sean Christopherson <seanjc@google.com>
* kvm-arm64/pkvm-6.10: (25 commits)
: .
: At last, a bunch of pKVM patches, courtesy of Fuad Tabba.
: From the cover letter:
:
: "This series is a bit of a bombay-mix of patches we've been
: carrying. There's no one overarching theme, but they do improve
: the code by fixing existing bugs in pKVM, refactoring code to
: make it more readable and easier to re-use for pKVM, or adding
: functionality to the existing pKVM code upstream."
: .
KVM: arm64: Force injection of a data abort on NISV MMIO exit
KVM: arm64: Restrict supported capabilities for protected VMs
KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap()
KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst
KVM: arm64: Rename firmware pseudo-register documentation file
KVM: arm64: Reformat/beautify PTP hypercall documentation
KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit
KVM: arm64: Introduce and use predicates that check for protected VMs
KVM: arm64: Add is_pkvm_initialized() helper
KVM: arm64: Simplify vgic-v3 hypercalls
KVM: arm64: Move setting the page as dirty out of the critical section
KVM: arm64: Change kvm_handle_mmio_return() return polarity
KVM: arm64: Fix comment for __pkvm_vcpu_init_traps()
KVM: arm64: Prevent kmemleak from accessing .hyp.data
KVM: arm64: Do not map the host fpsimd state to hyp in pKVM
KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE
KVM: arm64: Support TLB invalidation in guest context
KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE
KVM: arm64: Check for PTE validity when checking for executable/cacheable
KVM: arm64: Avoid BUG-ing from the host abort path
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
If a vcpu exits for a data abort with an invalid syndrome, the
expectations are that userspace has a chance to save the day if
it has requested to see such exits.
However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.
This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.
Finally, document this tweak to the API.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-31-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM/arm64 makes use of the SMCCC "Vendor Specific Hypervisor Service
Call Range" to expose KVM-specific hypercalls to guests in a
discoverable and extensible fashion.
Document the existence of this interface and the discovery hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-28-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
In preparation for describing the guest view of KVM/arm64 hypercalls in
hypercalls.rst, move the existing contents of the file concerning the
firmware pseudo-registers elsewhere.
Cc: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-27-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The PTP hypercall documentation doesn't produce the best-looking table
when formatting in HTML as all of the return value definitions end up
on the same line.
Reformat the PTP hypercall documentation to follow the formatting used
by hypercalls.rst.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-26-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The idea that no parameter would ever be necessary when enabling SEV or
SEV-ES for a VM was decidedly optimistic. In fact, in some sense it's
already a parameter whether SEV or SEV-ES is desired. Another possible
source of variability is the desired set of VMSA features, as that affects
the measurement of the VM's initial state and cannot be changed
arbitrarily by the hypervisor.
Create a new sub-operation for KVM_MEMORY_ENCRYPT_OP that can take a struct,
and put the new op to work by including the VMSA features as a field of the
struct. The existing KVM_SEV_INIT and KVM_SEV_ES_INIT use the full set of
supported VMSA features for backwards compatibility.
The struct also includes the usual bells and whistles for future
extensibility: a flags field that must be zero for now, and some padding
at the end.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240404121327.3107131-13-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>