IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN
Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS
FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i
1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g
qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k
WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I
1GfXkXs=
=gt6P
-----END PGP SIGNATURE-----
Merge tag 'v6.1-rc6' into x86/core, to resolve conflicts
Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c:
# upstream:
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
# retbleed work in x86/core:
5d8213864ade ("x86/retbleed: Add SKL return thunk")
... and these commits in include/linux/bpf.h:
# upstram:
18acb7fac22f ("bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")")
# x86/core commits:
931ab63664f0 ("x86/ibt: Implement FineIBT")
bea75b33895f ("x86/Kconfig: Introduce function padding")
The latter two modify BPF_DISPATCHER_ATTRIBUTES(), which was removed upstream.
Conflicts:
arch/x86/kernel/asm-offsets.c
include/linux/bpf.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES.
This is different from how other CPU features are enumerated i.e. via
CPUID. Currently, a call to tsx_ctrl_is_supported() is required for
enumerating the feature. In the absence of a feature bit for TSX control,
any code that relies on checking feature bits directly will not work.
In preparation for adding a feature bit check in MSR save/restore
during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when
MSR_IA32_TSX_CTRL is present. Also make tsx_ctrl_is_supported() use the
new feature bit to avoid any overhead of reading the MSR.
[ bp: Remove tsx_ctrl_is_supported(), add room for two more feature
bits in word 11 which are coming up in the next merge window. ]
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.1668539735.git.pawan.kumar.gupta@linux.intel.com
Existing implementation (broken) of IFS used a header format (for IFS
test images) which was very similar to microcode format, but didn’t
accommodate extended signatures. This meant same IFS test image had to
be duplicated for different steppings and the validation code in the
driver was only looking at the primary header parameters. Going forward,
IFS test image headers have been tweaked to become fully compatible with
the microcode format.
Newer IFS test image headers will use header version 2 in order to
distinguish it from microcode images and older IFS test images.
In light of the above, reuse struct microcode_header_intel directly in
the IFS driver and reuse microcode functions for validation and sanity
checking.
[ bp: Massage commit message. ]
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20221117225039.30166-1-jithu.joseph@intel.com
Intel is using microcode file format for IFS test images too.
IFS test images use one of the existing reserved fields in microcode
header to indicate the size of the region in the file allocated for
metadata structures.
In preparation for this, rename first of the existing reserved fields
in microcode header to metasize. In subsequent patches IFS specific
code will make use of this field while parsing IFS images.
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20221117035935.4136738-10-jithu.joseph@intel.com
IFS test images and microcode blobs use the same header format.
Microcode blobs use header type of 1, whereas IFS test images
will use header type of 2.
In preparation for IFS reusing intel_microcode_sanity_check(),
add header type as a parameter for sanity check.
[ bp: Touchups. ]
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20221117035935.4136738-9-jithu.joseph@intel.com
IFS test image carries the same microcode header as regular Intel
microcode blobs.
Reuse microcode_sanity_check() in the IFS driver to perform sanity check
of the IFS test images too.
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20221117035935.4136738-8-jithu.joseph@intel.com
IFS uses test images provided by Intel that can be regarded as firmware.
An IFS test image carries microcode header with an extended signature
table.
Reuse find_matching_signature() for verifying if the test image header
or the extended signature table indicate whether that image is fit to
run on a system.
No functional changes.
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20221117035935.4136738-6-jithu.joseph@intel.com
Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled
both in extended 'nested controls' in VMCB and VP assist page.
According to Hyper-V TLFS, synthetic vmexit to L1 is performed with
- HV_SVM_EXITCODE_ENL exit_code.
- HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1.
Note: VP assist page is cached in 'struct kvm_vcpu_hv' so
recalc_intercepts() doesn't need to read from guest's memory. KVM
needs to update the case upon each VMRUN and after svm_set_nested_state
(svm_get_nested_state_pages()) to handle the case when the guest got
migrated while L2 was running.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-29-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable L2 TLB flush feature on nVMX when:
- Enlightened VMCS is in use.
- The feature flag is enabled in eVMCS.
- The feature flag is enabled in partition assist page.
Perform synthetic vmexit to L1 after processing TLB flush call upon
request (HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH).
Note: nested_evmcs_l2_tlb_flush_enabled() uses cached VP assist page copy
which gets updated from nested_vmx_handle_enlightened_vmptrld(). This is
also guaranteed to happen post migration with eVMCS backed L2 running.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-27-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In preparation to enabling L2 TLB flush, cache VP assist page in
'struct kvm_vcpu_hv'. While on it, rename nested_enlightened_vmentry()
to nested_get_evmptr() and make it return eVMCS GPA directly.
No functional change intended.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-26-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/
VP_IDs which are set by L1 hypervisor. 'Partition assist page' address is
also needed to handle post-flush exit to L1 upon request.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-20-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
dynamically.
Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2
requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings,
i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so
can remain an on-stack allocation.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-19-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To handle L2 TLB flush requests, KVM needs to use a separate fifo from
regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush
something in L2 is made, the target vCPU can transition from L2 to L1,
receive a request to flush a GVA for L1 and then try to enter L2 back.
The first request needs to be processed at this point. Similarly,
requests to flush GVAs in L1 must wait until L2 exits to L1.
No functional change as KVM doesn't handle L2 TLB flush requests from
L2 yet.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-18-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extended GVA ranges support bit seems to indicate whether lower 12
bits of GVA can be used to specify up to 4095 additional consequent
GVAs to flush. This is somewhat described in TLFS.
Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
requests by flushing the whole VPID so technically, extended GVA
ranges were already supported. As such requests are handled more
gently now, advertizing support for extended ranges starts making
sense to reduce the size of TLB flush requests.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-13-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To allow flushing individual GVAs instead of always flushing the whole
VPID a per-vCPU structure to pass the requests is needed. Use standard
'kfifo' to queue two types of entries: individual GVA (GFN + up to 4095
following GFNs in the lower 12 bits) and 'flush all'.
The size of the fifo is arbitrarily set to '16'.
Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now and
kvm_hv_vcpu_flush_tlb() doesn't actually read the fifo just resets the
queue before returning -EOPNOTSUPP (which triggers full TLB flush) so
the functional change is very small but the infrastructure is prepared
to handle individual GVA flush requests.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-10-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In preparation to implementing fine-grained Hyper-V TLB flush and
L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH
request in kvm_vcpu_flush_tlb_guest().
The flush itself is temporary handled by kvm_vcpu_flush_tlb_guest().
No functional change intended.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To make terminology between Hyper-V-on-KVM and KVM-on-Hyper-V consistent,
rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'. The change
eliminates the use of confusing 'direct' and adds the missing underscore.
No functional change.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that KVM isn't littered with "struct hv_enlightenments" casts, rename
the struct to "hv_vmcb_enlightenments" to highlight the fact that the
struct is specifically for SVM's VMCB.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a union to provide hv_enlightenments side-by-side with the sw_reserved
bytes that Hyper-V's enlightenments overlay. Casting sw_reserved
everywhere is messy, confusing, and unnecessarily unsafe.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move Hyper-V's VMCB enlightenment definitions to the TLFS header; the
definitions come directly from the TLFS[*], not from KVM.
No functional change intended.
[*] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_svm_enlightened_vmcb_fields
[vitaly: rename VMCB_HV_ -> HV_VMCB_ to match the rest of
hyperv-tlfs.h, keep svm/hyperv.h]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.
This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In subsequent patches we'll add a sew of ftrace_regs_{get,set}_*()
helpers. In preparation, this patch renames
ftrace_instruction_pointer_set() to
ftrace_regs_set_instruction_pointer().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly
necessary in the core ftrace code.
This patch changes the prototype of arch_ftrace_set_direct_caller() to
take ftrace_regs rather than pt_regs, and moves the extraction of the
pt_regs into arch_ftrace_set_direct_caller().
On x86, arch_ftrace_set_direct_caller() can be used even when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines
struct ftrace_regs. Due to this, it's necessary to define
arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete
type. I've also moved the body of arch_ftrace_set_direct_caller() after
the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y defineidion of struct
ftrace_regs.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The EFI runtime map code is only wired up on x86, which is the only
architecture that has a need for it in its implementation of kexec.
So let's move this code under arch/x86 and drop all references to it
from generic code. To ensure that the efi_runtime_map_init() is invoked
at the appropriate time use a 'sync' subsys_initcall() that will be
called right after the EFI initcall made from generic code where the
original invocation of efi_runtime_map_init() resided.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Dave Young <dyoung@redhat.com>
The EFI memory map is a description of the memory layout as provided by
the firmware, and only x86 manipulates it in various different ways for
its own memory bookkeeping. So let's move the memmap routines that are
only used by x86 into the x86 arch tree.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The EFI fake memmap support is specific to x86, which manipulates the
EFI memory map in various different ways after receiving it from the EFI
stub. On other architectures, we have managed to push back on this, and
the EFI memory map is kept pristine.
So let's move the fake memmap code into the x86 arch tree, where it
arguably belongs.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Now that we have support for calling protocols that need additional
marshalling for mixed mode, wire up the initrd command line loader.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Rework the EFI stub macro wrappers around protocol method calls and
other indirect calls in order to allow return types other than
efi_status_t. This means the widening should be conditional on whether
or not the return type is efi_status_t, and should be omitted otherwise.
Also, switch to _Generic() to implement the type based compile time
conditionals, which is more concise, and distinguishes between
efi_status_t and u64 properly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The RNG always mixes in the Linux version extremely early in boot. It
also always includes a cycle counter, not only during early boot, but
each and every time it is invoked prior to being fully initialized.
Together, this means that the use of additional xors inside of the
various stackprotector.h files is superfluous and over-complicated.
Instead, we can get exactly the same thing, but better, by just calling
`get_random_canary()`.
Acked-by: Guo Ren <guoren@kernel.org> # for csky
Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
To support TDX attestation, the TDX guest driver exposes an IOCTL
interface to allow userspace to get the TDREPORT0 (a.k.a. TDREPORT
subtype 0) from the TDX module via TDG.MR.TDREPORT TDCALL.
In order to get the TDREPORT0 in the TDX guest driver, instead of using
a low level function like __tdx_module_call(), add a
tdx_mcall_get_report0() wrapper function to handle it.
This is a preparatory patch for adding attestation support.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/all/20221116223820.819090-2-sathyanarayanan.kuppuswamy%40linux.intel.com
Now that the PCI/MSI core code does early checking for multi-MSI support
X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore.
Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de
What a zoo:
PCI_MSI
select GENERIC_MSI_IRQ
PCI_MSI_IRQ_DOMAIN
def_bool y
depends on PCI_MSI
select GENERIC_MSI_IRQ_DOMAIN
Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects
GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are
just an indirection to PCI_MSI.
Match the reality and just admit that PCI_MSI requires
GENERIC_MSI_IRQ_DOMAIN.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20221111122014.467556921@linutronix.de
clocksource/hyperv_timer.h is included into the VDSO build. It includes
asm/mshyperv.h which in turn includes the world and some more. This worked
so far by chance, but any subtle change in the include chain results in a
build breakage because VDSO builds are building user space libraries.
Include asm/hyperv-tlfs.h instead which contains everything what the VDSO
build needs except the hv_get_raw_timer() define. Move this define into a
separate header file, which contains the prerequisites (msr.h) and is
included by clocksource/hyperv_timer.h.
Fixup drivers/hv/vmbus_drv.c which relies on the indirect include of
asm/mshyperv.h.
With that the VDSO build only pulls in the minimum requirements.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/87fsemtut0.ffs@tglx
Use the preferred BIT() and BIT_ULL() to construct the PFERR masks
rather than open-coding the bit shifting.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-6-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When compiling linux 6.1.0-rc3 configured with CONFIG_64BIT=y and
CONFIG_PARAVIRT_SPINLOCKS=y on x86_64 using LLVM 11.0, an error:
"<inline asm> error: changed section flags for .spinlock.text,
expected:: 0x6" occurred.
The reason is the .spinlock.text in kernel/locking/qspinlock.o
is used many times, but its flags are omitted in subsequent use.
LLVM 11.0 assembler didn't permit to
leave out flags in subsequent uses of the same sections.
So this patch adds the corresponding flags to avoid above error.
Fixes: 501f7f69bca1 ("locking: Add __lockfunc to slow path functions")
Signed-off-by: Guo Jin <guoj17@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20221108060126.2505-1-guoj17@chinatelecom.cn
Include percpu.h to pick up the definition of DECLARE_PER_CPU() and
friends instead of relying on the parent to provide the #include. E.g.
swapping the order of includes in arch/x86/kvm/vmx/nested.c (simulating
KVM code movement being done for other purposes) results in build errors:
In file included from arch/x86/kvm/vmx/nested.c:3:
arch/x86/include/asm/debugreg.h:9:32: error: unknown type name âcpu_dr7â=99
9 | DECLARE_PER_CPU(unsigned long, cpu_dr7);
| ^~~~~~~
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110201707.1976032-1-seanjc@google.com
DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way how they do S3.
Unify and correct naming while at it.
Fixes: e4d0e84e4907 ("x86/cpu/AMD: Make LFENCE a serializing instruction")
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not all architectures like ARM64 need to override the function. Move
its declaration to kvm_dirty_ring.h to avoid the following compiling
warning on ARM64 when the feature is enabled.
arch/arm64/kvm/../../../virt/kvm/dirty_ring.c:14:12: \
warning: no previous prototype for 'kvm_cpu_dirty_log_size' \
[-Wmissing-prototypes] \
int __weak kvm_cpu_dirty_log_size(void)
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-3-gshan@redhat.com
Instead of explicitly calling cache_ap_init() in
identify_secondary_cpu() use a CPU hotplug callback instead. By
registering the callback only after having started the non-boot CPUs
and initializing cache_aps_delayed_init with "true", calling
set_cache_aps_delayed_init() at boot time can be dropped.
It should be noted that this change results in cache_ap_init() being
called a little bit later when hotplugging CPUs. By using a new
hotplug slot right at the start of the low level bringup this is not
problematic, as no operations requiring a specific caching mode are
performed that early in CPU initialization.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-15-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Today, PAT is usable only with MTRR being active, with some nasty tweaks
to make PAT usable when running as a Xen PV guest which doesn't support
MTRR.
The reason for this coupling is that both PAT MSR changes and MTRR
changes require a similar sequence and so full PAT support was added
using the already available MTRR handling.
Xen PV PAT handling can work without MTRR, as it just needs to consume
the PAT MSR setting done by the hypervisor without the ability and need
to change it. This in turn has resulted in a convoluted initialization
sequence and wrong decisions regarding cache mode availability due to
misguiding PAT availability flags.
Fix all of that by allowing to use PAT without MTRR and by reworking
the current PAT initialization sequence to match better with the newly
introduced generic cache initialization.
This removes the need of the recently added pat_force_disabled flag, so
remove the remnants of the patch adding it.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-14-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Instead of having a stop_machine() handler for either a specific
MTRR register or all state at once, add a handler just for calling
cache_cpu_init() if appropriate.
Add functions for calling stop_machine() with this handler as well.
Add a generic replacement for mtrr_bp_restore() and a wrapper for
mtrr_bp_init().
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-13-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
In order to prepare decoupling MTRR and PAT replace the MTRR-specific
mtrr_aps_delayed_init flag with a more generic cache_aps_delayed_init
one.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-12-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Add a main cache_cpu_init() init routine which initializes MTRR and/or
PAT support depending on what has been detected on the system.
Leave the MTRR-specific initialization in a MTRR-specific init function
where the smp_changes_mask setting happens now with caches disabled.
This global mask update was done with caches enabled before probably
because atomic operations while running uncached might have been quite
expensive.
But since only systems with a broken BIOS should ever require to set any
bit in smp_changes_mask, hurting those devices with a penalty of a few
microseconds during boot shouldn't be a real issue.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-8-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Split the MTRR-specific actions from cache_disable() and cache_enable()
into new functions mtrr_disable() and mtrr_enable().
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-6-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Rename the currently MTRR-specific functions prepare_set() and
post_set() in preparation to move them. Make them non-static and put
their prototypes into cacheinfo.h, where they will end after moving them
to their final position anyway.
Expand the comment before the functions with an introductory line and
rename two related static variables, too.
[ bp: Massage commit message. ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-5-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
In MTRR code use_intel() is only used in one source file, and the
relevant use_intel_if member of struct mtrr_ops is set only in
generic_mtrr_ops.
Replace use_intel() with a single flag in cacheinfo.c which can be
set when assigning generic_mtrr_ops to mtrr_if. This allows to drop
use_intel_if from mtrr_ops, while preparing to decouple PAT from MTRR.
As another preparation for the PAT/MTRR decoupling use a bit for MTRR
control and one for PAT control. For now set both bits together, this
can be changed later.
As the new flag will be set only if mtrr_enabled is set, the test for
mtrr_enabled can be dropped at some places.
[ bp: Massage commit message. ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221102074713.21493-4-jgross@suse.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Defer reprogramming counters and handling overflow via KVM_REQ_PMU
when incrementing counters. KVM skips emulated WRMSR in the VM-Exit
fastpath, the fastpath runs with IRQs disabled, skipping instructions
can increment and reprogram counters, reprogramming counters can
sleep, and sleeping is disallowed while IRQs are disabled.
[*] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
[*] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2981888, name: CPU 15/KVM
[*] preempt_count: 1, expected: 0
[*] RCU nest depth: 0, expected: 0
[*] INFO: lockdep is turned off.
[*] irq event stamp: 0
[*] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[*] hardirqs last disabled at (0): [<ffffffff8121222a>] copy_process+0x146a/0x62d0
[*] softirqs last enabled at (0): [<ffffffff81212269>] copy_process+0x14a9/0x62d0
[*] softirqs last disabled at (0): [<0000000000000000>] 0x0
[*] Preemption disabled at:
[*] [<ffffffffc2063fc1>] vcpu_enter_guest+0x1001/0x3dc0 [kvm]
[*] CPU: 17 PID: 2981888 Comm: CPU 15/KVM Kdump: 5.19.0-rc1-g239111db364c-dirty #2
[*] Call Trace:
[*] <TASK>
[*] dump_stack_lvl+0x6c/0x9b
[*] __might_resched.cold+0x22e/0x297
[*] __mutex_lock+0xc0/0x23b0
[*] perf_event_ctx_lock_nested+0x18f/0x340
[*] perf_event_pause+0x1a/0x110
[*] reprogram_counter+0x2af/0x1490 [kvm]
[*] kvm_pmu_trigger_event+0x429/0x950 [kvm]
[*] kvm_skip_emulated_instruction+0x48/0x90 [kvm]
[*] handle_fastpath_set_msr_irqoff+0x349/0x3b0 [kvm]
[*] vmx_vcpu_run+0x268e/0x3b80 [kvm_intel]
[*] vcpu_enter_guest+0x1d22/0x3dc0 [kvm]
Add a field to kvm_pmc to track the previous counter value in order
to defer overflow detection to kvm_pmu_handle_event() (the counter must
be paused before handling overflow, and that may increment the counter).
Opportunistically shrink sizeof(struct kvm_pmc) a bit.
Suggested-by: Wanpeng Li <wanpengli@tencent.com>
Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-6-likexu@tencent.com
[sean: avoid re-triggering KVM_REQ_PMU on overflow, tweak changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220923001355.3741194-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Batch reprogramming PMU counters by setting KVM_REQ_PMU and thus
deferring reprogramming kvm_pmu_handle_event() to avoid reprogramming
a counter multiple times during a single VM-Exit.
Deferring programming will also allow KVM to fix a bug where immediately
reprogramming a counter can result in sleeping (taking a mutex) while
interrupts are disabled in the VM-Exit fastpath.
Introduce kvm_pmu_request_counter_reprogam() to make it obvious that
KVM is _requesting_ a reprogram and not actually doing the reprogram.
Opportunistically refine related comments to avoid misunderstandings.
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-5-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220923001355.3741194-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>