iv/linux

Author	SHA1	Message	Date
Sean Christopherson	2bb8cafea8	KVM: vVMX: signal failure for nested VMEntry if emulation_required Fail a nested VMEntry with EXIT_REASON_INVALID_STATE if L2 guest state is invalid, i.e. vmcs12 contained invalid guest state, and unrestricted guest is disabled in L0 (and by extension disabled in L1). WARN_ON_ONCE in handle_invalid_guest_state() if we're attempting to emulate L2, i.e. nested_run_pending is true, to aid debug in the (hopefully unlikely) scenario that we somehow skip the nested VMEntry consistency check, e.g. due to a L0 bug. Note: KVM relies on hardware to detect the scenario where unrestricted guest is enabled in L0 but disabled in L1 and vmcs12 contains invalid guest state, i.e. checking emulation_required in prepare_vmcs02 is required only to handle the case where unrestricted guest is disabled in L0 since L0 never actually attempts VMLAUNCH/VMRESUME with vmcs02. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-16 22:01:38 +01:00
Sean Christopherson	e1de91ccab	KVM: VMX: WARN on a MOV CR3 exit w/ unrestricted guest CR3 load/store exiting are always off when unrestricted guest is enabled. WARN on the associated CR3 VMEXIT to detect code that would re-introduce CR3 load/store exiting for unrestricted guest. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:37 +01:00
Sean Christopherson	b4d185175b	KVM: VMX: give unrestricted guest full control of CR3 Now that CR3 is not forced to a host-controlled value when paging is disabled in an unrestricted guest, CR3 load/store exiting can be left disabled (for an unrestricted guest). And because CR0.WP and CR4.PAE/PSE are also not forced to host-controlled values, all of ept_update_paging_mode_cr0() is no longer needed, i.e. skip ept_update_paging_mode_cr0() for an unrestricted guest. Because MOV CR3 no longer exits when paging is disabled for an unrestricted guest, vmx_decache_cr3() must always read GUEST_CR3 from the VMCS for an unrestricted guest. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:36 +01:00
Sean Christopherson	5dc1f044a3	KVM: VMX: don't force CR4.PAE/PSE for unrestricted guest CR4.PAE - Unrestricted guest can only be enabled when EPT is enabled, and vmx_set_cr4() clears hardware CR4.PAE based on the guest's CR4.PAE, i.e. CR4.PAE always follows the guest's value when unrestricted guest is enabled. CR4.PSE - Unrestricted guest no longer uses the identity mapped IA32 page tables since CR0.PG can be cleared in hardware, thus there is no need to set CR4.PSE when paging is disabled in the guest (and EPT is enabled). Define KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST (to X86_CR4_VMXE) and use it in lieu of KVM_*MODE_VM_CR4_ALWAYS_ON when unrestricted guest is enabled, which removes the forcing of CR4.PAE. Skip the manipulation of CR4.PAE/PSE for EPT when unrestricted guest is enabled, as CR4.PAE isn't forced and so doesn't need to be manually cleared, and CR4.PSE does not need to be set when paging is disabled since the identity mapped IA32 page tables are not used. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:35 +01:00
Sean Christopherson	1706bd0c02	KVM: VMX: remove CR0.WP from ..._ALWAYS_ON_UNRESTRICTED_GUEST Unrestricted guest can only be enabled when EPT is enabled, and when EPT is enabled, ept_update_paging_mode_cr0() will clear hardware CR0.WP based on the guest's CR0.WP, i.e. CR0.WP always follows the guest's value when unrestricted guest is enabled. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:35 +01:00
Sean Christopherson	e90008df16	KVM: VMX: don't configure EPT identity map for unrestricted guest An unrestricted guest can run with hardware CR0.PG==0, i.e. IA32 paging disabled, in which case there is no need to load the guest's CR3 with identity mapped IA32 page tables since hardware will effectively ignore CR3. If unrestricted guest is enabled, don't configure the identity mapped IA32 page table and always load the guest's desired CR3. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:34 +01:00
Sean Christopherson	f7eaeb0ad8	KVM: VMX: don't configure RM TSS for unrestricted guest An unrestricted guest can run with CR0.PG==0 and/or CR0.PE==0, e.g. it can run in Real Mode without requiring host emulation. The RM TSS is only used for emulating RM, i.e. it will never be used when unrestricted guest is enabled and so doesn't need to be configured. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:33 +01:00
Vitaly Kuznetsov	915e6f78bd	x86/kvm/hyper-v: inject #GP only when invalid SINTx vector is unmasked Hyper-V 2016 on KVM with SynIC enabled doesn't boot with the following trace: kvm_entry: vcpu 0 kvm_exit: reason MSR_WRITE rip 0xfffff8000131c1e5 info 0 0 kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000090 data 0x10000 host 0 kvm_msr: msr_write 40000090 = 0x10000 (#GP) kvm_inj_exception: #GP (0x0) KVM acts according to the following statement from TLFS: " 11.8.4 SINTx Registers ... Valid values for vector are 16-255 inclusive. Specifying an invalid vector number results in #GP. " However, I checked and genuine Hyper-V doesn't #GP when we write 0x10000 to SINTx. I checked with Microsoft and they confirmed that if either the Masked bit (bit 16) or the Polling bit (bit 18) is set to 1, then they ignore the value of Vector. Make KVM act accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:33 +01:00
Vitaly Kuznetsov	98f65ad458	x86/kvm/hyper-v: remove stale entries from vec_bitmap/auto_eoi_bitmap on vector change When a new vector is written to SINTx we update vec_bitmap/auto_eoi_bitmap but we forget to remove the old vector from these masks (in case it is not present in some other SINTx). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:32 +01:00
Vitaly Kuznetsov	a2e164e7f4	x86/kvm/hyper-v: add reenlightenment MSRs support Nested Hyper-V/Windows guest running on top of KVM will use TSC page clocksource in two cases: - L0 exposes invariant TSC (CPUID.80000007H:EDX[8]). - L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]). Exposing invariant TSC effectively blocks migration to hosts with different TSC frequencies, providing reenlightenment support will be needed when we start migrating nested workloads. Implement rudimentary support for reenlightenment MSRs. For now, these are just read/write MSRs with no effect. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:31 +01:00
KarimAllah Ahmed	ddd6f0e94d	KVM: x86: Update the exit_qualification access bits while walking an address ... to avoid having a stale value when handling an EPT misconfig for MMIO regions. MMIO regions that are not passed-through to the guest are handled through EPT misconfigs. The first time a certain MMIO page is touched it causes an EPT violation, then KVM marks the EPT entry to cause an EPT misconfig instead. Any subsequent accesses to the entry will generate an EPT misconfig. Things get slightly complicated with nested guest handling for MMIO regions that are not passed through from L0 (i.e. emulated by L0 user-space). An EPT violation for one of these MMIO regions from L2 exits to the L0 hypervisor. L0 would then look at the EPT12 mapping for the L1 hypervisor and realize it is not present (or not sufficient to serve the request). Then L0 injects an EPT violation to L1. L1 would then update its EPT mappings. The EXIT_QUALIFICATION value for L1 would come from the exit_qualification variable in "struct vcpu". The problem is that this variable is only updated on EPT violation and not on EPT misconfig. So if an EPT violation because of a read happened first, and an EPT misconfig because of a write happened afterwards, the L0 hypervisor will still hold the exit_qualification value from the previous read instead of the write and end up injecting an EPT violation to the L1 hypervisor with an out of date EXIT_QUALIFICATION. The EPT violation that is injected from L0 to L1 needs to have the correct EXIT_QUALIFICATION, especially for the access bits, because the individual access bits for MMIO EPTs are updated only on actual access of this specific type. So for the example above, the L1 hypervisor will keep updating only the read bit in the EPT then resume the L2 guest. The L2 guest would end up causing another exit where the L0 again will inject another EPT violation to L1 hypervisor with again an out of date exit_qualification which indicates a read and not a write. Then this ping-pong just keeps happening without making any forward progress. The behavior of mapping MMIO regions changed in: commit a340b3e229b24 ("kvm: Map PFN-type memory regions as writable (if possible)") ... where an EPT violation for a read would also fixup the write bits to avoid another EPT violation which by accident would fix the bug mentioned above. This commit fixes this situation and ensures that the access bits for the exit_qualification are up to date. That ensures that even L1 hypervisor running with a KVM version before the commit mentioned above would still work. ( The description above assumes EPT to be available and used by L1 hypervisor + the L1 hypervisor is passing through the MMIO region to the L2 guest while this MMIO region is emulated by the L0 user-space ). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:30 +01:00
Matthias Kaehlcke	1df372f473	KVM: x86: Make enum conversion explicit in kvm_pdptr_read() The type 'enum kvm_reg_ex' is an extension of 'enum kvm_reg', however the extension is only semantical and the compiler doesn't know about the relationship between the two types. In kvm_pdptr_read() a value of the extended type is passed to kvm_x86_ops->cache_reg(), which expects a value of the base type. Clang raises the following warning about the type mismatch: arch/x86/kvm/kvm_cache_regs.h:44:32: warning: implicit conversion from enumeration type 'enum kvm_reg_ex' to different enumeration type 'enum kvm_reg' [-Wenum-conversion] kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR); Cast VCPU_EXREG_PDPTR to 'enum kvm_reg' to make the compiler happy. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:30 +01:00
Vitaly Kuznetsov	0bcc3fb95b	KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use Devices which use level-triggered interrupts under Windows 2016 with Hyper-V role enabled don't work: Windows disables EOI broadcast in SPIV unconditionally. Our in-kernel IOAPIC implementation emulates an old IOAPIC version which has no EOI register so EOI never happens. The issue was discovered and discussed a while ago: https://www.spinics.net/lists/kvm/msg148098.html While this is a guest OS bug (it should check that IOAPIC has the required capabilities before disabling EOI broadcast) we can workaround it in KVM: advertising DIRECTED_EOI with in-kernel IOAPIC makes little sense anyway. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:29 +01:00
Janakarajan Natarajan	c51eb52b8f	KVM: x86: Add support for AMD Core Perf Extension in guest Add support for AMD Core Performance counters in the guest. The base event select and counter MSRs are changed. In addition, with the core extension, there are 2 extra counters available for performance measurements for a total of 6. With the new MSRs, the logic to map them to the gp_counters[] is changed. New functions are added to check the validity of the get/set MSRs. If the guest has the X86_FEATURE_PERFCTR_CORE cpuid flag set, the number of counters available to the vcpu is set to 6. If the flag is not set then it is 4. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> [Squashed "Expose AMD Core Perf Extension flag to guests" - Radim.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:28 +01:00
Janakarajan Natarajan	e84b7119e8	x86/msr: Add AMD Core Perf Extension MSRs Add the EventSelect and Counter MSRs for AMD Core Perf Extension. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:17 +01:00
Borislav Petkov	bb8c13d61a	x86/microcode: Fix CPU synchronization routine Emanuel reported an issue with a hang during microcode update because my dumb idea to use one atomic synchronization variable for both rendezvous - before and after update - was simply bollocks: microcode: microcode_reload_late: late_cpus: 4 microcode: __reload_late: cpu 2 entered microcode: __reload_late: cpu 1 entered microcode: __reload_late: cpu 3 entered microcode: __reload_late: cpu 0 entered microcode: __reload_late: cpu 1 left microcode: Timeout while waiting for CPUs rendezvous, remaining: 1 CPU1 above would finish, leave and the others will still spin waiting for it to join. So do two synchronization atomics instead, which makes the code a lot more straightforward. Also, since the update is serialized and it also takes quite some time per microcode engine, increase the exit timeout by the number of CPUs on the system. That's ok because the moment all CPUs are done, that timeout will be cut short. Furthermore, panic when some of the CPUs timeout when returning from a microcode update: we can't allow a system with not all cores updated. Also, as an optimization, do not do the exit sync if microcode wasn't updated. Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20180314183615.17629-2-bp@alien8.de	2018-03-16 20:55:51 +01:00
Borislav Petkov	2613f36ed9	x86/microcode: Attempt late loading only when new microcode is present Return UCODE_NEW from the scanning functions to denote that new microcode was found and only then attempt the expensive synchronization dance. Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20180314183615.17629-1-bp@alien8.de	2018-03-16 20:55:51 +01:00
Peter Zijlstra	edb39592a5	perf: Fix sibling iteration Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") Reported-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net	2018-03-16 20:44:12 +01:00
Rajvi Jingar	fc804f65d4	x86/tsc: Convert ART in nanoseconds to TSC Device drivers use get_device_system_crosststamp() to produce precise system/device cross-timestamps. The PHC clock and ALSA interfaces, for example, make the cross-timestamps available to user applications. On Intel platforms, get_device_system_crosststamp() requires a TSC value derived from ART (Always Running Timer) to compute the monotonic raw and realtime system timestamps. Starting with Intel Goldmont platforms, the PCIe root complex supports the PTM time sync protocol. PTM requires all timestamps to be in units of nanoseconds. The Intel root complex hardware propagates system time derived from ART in units of nanoseconds performing the conversion as follows: ART_NS = ART * 1e9 / <crystal frequency> When user software requests a cross-timestamp, the system timestamps (generally read from device registers) must be converted to TSC by the driver software as follows: TSC = ART_NS * TSC_KHZ / 1e6 This is valid when CPU feature flag X86_FEATURE_TSC_KNOWN_FREQ is set indicating that tsc_khz is derived from CPUID[15H]. Drivers should check whether this flag is set before conversion to TSC is attempted. Suggested-by: Christopher S. Hall <christopher.s.hall@intel.com> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/1520530116-4925-1-git-send-email-rajvi.jingar@intel.com	2018-03-16 15:14:35 +01:00
Tom Lendacky	daaf216c06	KVM: x86: Fix device passthrough when SME is active When using device passthrough with SME active, the MMIO range that is mapped for the device should not be mapped encrypted. Add a check in set_spte() to ensure that a page is not mapped encrypted if that page is a device MMIO page as indicated by kvm_is_mmio_pfn(). Cc: <stable@vger.kernel.org> # 4.14.x- Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-16 14:32:23 +01:00
Alexander Sergeyev	e3b3121fa8	x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist In accordance with Intel's microcode revision guidance from March 6 MCU rev 0xc2 is cleared on both Skylake H/S and Skylake Xeon E3 processors that share CPUID 506E3. Signed-off-by: Alexander Sergeyev <sergeev917@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jia Zhang <qianyue.zj@alibaba-inc.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kyle Huey <me@kylehuey.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180313193856.GA8580@localhost.localdomain	2018-03-16 12:33:11 +01:00
Dave Martin	266da65e91	signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions Some architectures cannot always report accurately what kind of floating-point exception triggered a floating-point exception trap. This can occur with fp exceptions occurring on lanes in a vector instruction on arm64 for example. Rather than have every architecture come up with its own way of describing such a condition, this patch adds a common FPE_FLTUNK si_code value to report that an fp exception caused a trap but we cannot be certain which kind of fp exception it was. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2018-03-15 16:04:25 -05:00
Toshi Kani	565977a3d9	x86/mm: Remove pointless checks in vmalloc_fault vmalloc_fault() sets user's pgd or p4d from the kernel page table. Once it's set, all tables underneath are identical. There is no point in following the same page table with two separate pointers and making sure they see the same thing with BUG(). Remove the pointless checks in vmalloc_fault(). Also rename the kernel pgd/p4d pointers to pgd_k/p4d_k so that their names are consistent in the file. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: Borislav Petkov <bp@alien8.de> Cc: Gratian Crisan <gratian.crisan@ni.com> Link: https://lkml.kernel.org/r/20180314205932.7193-1-toshi.kani@hpe.com	2018-03-15 15:27:47 +01:00
Benjamin Gaignard	13cc36d76b	x86/rtc: Stop using deprecated functions rtc_time_to_tm() and rtc_tm_to_time() are deprecated because they rely on 32-bit variables and that will make rtc break in y2038/2106. Use the proper y2038 safe functions. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Link: https://lkml.kernel.org/r/1520620971-9567-5-git-send-email-john.stultz@linaro.org	2018-03-15 09:47:24 +01:00
Dan Williams	a7e6c7015b	x86, memremap: fix altmap accounting at free Commit 24b6d4164348 "mm: pass the vmem_altmap to vmemmap_free" converted the vmemmap_free() path to pass the altmap argument all the way through the call chain rather than looking it up based on the page. Unfortunately that ends up over freeing altmap allocated pages in some cases since free_pagetable() is used to free both memmap space and pte space, where only the memmap stored in huge pages uses altmap allocations. Given that altmap allocations for memmap space are special cased in vmemmap_populate_hugepages() add a symmetric / special case free_hugepage_table() to handle altmap freeing, and cleanup the unneeded passing of altmap to leaf functions that do not require it. Without this change the sanity check accounting in devm_memremap_pages_release() will throw a warning with the following signature. nd_pmem pfn10.1: devm_memremap_pages_release: failed to free all reserved pages WARNING: CPU: 44 PID: 3539 at kernel/memremap.c:310 devm_memremap_pages_release+0x1c7/0x220 CPU: 44 PID: 3539 Comm: ndctl Tainted: G L 4.16.0-rc1-linux-stable #7 RIP: 0010:devm_memremap_pages_release+0x1c7/0x220 [..] Call Trace: release_nodes+0x225/0x270 device_release_driver_internal+0x15d/0x210 bus_remove_device+0xe2/0x160 device_del+0x130/0x310 ? klist_release+0x56/0x100 ? nd_region_notify+0xc0/0xc0 [libnvdimm] device_unregister+0x16/0x60 This was missed in testing since not all configurations will trigger this warning. Fixes: 24b6d4164348 ("mm: pass the vmem_altmap to vmemmap_free") Reported-by: Jane Chu <jane.chu@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-03-14 14:46:23 -07:00
Thomas Gleixner	745dd37f9d	Merge branch 'x86/urgent' into x86/mm to pick up dependencies	2018-03-14 20:23:25 +01:00
Toshi Kani	18a955219b	x86/mm: Fix vmalloc_fault to use pXd_large Gratian Crisan reported that vmalloc_fault() crashes when CONFIG_HUGETLBFS is not set since the function inadvertently uses pXd_huge(), which always returns 0 in this case. ioremap() does not depend on CONFIG_HUGETLBFS. Fix vmalloc_fault() to call pXd_large() instead. Fixes: f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages properly") Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20180313170347.3829-2-toshi.kani@hpe.com	2018-03-14 20:22:42 +01:00
Andy Whitcroft	a14bff1311	x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels In the following commit: 9e0e3c5130e9 ("x86/speculation, objtool: Annotate indirect calls/jumps for objtool") ... we added annotations for CALL_NOSPEC/JMP_NOSPEC on 64-bit x86 kernels, but we did not annotate the 32-bit path. Annotate it similarly. Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180314112427.22351-1-apw@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-14 13:24:31 +01:00
Andy Lutomirski	b506978245	x86/vm86/32: Fix POPF emulation POPF would trap if VIP was set regardless of whether IF was set. Fix it. Suggested-by: Stas Sergeev <stsp@list.ru> Reported-by: Bart Oldeman <bartoldeman@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 5ed92a8ab71f ("x86/vm86: Use the normal pt_regs area for vm86") Link: http://lkml.kernel.org/r/ce95f40556e7b2178b6bc06ee9557827ff94bd28.1521003603.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-14 09:21:01 +01:00
Peter Zijlstra	8343aae661	perf/core: Remove perf_event::group_entry Now that all the grouping is done with RB trees, we no longer need group_entry and can replace the whole thing with sibling_list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valery Cherepennikov <valery.cherepennikov@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 15:28:49 +01:00
Andy Shevchenko	0242874263	x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms When switching to ACPI HW reduced platforms we still want to initialize timers. Override x86_init.acpi.reduced_hw_init to achieve that. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/20180220180506.65523-3-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:32:57 +01:00
Andy Shevchenko	81b53e5ff2	ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback Some ACPI hardware reduced platforms need to initialize certain devices defined by the ACPI hardware specification even though in principle those devices should not be present in an ACPI hardware reduced platform. To allow that to happen, make it possible to override the generic x86_init callbacks and provide a custom legacy_pic value, add a new ->reduced_hw_early_init() callback to struct x86_init_acpi and make acpi_reduced_hw_init() use it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/20180220180506.65523-2-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:32:57 +01:00
Andy Shevchenko	50beba07a0	ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export This is a preparation patch to allow overriding the hardware reduced initialization on ACPI enabled platforms. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/20180220180506.65523-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:32:57 +01:00
Dmitry Vyukov	ac605bee0b	locking/atomic, asm-generic, x86: Add comments for atomic instrumentation The comments are factored out from the code changes to make them easier to read. Add them separately to explain some non-obvious aspects. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/cc595efc644bb905407012d82d3eb8bac3368e7a.1517246437.git.dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:15:35 +01:00
Dmitry Vyukov	8bf705d130	locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h Add arch_ prefix to all atomic operations and include <asm-generic/atomic-instrumented.h>. This will allow adding KASAN instrumentation to all atomic ops. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/54f0eb64260b84199e538652e079a89b5423ad41.1517246437.git.dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:15:35 +01:00
Kirill A. Shutemov	24c517856a	x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf MKTME_KEY_PROG allows manipulating MKTME keys in the CPU. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:10:54 +01:00
Kirill A. Shutemov	be7825c19b	x86/pconfig: Detect PCONFIG targets Intel PCONFIG targets are enumerated via new CPUID leaf 0x1b. This patch detects all supported targets of PCONFIG and implements helper to check if the target is supported. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:10:54 +01:00
Kirill A. Shutemov	cb06d8e3d0	x86/tme: Detect if TME and MKTME is activated by BIOS IA32_TME_ACTIVATE MSR (0x982) can be used to check if BIOS has enabled TME and MKTME. It includes which encryption policy/algorithm is selected for TME or available for MKTME. For MKTME, the MSR also enumerates how many KeyIDs are available. We would need to exclude KeyID bits from physical address bits. detect_tme() would adjust cpuinfo_x86::x86_phys_bits accordingly. We have to do this even if we are not going to use KeyID bits ourself. VM guests still have to know that these bits are not usable for physical address. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:10:54 +01:00
Ingo Molnar	3c76db70eb	Merge branch 'x86/pti' into x86/mm, to pick up dependencies Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:10:03 +01:00
Kirill A. Shutemov	7958b2246f	x86/cpufeatures: Add Intel PCONFIG cpufeature CPUID.0x7.0x0:EDX[18] indicates whether an Intel CPU supports the PCONFIG instruction. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:09:53 +01:00
Kirill A. Shutemov	1da961d72a	x86/cpufeatures: Add Intel Total Memory Encryption cpufeature CPUID.0x7.0x0:ECX[13] indicates whether CPU supports Intel Total Memory Encryption. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 12:09:53 +01:00
Kirill A. Shutemov	194a9749c7	x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G This patch addresses a shortcoming in the current boot process on machines that support 5-level paging. If a bootloader enables 64-bit mode with 4-level paging, we might need to switch over to 5-level paging. The switch requires disabling paging. It works fine if the kernel itself is loaded below 4G. But if the bootloader put the kernel above 4G (not sure if anybody does this), we would lose control as soon as paging is disabled, because the code becomes unreachable to the CPU. This patch implements a trampoline in lower memory to handle this situation. We only need the memory for a very short time, until the main kernel image sets up its own page tables. We go through the trampoline even if we don't have to: if we're already in 5-level paging mode or if we don't need to switch to it. This way the trampoline gets tested on every boot. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 11:49:25 +01:00
Kirill A. Shutemov	0a1756bd28	x86/boot/compressed/64: Use page table in trampoline memory If a bootloader enables 64-bit mode with 4-level paging, we might need to switch over to 5-level paging. The switching requires the disabling paging. It works fine if kernel itself is loaded below 4G. But if the bootloader put the kernel above 4G (i.e. in kexec() case), we would lose control as soon as paging is disabled, because the code becomes unreachable to the CPU. To handle the situation, we need a trampoline in lower memory that would take care of switching on 5-level paging. Apart from the trampoline code itself we also need a place to store top-level page table in lower memory as we don't have a way to load 64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there as we only use the very first entry of the page table. But we allocate a whole page anyway. This patch switches 32-bit code to use page table in trampoline memory. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 11:49:25 +01:00
Kirill A. Shutemov	f7ff53e470	x86/boot/compressed/64: Use stack from trampoline memory As the first step toward using trampoline memory, let's make the 32-bit code use a stack there. A separate stack is required to return from the trampoline, and we cannot use the stack from 64-bit mode as it may be above 4G. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 11:49:24 +01:00
Kirill A. Shutemov	7beebaccd5	x86/boot/compressed/64: Make sure we have a 32-bit code segment When kernel starts in 64-bit mode we inherit the GDT from the bootloader. It may cause a problem if the GDT doesn't have a 32-bit code segment where we expect it to be. Load our own GDT with known segments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 11:49:24 +01:00
Sai Praneeth	03781e4089	x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3 Use helper function efi_switch_mm() to switch to/from efi_mm when invoking any UEFI runtime services. Likewise, we need to switch back to previous mm (mm context stolen by efi_mm) after the above calls return successfully. We can use efi_switch_mm() helper function only with x86_64 kernel and "efi=old_map" disabled because, x86_32 and efi=old_map do not use efi_pgd, rather they use swapper_pg_dir. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> [ardb: add #include of sched/task.h for task_lock/_unlock] Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-efi@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 11:05:05 +01:00
Sai Praneeth	3ede3417f8	x86/efi: Replace efi_pgd with efi_mm.pgd Since the previous patch added support for efi_mm, let's handle efi_pgd through efi_mm and remove global variable efi_pgd. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-efi@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 11:05:05 +01:00
Kirill A. Shutemov	a5b162b2ec	x86/mm: Do not use paravirtualized calls in native_set_p4d() In 4-level paging mode, native_set_p4d() updates the entry in the top-level page table. With PTI, update to the top-level kernel page table requires update to the userspace copy of the table as well, using pti_set_user_pgd(). native_set_p4d() uses p4d_val() and pgd_val() to convert types between p4d_t and pgd_t. p4d_val() and pgd_val() are paravirtualized and we must not use them in native helpers, as they crash the boot in paravirtualized environments. Replace p4d_val() and pgd_val() with native_p4d_val() and native_pgd_val() in native_set_p4d(). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 91f606a8fa68 ("x86/mm: Replace compile-time checks for 5-level paging with runtime-time checks") Link: http://lkml.kernel.org/r/20180305081641.4290-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 10:30:48 +01:00
Ard Biesheuvel	36b649760e	efi: Use string literals for efi_char16_t variable initializers Now that we unambiguously build the entire kernel with -fshort-wchar, it is no longer necessary to open code efi_char16_t[] initializers as arrays of characters, and we can move to the L"xxx" notation instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180312084500.10764-6-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 10:05:02 +01:00
Sai Praneeth	7e904a91bf	efi: Use efi_mm in x86 as well as ARM Presently, only ARM uses mm_struct to manage EFI page tables and EFI runtime region mappings. As this is the preferred approach, let's make this data structure common across architectures. Specially, for x86, using this data structure improves code maintainability and readability. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> [ardb: don't #include the world to get a declaration of struct mm_struct] Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180312084500.10764-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 10:05:01 +01:00
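
The x86/cpufeatures and x86/tme entries above name two CPUID feature bits and describe removing KeyID bits from the physical address width. The following is a minimal userspace sketch of that arithmetic, not the kernel's implementation. The bit positions come from the commit messages; the helper names (`cpuid7_has_tme`, `cpuid7_has_pconfig`, `usable_phys_bits`) are hypothetical, and the register values are assumed to have been read from CPUID leaf 0x7, subleaf 0x0 by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted in the commit messages above. */
#define TME_ECX_BIT     13  /* CPUID.0x7.0x0:ECX[13]: Total Memory Encryption */
#define PCONFIG_EDX_BIT 18  /* CPUID.0x7.0x0:EDX[18]: PCONFIG instruction */

/* Hypothetical helper: test the TME bit in a CPUID.7.0 ECX value. */
static bool cpuid7_has_tme(uint32_t ecx)
{
    return (ecx >> TME_ECX_BIT) & 1u;
}

/* Hypothetical helper: test the PCONFIG bit in a CPUID.7.0 EDX value. */
static bool cpuid7_has_pconfig(uint32_t edx)
{
    return (edx >> PCONFIG_EDX_BIT) & 1u;
}

/*
 * Hypothetical helper: KeyID bits are carved out of the top of the
 * physical address, so the usable width shrinks by that many bits.
 * keyid_bits is assumed to have been decoded from IA32_TME_ACTIVATE
 * by the caller; the MSR layout is not modelled here.
 */
static unsigned int usable_phys_bits(unsigned int phys_bits,
                                     unsigned int keyid_bits)
{
    return keyid_bits < phys_bits ? phys_bits - keyid_bits : 0;
}
```

On an x86 machine built with GCC or Clang, the ECX and EDX inputs could be obtained with `__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. Keeping the helpers as pure functions of the register values means the bit logic can be checked on any host.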

29528 Commits