linux

iv/linux

Author	SHA1	Message	Date
Andy Lutomirski	657c1eea00	x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on When I rewrote entry_INT80_32, I thought that int80 was an interrupt gate. It's a trap gate. facepalm Thanks to Brian Gerst for pointing out that it's better to change the entry code than to change the gate type. Suggested-by: Brian Gerst <brgerst@gmail.com> Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 150ac78d63af ("x86/entry/32: Switch INT80 to the new C syscall path") Link: http://lkml.kernel.org/r/dc09d9b574a5c1dcca996847875c73f8341ce0ad.1445035014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-18 12:11:16 +02:00
Jiang Liu	4d6b4e69a2	x86/PCI/ACPI: Use common interface to support PCI host bridge Use common interface to simplify ACPI PCI host bridge implementation. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-16 22:18:52 +02:00
Jiang Liu	a3669868d9	ACPI/PCI: Reset acpi_root_dev->domain to 0 when pci_ignore_seg is set Reset acpi_root_dev->domain to 0 when pci_ignore_seg is set to keep consistence between ACPI PCI root device and PCI host bridge device. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-16 22:18:52 +02:00
Rafael J. Wysocki	7855e10294	Merge back earlier cpufreq material for v4.4.	2015-10-16 22:12:02 +02:00
Vitaly Kuznetsov	c0ff971ef9	x86/ioapic: Disable interrupts when re-routing legacy IRQs A sporadic hang with consequent crash is observed when booting Hyper-V Gen1 guests: Call Trace: <IRQ> [<ffffffff810ab68d>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8107b616>] queue_work_on+0x46/0x90 [<ffffffff81365696>] ? add_interrupt_randomness+0x176/0x1d0 ... <EOI> [<ffffffff81471ddb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60 [<ffffffff810c295e>] __irq_put_desc_unlock+0x1e/0x40 [<ffffffff810c5c35>] irq_modify_status+0xb5/0xd0 [<ffffffff8104adbb>] mp_register_handler+0x4b/0x70 [<ffffffff8104c55a>] mp_irqdomain_alloc+0x1ea/0x2a0 [<ffffffff810c7f10>] irq_domain_alloc_irqs_recursive+0x40/0xa0 [<ffffffff810c860c>] __irq_domain_alloc_irqs+0x13c/0x2b0 [<ffffffff8104b070>] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0 [<ffffffff8104bfa5>] mp_map_pin_to_irq+0x165/0x2d0 [<ffffffff8104c157>] pin_2_irq+0x47/0x80 [<ffffffff81744253>] setup_IO_APIC+0xfe/0x802 ... [<ffffffff814631c0>] ? rest_init+0x140/0x140 The issue is easily reproducible with a simple instrumentation: if mdelay(10) is put between mp_setup_entry() and mp_register_handler() calls in mp_irqdomain_alloc() Hyper-V guest always fails to boot when re-routing IRQ0. The issue seems to be caused by the fact that we don't disable interrupts while doing IOPIC programming for legacy IRQs and IRQ0 actually happens. Protect the setup sequence against concurrent interrupts. [ tglx: Make the protection unconditional and not only for legacy interrupts ] Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1444930943-19336-1-git-send-email-vkuznets@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-10-16 16:31:24 +02:00
Paolo Bonzini	f5f3497cad	x86/setup: Extend low identity map to cover whole kernel range On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb 0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 000000ff IDT= fffbb000 000007ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 (gdb) x/4wx 0x3399000 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) 4 0x32b7070: 0x00000000 [ It appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Andy Lutomirski provides some additional insight: "AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT." Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. Which explains why no one has hit this issue until now. ] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [ Updated changelog. ]	2015-10-16 10:52:29 +01:00
Marcelo Tosatti	7cae2bedcb	KVM: x86: move steal time initialization to vcpu entry time As reported at https://bugs.launchpad.net/qemu/+bug/1494350, it is possible to have vcpu->arch.st.last_steal initialized from a thread other than vcpu thread, say the iothread, via KVM_SET_MSRS. Which can cause an overflow later (when subtracting from vcpu threads sched_info.run_delay). To avoid that, move steal time accumulation to vcpu entry time, before copying steal time data to guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:16 +02:00
Takuya Yoshikawa	5225fdf8c8	KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level() Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be avoided since getting a slot by binary search may not be negligible, especially for virtual machines with many memory slots. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:02 +02:00
Takuya Yoshikawa	d8aacf5df8	KVM: x86: MMU: Remove mapping_level_dirty_bitmap() Now that it has only one caller, and its name is not so helpful for readers, remove it. The new memslot_valid_for_gpte() function makes it possible to share the common code between gfn_to_memslot_dirty_bitmap() and mapping_level(). Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:01 +02:00
Takuya Yoshikawa	fd13690218	KVM: x86: MMU: Move mapping_level_dirty_bitmap() call in mapping_level() This is necessary to eliminate an extra memory slot search later. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:00 +02:00
Takuya Yoshikawa	5ed5c5c8fd	KVM: x86: MMU: Simplify force_pt_level calculation code in FNAME(page_fault)() As a bonus, an extra memory slot search can be eliminated when is_self_change_mapping is true. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:00 +02:00
Takuya Yoshikawa	cd1872f028	KVM: x86: MMU: Make force_pt_level bool This will be passed to a function later. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:33:59 +02:00
Joerg Roedel	6092d3d3e6	kvm: svm: Only propagate next_rip when guest supports it Currently we always write the next_rip of the shadow vmcb to the guests vmcb when we emulate a vmexit. This could confuse the guest when its cpuid indicated no support for the next_rip feature. Fix this by only propagating next_rip if the guest actually supports it. Cc: Bandan Das <bsd@redhat.com> Cc: Dirk Mueller <dmueller@suse.com> Tested-By: Dirk Mueller <dmueller@suse.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:32:17 +02:00
Paolo Bonzini	951f9fd74f	KVM: x86: manually unroll bad_mt_xwr loop The loop is computing one of two constants, it can be simpler to write everything inline. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:32:16 +02:00
Wanpeng Li	089d7b6ec5	KVM: nVMX: expose VPID capability to L1 Expose VPID capability to L1. For nested guests, we don't do anything specific for single context invalidation. Hence, only advertise support for global context invalidation. The major benefit of nested VPID comes from having separate vpids when switching between L1 and L2, and also when L2's vCPUs not sched in/out on L1. Reviewed-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:30:55 +02:00
Wanpeng Li	5c614b3583	KVM: nVMX: nested VPID emulation VPID is used to tag address space and avoid a TLB flush. Currently L0 use the same VPID to run L1 and all its guests. KVM flushes VPID when switching between L1 and L2. This patch advertises VPID to the L1 hypervisor, then address space of L1 and L2 can be separately treated and avoid TLB flush when swithing between L1 and L2. For each nested vmentry, if vpid12 is changed, reuse shadow vpid w/ an invvpid. Performance: run lmbench on L2 w/ 3.5 kernel. Context switching - times in microseconds - smaller is better ------------------------------------------------------------------------- Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw --------- ------------- ------ ------ ------ ------ ------ ------- ------- kernel Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.60000 2.88000 nested VPID kernel Linux 3.5.0-1 1.2600 1.4300 1.5600 12.7 12.9 3.49000 7.46000 vanilla Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:30:35 +02:00
Wanpeng Li	99b83ac893	KVM: nVMX: emulate the INVVPID instruction Add the INVVPID instruction emulation. Reviewed-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:30:24 +02:00
Srinivas Pandruvada	6a35fc2d6c	cpufreq: intel_pstate: get P1 from TAR when available After Ivybridge, the max non turbo ratio obtained from platform info msr is not always guaranteed P1 on client platforms. The max non turbo activation ratio (TAR), determines the max for the current level of TDP. The ratio in platform info is physical max. The TAR MSR can be locked, so updating this value is not possible on all platforms. This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if available, but also do some sanity checking to make sure that this value is correct. The sanity check involves reading the TDP ratio for the current tdp control value when platform has configurable TDP present and matching TAC with this. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-15 01:53:18 +02:00
Lukasz Anaczkowski	d81056b527	x86, ACPI: Handle apic/x2apic entries in MADT in correct order ACPI specifies the following rules when listing APIC IDs: (1) Boot processor is listed first (2) For multi-threaded processors, BIOS should list the first logical processor of each of the individual multi-threaded processors in MADT before listing any of the second logical processors. (3) APIC IDs < 0xFF should be listed in APIC subtable, APIC IDs >= 0xFF should be listed in X2APIC subtable Because of above, when there's more than 0xFF logical CPUs, BIOS interleaves APIC/X2APIC subtables. Assuming, there's 72 cores, 72 hyper-threads each, 288 CPUs total, listing is like this: APIC (0,4,8, .., 252) X2APIC (258,260,264, .. 284) APIC (1,5,9,...,253) X2APIC (259,261,265,...,285) APIC (2,6,10,...,254) X2APIC (260,262,266,..,286) APIC (3,7,11,...,251) X2APIC (255,261,262,266,..,287) Now, before this patch, due to how ACPI MADT subtables were parsed (BSP then X2APIC then APIC), kernel enumerated CPUs in reverted order (i.e. high APIC IDs were getting low logical IDs, and low APIC IDs were getting high logical IDs). This is wrong for the following reasons: () it's hard to predict how cores and threads are enumerated () when it's hard to predict, s/w threads cannot be properly affinitized causing significant performance impact due to e.g. inproper cache sharing () enumeration is inconsistent with how threads are enumerated on other Intel Xeon processors So, order in which MADT APIC/X2APIC handlers are passed is reverse and both handlers are passed to be called during same MADT table to walk to achieve correct CPU enumeration. In scenario when someone boots kernel with options 'maxcpus=72 nox2apic', in result less cores may be booted, since some of the CPUs the kernel will try to use will have APIC ID >= 0xFF. In such case, one should not pass 'nox2apic'. Disclimer: code parsing MADT APIC/X2APIC has not been touched since 2009, when X2APIC support was initially added. I do not know why MADT parsing code was added in the reversed order in the first place. I guess it didn't matter at that time since nobody cared about cores with APIC IDs >= 0xFF, right? This patch is based on work of "Yinghai Lu <yinghai@kernel.org>" previously published at https://lkml.org/lkml/2013/1/21/563 Here's the explanation why parsing interface needs to be changed and why simpler approach will not work https://lkml.org/lkml/2015/9/7/285 Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> (commit message) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-15 01:31:06 +02:00
Linus Torvalds	cfed1e3de4	Bug fixes for system management mode emulation. The first two patches fix SMM emulation on Nehalem processors. The others fix some cases that became apparent as work progressed on the firmware side. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJWHms0AAoJEL/70l94x66DfQkIAIpya6c/1UAthxSTqJ1wFOf8 ZKp3GCMjUjtm9k88kk6JGOlPiAvWz7CG9BVbptpkJGpgoDzquvr6ZKGG2BV88F17 MnkZCid4IBW6VeKYy7R2otkKw7+Pw8DTHRQks+VI6BN/KkeaZLzh5J8+FAl4ZaWk YX/VulRce6SfZPYuUTRkkK8aebsopZNVG8mwWIGuBYwyH54R3KH1k/euX2joUPwm oopzmQLgEWW7e3RsO67T36rIRgEorJLZaiiexvj1djI+e0kEEudvhJ9nC6eB52qa oZ9nR0nkkmBmrBF8gldKDZBC+Y/ci1cJLAaoi7tdsp0wVCebPxubwbPOXxKwD8g= =ij8Q -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "Bug fixes for system management mode emulation. The first two patches fix SMM emulation on Nehalem processors. The others fix some cases that became apparent as work progressed on the firmware side" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix RSM into 64-bit protected mode KVM: x86: fix previous commit for 32-bit KVM: x86: fix SMI to halted VCPU KVM: x86: clean up kvm_arch_vcpu_runnable KVM: x86: map/unmap private slots in __x86_set_memory_region KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region	2015-10-14 10:01:32 -07:00
Andy Lutomirski	612bece654	um/x86: Fix build after x86 syscall changes I didn't realize that um didn't include x86's asm/syscall.h. Re-add a missing typedef. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 034042cc1e28 ("x86/entry/syscalls: Move syscall table declarations into asm/syscalls.h") Link: http://lkml.kernel.org/r/8d15b9a88f4fd49e3342757e0a34624ee5ce9220.1444696194.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:56:28 +02:00
Andy Lutomirski	af22aa7c76	x86/asm: Remove the xyz_cfi macros from dwarf2.h They are currently unused, and I don't think that anyone was ever particularly happy with them. They had the unfortunate property that they made it easy to CFI-annotate things without thinking about them -- when pushing, do you want to just update the CFA offset, or do you also want to update the saved location of the register being pushed? Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447bfbd10bb268b4593b32534ecefa1f4df287e.1444696194.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:56:28 +02:00
Ingo Molnar	790a2ee242	* Make the EFI System Resource Table (ESRT) driver explicitly non-modular by ripping out the module_* code since Kconfig doesn't allow it to be built as a module anyway - Paul Gortmaker * Make the x86 efi=debug kernel parameter, which enables EFI debug code and output, generic and usable by arm64 - Leif Lindholm * Add support to the x86 EFI boot stub for 64-bit Graphics Output Protocol frame buffer addresses - Matt Fleming * Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled in the firmware and set an efi.flags bit so the kernel knows when it can apply more strict runtime mapping attributes - Ard Biesheuvel * Auto-load the efi-pstore module on EFI systems, just like we currently do for the efivars module - Ben Hutchings * Add "efi_fake_mem" kernel parameter which allows the system's EFI memory map to be updated with additional attributes for specific memory ranges. This is useful for testing the kernel code that handles the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware doesn't include support - Taku Izumi -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm +OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN 2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM 0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07 pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G 8xb/rET4JrhCG4gFMUZ7 =1Oee -----END PGP SIGNATURE----- Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi Pull v4.4 EFI updates from Matt Fleming: - Make the EFI System Resource Table (ESRT) driver explicitly non-modular by ripping out the module_* code since Kconfig doesn't allow it to be built as a module anyway. (Paul Gortmaker) - Make the x86 efi=debug kernel parameter, which enables EFI debug code and output, generic and usable by arm64. (Leif Lindholm) - Add support to the x86 EFI boot stub for 64-bit Graphics Output Protocol frame buffer addresses. (Matt Fleming) - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled in the firmware and set an efi.flags bit so the kernel knows when it can apply more strict runtime mapping attributes - Ard Biesheuvel - Auto-load the efi-pstore module on EFI systems, just like we currently do for the efivars module. (Ben Hutchings) - Add "efi_fake_mem" kernel parameter which allows the system's EFI memory map to be updated with additional attributes for specific memory ranges. This is useful for testing the kernel code that handles the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware doesn't include support. (Taku Izumi) Note: there is a semantic conflict between the following two commits: 8a53554e12e9 ("x86/efi: Fix multiple GOP device support") ae2ee627dc87 ("efifb: Add support for 64-bit frame buffer addresses") I fixed up the interaction in the merge commit, changing the type of current_fb_base from u32 to u64. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:51:34 +02:00
Wanpeng Li	dd5f5341a3	KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid Introduce __vmx_flush_tlb() to handle specific vpid. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:41:09 +02:00
Wanpeng Li	991e7a0eed	KVM: VMX: adjust interface to allocate/free_vpid Adjust allocate/free_vid so that they can be reused for the nested vpid. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:41:09 +02:00
Radim Krčmář	13db77347d	KVM: x86: don't notify userspace IOAPIC on edge EOI On real hardware, edge-triggered interrupts don't set a bit in TMR, which means that IOAPIC isn't notified on EOI. Do the same here. Staying in guest/kernel mode after edge EOI is what we want for most devices. If some bugs could be nicely worked around with edge EOI notifications, we should invest in a better interface. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:41:08 +02:00
Radim Krčmář	db2bdcbbbd	KVM: x86: fix edge EOI and IOAPIC reconfig race KVM uses eoi_exit_bitmap to track vectors that need an action on EOI. The problem is that IOAPIC can be reconfigured while an interrupt with old configuration is pending and eoi_exit_bitmap only remembers the newest configuration; thus EOI from the pending interrupt is not recognized. (Reconfiguration is not a problem for level interrupts, because IOAPIC sends interrupt with the new configuration.) For an edge interrupt with ACK notifiers, like i8254 timer; things can happen in this order 1) IOAPIC inject a vector from i8254 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap on original VCPU gets cleared 3) guest's handler for the vector does EOI 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is not in that VCPU's eoi_exit_bitmap 5) i8254 stops working A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the vector is in PIR/IRR/ISR. This creates an unwanted situation if the vector is reused by a non-IOAPIC source, but I think it is so rare that we don't want to make the solution more sophisticated. The simple solution also doesn't work if we are reconfiguring the vector. (Shouldn't happen in the wild and I'd rather fix users of ACK notifiers instead of working around that.) The are no races because ioapic injection and reconfig are locked. Fixes: b053b2aef25d ("KVM: x86: Add EOI exit bitmap inference") [Before b053b2aef25d, this bug happened only with APICv.] Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support") Cc: <stable@vger.kernel.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:41:08 +02:00
Radim Krčmář	c77f3fab44	kvm: x86: set KVM_REQ_EVENT when updating IRR After moving PIR to IRR, the interrupt needs to be delivered manually. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:41:08 +02:00
Paolo Bonzini	bff98d3b01	Merge branch 'kvm-master' into HEAD Merge more important SMM fixes.	2015-10-14 16:40:46 +02:00
Paolo Bonzini	b10d92a54d	KVM: x86: fix RSM into 64-bit protected mode In order to get into 64-bit protected mode, you need to enable paging while EFER.LMA=1. For this to work, CS.L must be 0. Currently, we load the segments before CR0 and CR4, which means that if RSM returns into 64-bit protected mode CS.L is already 1 and everything breaks. Luckily, CS.L=0 is always the case when executing RSM, because it is forbidden to execute RSM from 64-bit protected mode. Hence it is enough to load CR0 and CR4 first, and only then the segments. Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:39:52 +02:00
Paolo Bonzini	25188b9986	KVM: x86: fix previous commit for 32-bit Unfortunately I only noticed this after pushing. Fixes: f0d648bdf0a5bbc91da6099d5282f77996558ea4 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:39:25 +02:00
Ingo Molnar	c7d77a7980	Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:05:18 +02:00
Kővágó, Zoltán	8a53554e12	x86/efi: Fix multiple GOP device support When multiple GOP devices exists, but none of them implements ConOut, the code should just choose the first GOP (according to the comments). But currently 'fb_base' will refer to the last GOP, while other parameters to the first GOP, which will likely result in a garbled display. I can reliably reproduce this bug using my ASRock Z87M Extreme4 motherboard with CSM and integrated GPU disabled, and two PCIe video cards (NVidia GT640 and GTX980), booting from efi-stub (booting from grub works fine). On the primary display the ASRock logo remains and on the secondary screen it is garbled up completely. Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1444659236-24837-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:02:43 +02:00
Martin Schwidefsky	a7b7617493	mm: add architecture primitives for software dirty bit clearing There are primitives to create and query the software dirty bits in a pte or pmd. But the clearing of the software dirty bits is done in common code with x86 specific page table functions. Add the missing architecture primitives to clear the software dirty bits to allow the feature to be used on non-x86 systems, e.g. the s390 architecture. Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-10-14 14:32:05 +02:00
Andy Shevchenko	3435dd0809	x86/early_printk: Set __iomem address space for IO There are following warnings on unpatched code: arch/x86/kernel/early_printk.c:198:32: warning: incorrect type in initializer (different address spaces) arch/x86/kernel/early_printk.c:198:32: expected void [noderef] <asn:2>vaddr arch/x86/kernel/early_printk.c:198:32: got unsigned int [usertype] <noident> arch/x86/kernel/early_printk.c:205:32: warning: incorrect type in initializer (different address spaces) arch/x86/kernel/early_printk.c:205:32: expected void [noderef] <asn:2>vaddr arch/x86/kernel/early_printk.c:205:32: got unsigned int [usertype] <noident> Annotate it proper. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: http://lkml.kernel.org/r/1444646837-42615-1-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-10-13 21:45:56 +02:00
Paolo Bonzini	58f800d5ac	Merge branch 'kvm-master' into HEAD This merge brings in a couple important SMM fixes, which makes it easier to test latest KVM with unrestricted_guest=0 and to test the in-progress work on SMM support in the firmware. Conflicts: arch/x86/kvm/x86.c	2015-10-13 21:32:50 +02:00
Linus Torvalds	6006d4521b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - Fix AVX detection to prevent use of non-existent AESNI. - Some SPARC ciphers did not set their IV size which may lead to memory corruption" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ahash - ensure statesize is non-zero crypto: camellia_aesni_avx - Fix CPU feature checks crypto: sparc - initialize blkcipher.ivsize	2015-10-13 10:18:54 -07:00
Paolo Bonzini	7391773933	KVM: x86: fix SMI to halted VCPU An SMI to a halted VCPU must wake it up, hence a VCPU with a pending SMI must be considered runnable. Fixes: 64d6067057d9658acb8675afcfba549abdb7fc16 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-13 18:29:41 +02:00
Paolo Bonzini	5d9bc648b9	KVM: x86: clean up kvm_arch_vcpu_runnable Split the huge conditional in two functions. Fixes: 64d6067057d9658acb8675afcfba549abdb7fc16 Cc: stable@vger.kernel.org Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-13 18:28:59 +02:00
Paolo Bonzini	f0d648bdf0	KVM: x86: map/unmap private slots in __x86_set_memory_region Otherwise, two copies (one of them never populated and thus bogus) are allocated for the regular and SMM address spaces. This breaks SMM with EPT but without unrestricted guest support, because the SMM copy of the identity page map is all zeros. By moving the allocation to the caller we also remove the last vestiges of kernel-allocated memory regions (not accessible anymore in userspace since commit b74a07beed0e, "KVM: Remove kernel-allocated memory regions", 2010-06-21); that is a nice bonus. Reported-by: Alexandre DERUMIER <aderumier@odiso.com> Cc: stable@vger.kernel.org Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69 Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-13 18:28:58 +02:00
Paolo Bonzini	1d8007bdee	KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region The next patch will make x86_set_memory_region fill the userspace_addr. Since the struct is not used untouched anymore, it makes sense to build it in x86_set_memory_region directly; it also simplifies the callers. Reported-by: Alexandre DERUMIER <aderumier@odiso.com> Cc: stable@vger.kernel.org Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69 Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-13 18:28:46 +02:00
Borislav Petkov	0399f73299	x86/microcode/amd: Do not overwrite final patch levels A certain number of patch levels of applied microcode should not be overwritten by the microcode loader, otherwise bad things will happen. Check those and abort update if the current core has one of those final patch levels applied by the BIOS. 32-bit needs special handling, of course. See https://bugzilla.suse.com/show_bug.cgi?id=913996 for more info. Tested-by: Peter Kirchgeßner <pkirchgessner@t-online.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1444641762-9437-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 16:15:48 +02:00
Borislav Petkov	2eff73c0a1	x86/microcode/amd: Extract current patch level read to a function Pave the way for checking the current patch level of the microcode in a core. We want to be able to do stuff depending on the patch level - in this case decide whether to update or not. But that will be added in a later patch. Drop unused local var uci assignment, while at it. Integrate a fix for 32-bit and CONFIG_PARAVIRT from Takashi Iwai: Use native_rdmsr() in check_current_patch_level() because with CONFIG_PARAVIRT enabled and on 32-bit, where we run before paging has been enabled, we cannot deref pv_info yet. Or we could, but we'd need to access its physical address. This way of fixing it is simpler. See: https://bugzilla.suse.com/show_bug.cgi?id=943179 for the background. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Takashi Iwai <tiwai@suse.com>: Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1444641762-9437-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 16:15:48 +02:00
Aravind Gopalakrishnan	fa20a2ed6f	x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC Bank 4 MCEs are logged and reported only on the node base core (NBC) in a socket. Refer to the D18F3x44[NbMcaToMstCpuEn] field in Fam10h and later BKDGs. The node base core (NBC) is the lowest numbered core in the node. This patch ensures that we inject the error on the NBC for bank 4 errors. Otherwise, triggering #MC or APIC interrupts on a core which is not the NBC would not have any effect on the system, i.e. we would not see any relevant output on kernel logs for the error we just injected. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Cleanup comments. ] [ Add a missing dependency on AMD_NB caught by Randy Dunlap. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1443190851-2172-4-git-send-email-Aravind.Gopalakrishnan@amd.com Link: http://lkml.kernel.org/r/1444641762-9437-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 16:15:48 +02:00
Aravind Gopalakrishnan	a1300e5052	x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts Add the capability to trigger deferred error interrupts and threshold interrupts in order to test the APIC interrupt handler functionality for these type of errors. Update README section about the same too. Reported by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Cleanup comments. ] [ Include asm/irq_vectors.h directly so that misc randbuilds don't fail. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1443190851-2172-3-git-send-email-Aravind.Gopalakrishnan@amd.com Link: http://lkml.kernel.org/r/1444641762-9437-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 16:15:47 +02:00
Aravind Gopalakrishnan	85c9306d44	x86/ras/mce_amd_inj: Return early on invalid input Invalid inputs such as these are currently reported in dmesg as failing: $> echo sweet > flags [ 122.079139] flags_write: Invalid flags value: et even though the 'flags' attribute has been updated correctly: $> cat flags sw This is because userspace keeps writing the remaining buffer until it encounters an error. However, the input as a whole is wrong and we should not be writing anything to the file. Therefore, correct flags_write() to return -EINVAL immediately on bad input strings. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1443190851-2172-2-git-send-email-Aravind.Gopalakrishnan@amd.com Link: http://lkml.kernel.org/r/1444641762-9437-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 16:15:47 +02:00
Taku Izumi	0f96a99dab	efi: Add "efi_fake_mem" boot option This patch introduces new boot option named "efi_fake_mem". By specifying this parameter, you can add arbitrary attribute to specific memory range. This is useful for debugging of Address Range Mirroring feature. For example, if "efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000" is specified, the original (firmware provided) EFI memmap will be updated so that the specified memory regions have EFI_MEMORY_MORE_RELIABLE attribute (0x10000): <original> efi: mem36: [Conventional Memory\| \| \| \| \| \| \|WB\|WT\|WC\|UC] range=[0x0000000100000000-0x00000020a0000000) (129536MB) <updated> efi: mem36: [Conventional Memory\| \|MR\| \| \| \| \|WB\|WT\|WC\|UC] range=[0x0000000100000000-0x0000000180000000) (2048MB) efi: mem37: [Conventional Memory\| \| \| \| \| \| \|WB\|WT\|WC\|UC] range=[0x0000000180000000-0x00000010a0000000) (61952MB) efi: mem38: [Conventional Memory\| \|MR\| \| \| \| \|WB\|WT\|WC\|UC] range=[0x00000010a0000000-0x0000001120000000) (2048MB) efi: mem39: [Conventional Memory\| \| \| \| \| \| \|WB\|WT\|WC\|UC] range=[0x0000001120000000-0x00000020a0000000) (63488MB) And you will find that the following message is output: efi: Memory: 4096M/131455M mirrored memory Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2015-10-12 14:20:09 +01:00
Taku Izumi	0bbea1ce98	x86/efi: Rename print_efi_memmap() to efi_print_memmap() This patch renames print_efi_memmap() to efi_print_memmap() and make it global function so that we can invoke it outside of arch/x86/platform/efi/efi.c Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2015-10-12 14:20:08 +01:00
Matt Fleming	ae2ee627dc	efifb: Add support for 64-bit frame buffer addresses The EFI Graphics Output Protocol uses 64-bit frame buffer addresses but these get truncated to 32-bit by the EFI boot stub when storing the address in the 'lfb_base' field of 'struct screen_info'. Add a 'ext_lfb_base' field for the upper 32-bits of the frame buffer address and set VIDEO_TYPE_CAPABILITY_64BIT_BASE when the field is useable. It turns out that the reason no one has required this support so far is that there's actually code in tianocore to "downgrade" PCI resources that have option ROMs and 64-bit BARS from 64-bit to 32-bit to cope with legacy option ROMs that can't handle 64-bit addresses. The upshot is that basically all GOP devices in the wild use a 32-bit frame buffer address. Still, it is possible to build firmware that uses a full 64-bit GOP frame buffer address. Chad did, which led to him reporting this issue. Add support in anticipation of GOP devices using 64-bit addresses more widely, and so that efifb works out of the box when that happens. Reported-by: Chad Page <chad.page@znyx.com> Cc: Pete Hawkins <pete.hawkins@znyx.com> Acked-by: Peter Jones <pjones@redhat.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2015-10-12 14:20:06 +01:00
Leif Lindholm	12dd00e83f	efi/x86: Move efi=debug option parsing to core fed6cefe3b6e ("x86/efi: Add a "debug" option to the efi= cmdline") adds the DBG flag, but does so for x86 only. Move this early param parsing to core code. Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Salter <msalter@redhat.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2015-10-12 14:20:05 +01:00

... 2 3 4 5 6 ...

23162 Commits