148435 Commits

Author SHA1 Message Date
Linus Torvalds
ebad825cdd ia64: mark special ia64 memory areas anonymous
Commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") made
newly allocated vma's have a dummy vm_ops field so that they wouldn't be
mistaken for anonymous mappings, and if you wanted an anonymous vma you
had to explicitly say so by calling "vma_set_anonymous()" on it.

However, it missed the two special vmas that ia64 processes have: the
register backing store and the NaT page.  So they wouldn't actually act
like anonymous ranges, and page faults on them caused a SIGBUS rather
than the creation of a new anon page in them.

That obviously will make any ia64 binary very unhappy indeed, and the
boot fails early.

Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
Reported-by: Tony Luck <tony.luck@intel.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01 09:57:50 -07:00
Linus Torvalds
f67077deb4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several smallish fixes, I don't think any of this requires another -rc
  but I'll leave that up to you:

   1) Don't leak uninitialzed bytes to userspace in xfrm_user, from Eric
      Dumazet.

   2) Route leak in xfrm_lookup_route(), from Tommi Rantala.

   3) Premature poll() returns in AF_XDP, from Björn Töpel.

   4) devlink leak in netdevsim, from Jakub Kicinski.

   5) Don't BUG_ON in fib_compute_spec_dst, the condition can
      legitimately happen. From Lorenzo Bianconi.

   6) Fix some spectre v1 gadgets in generic socket code, from Jeremy
      Cline.

   7) Don't allow user to bind to out of range multicast groups, from
      Dmitry Safonov with a follow-up by Dmitry Safonov.

   8) Fix metrics leak in fib6_drop_pcpu_from(), from Sabrina Dubroca"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  netlink: Don't shift with UB on nlk->ngroups
  net/ipv6: fix metrics leak
  xen-netfront: wait xenbus state change when load module manually
  can: ems_usb: Fix memory leak on ems_usb_disconnect()
  openvswitch: meter: Fix setting meter id for new entries
  netlink: Do not subscribe to non-existent groups
  NET: stmmac: align DMA stuff to largest cache line length
  tcp_bbr: fix bw probing to raise in-flight data for very small BDPs
  net: socket: Fix potential spectre v1 gadget in sock_is_registered
  net: socket: fix potential spectre v1 gadget in socketcall
  net: mdio-mux: bcm-iproc: fix wrong getter and setter pair
  ipv4: remove BUG_ON() from fib_compute_spec_dst
  enic: handle mtu change for vf properly
  net: lan78xx: fix rx handling before first packet is send
  nfp: flower: fix port metadata conversion bug
  bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()
  bpf: fix bpf_skb_load_bytes_relative pkt length check
  perf build: Build error in libbpf missing initialization
  net: ena: Fix use of uninitialized DMA address bits field
  bpf: btf: Use exact btf value_size match in map_check_btf()
  ...
2018-07-30 21:40:37 -07:00
Thomas Petazzoni
12be1036c5 sparc: use asm-generic version of msi.h
This is necessary to be able to include <linux/msi.h> when
CONFIG_GENERIC_MSI_IRQ_DOMAIN is enabled. Without this, a build with
CONFIG_GENERIC_MSI_IRQ_DOMAIN fails with:

   In file included from drivers//ata/ahci.c:45:0:
>> include/linux/msi.h:226:10: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
             msi_alloc_info_t *arg);
             ^~~~~~~~~~~~~~~~
             sg_alloc_fn
   include/linux/msi.h:230:9: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
            msi_alloc_info_t *arg);
            ^~~~~~~~~~~~~~~~
            sg_alloc_fn
   include/linux/msi.h:239:12: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
               msi_alloc_info_t *arg);
               ^~~~~~~~~~~~~~~~
               sg_alloc_fn
   include/linux/msi.h:240:22: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
     void  (*msi_finish)(msi_alloc_info_t *arg, int retval);
                         ^~~~~~~~~~~~~~~~
                         sg_alloc_fn
   include/linux/msi.h:241:20: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
     void  (*set_desc)(msi_alloc_info_t *arg,
                       ^~~~~~~~~~~~~~~~
                       sg_alloc_fn
   include/linux/msi.h:316:18: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
           int nvec, msi_alloc_info_t *args);
                     ^~~~~~~~~~~~~~~~
                     sg_alloc_fn
   include/linux/msi.h:318:29: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
            int virq, int nvec, msi_alloc_info_t *args);
                                ^~~~~~~~~~~~~~~~
                                sg_alloc_fn

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 13:00:56 -07:00
Thomas Petazzoni
f0afc6b18d sparc: move MSI related definitions to where they are used
The definitions in arch/sparc/include/asm/msi.h are only used in
arch/sparc/mm/srmmu.c, so it makes sense to have them in the C file
directly.

In addition, having a custom arch/sparc/include/asm/msi.h prevents
from using the asm-generic version of this header, which is necessary
to be able to include <linux/msi.h> when CONFIG_GENERIC_MSI_IRQ_DOMAIN
is enabled.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 13:00:56 -07:00
Steven Rostedt (VMware)
6f57ed681e sparc/time: Add missing __init to init_tick_ops()
Code that was added to force gcc not to inline any function that isn't
explicitly declared as inline uncovered that init_tick_ops() isn't
marked as "__init". It is only called by __init functions and more
importantly it too calls an __init function which would require it to be
__init as well.

Link: http://lkml.kernel.org/r/201806060444.hdHcKOBy%fengguang.wu@intel.com

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 12:48:29 -07:00
Linus Torvalds
527838d470 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - a build race fix

   - a Xen entry fix

   - a TSC_DEADLINE quirk future-proofing fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix if_changed build flip/flop bug
  x86/entry/64: Remove %ebx handling from error_entry/exit
  x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
2018-07-30 12:16:03 -07:00
Linus Torvalds
0634922a78 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes:

   - AMD IBS data corruptor fix (uncovered by UBSAN)

   - an Intel PEBS entry unwind error fix

   - a HW-tracing crash fix

   - a MAINTAINERS update"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix crash when using HW tracing kernel filters
  perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)
  MAINTAINERS: Add Naveen N. Rao as kprobes co-maintainer
  perf/x86/amd/ibs: Don't access non-started event
2018-07-30 11:45:30 -07:00
Linus Torvalds
fb20c03d37 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "A paravirt UP-patching fix, and an I2C MUX driver lockdep warning fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code
  i2c/mux, locking/core: Annotate the nested rt_mutex usage
  locking/rtmutex: Allow specifying a subclass for nested locking
2018-07-30 11:37:16 -07:00
Linus Torvalds
d464b0314c Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fix from Ingo Molnar:
 "An UEFI variables fix for SEV guests"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Access EFI MMIO data as unencrypted when SEV is active
2018-07-30 11:07:34 -07:00
David S. Miller
958b4cd8fa Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-07-28

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) API fixes for libbpf's BTF mapping of map key/value types in order
   to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
   markings, from Martin.

2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
   consumer pointer of the RX queue, from Björn.

3) Fix __xdp_return() to check for NULL pointer after the rhashtable
   lookup that retrieves the allocator object, from Taehee.

4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
   by 4 bytes which got removed from overall stack usage, from Wang.

5) Fix bpf_skb_load_bytes_relative() length check to use actual
   packet length, from Daniel.

6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
   handler, from Thomas.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28 21:02:21 -07:00
Linus Torvalds
7648c44680 One more fix for 4.18:
- Revert an errata workaround for the BCM5300X platform that was
     merged for v4.18-rc2 but has been found to cause hangs on at least
     systems using the BCM4718A1.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW1zA0hUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN2QyAEAxicOismTbPgSI+2oiaOjJbF5KjXi
 cPNCtDkjwkUafU4A/1jvJlXpw6UHRwFAr6fcfnPMcK6QyYiQ9NnSWz46QYkI
 =X4uV
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.18_5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fix from Paul Burton:
 "Here's one more MIPS fix, reverting an errata workaround that was
  merged for v4.18-rc2 but has since been found to cause system hangs on
  some BCM4718A1-based systems by the OpenWRT project"

* tag 'mips_fixes_4.18_5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  Revert "MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum"
2018-07-28 12:32:28 -07:00
Linus Torvalds
864af0d40c Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kvm, mm: account shadow page tables to kmemcg
  zswap: re-check zswap_is_full() after do zswap_shrink()
  include/linux/eventfd.h: include linux/errno.h
  mm: fix vma_is_anonymous() false-positives
  mm: use vma_init() to initialize VMAs on stack and data segments
  mm: introduce vma_init()
  mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
  ipc/sem.c: prevent queue.status tearing in semop
  mm: disallow mappings that conflict for devm_memremap_pages()
  kasan: only select SLUB_DEBUG with SYSFS=y
  delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
2018-07-27 10:30:47 -07:00
Linus Torvalds
6284c99cf1 More arm64 fixes:
- Fix disabling of kpti on Thunder-X machines
 
 - Fix premature BUILD_BUG_ON() found with randconfig
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJbWu8qAAoJELescNyEwWM09ugIAKGyQrDM0gEUEuv5eoAU2/qf
 lpcsUiKKn4J0QNJFf8fEeNDHA/Ji56ev/VG0DZiBupfMIbN/q59IPft/NLcQn44L
 +XCYunHjIrgDoSuCYjTpvElhaPOD6d1PkMeJKGDoo6GLM9+T7JlIb+tczq+TWnnK
 YfoowNaq/OJqCk920kvazJ1bi/9KFF67JuP5jP07zvkk2PwI/CaWVFvlovJ/1sZL
 qtLaC2wx8ViHoONx+HIRuFZBUfckTWd3uAgSGFhZ4yl0gSCxIK1ABQPjUlbGicK/
 QiF1Pb7MtdfLyfTgrnOJLbKiH8NAu65boidvJiIRpCu6qWaHXqNvRhZaYm8fmVM=
 =V+76
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Inevitably, after saying that I hoped we would be done on the fixes
  front, a couple of issues have cropped up over the last week. Next
  time I'll stay schtum.

  We've fixed an over-eager BUILD_BUG_ON() which Arnd ran into with
  arndconfig, as well as ensuring that KPTI really is disabled on
  Thunder-X1, where the cure is worse than the disease (this regressed
  when we reworked the heterogeneous CPU feature checking).

  Summary:

   - Fix disabling of kpti on Thunder-X machines

   - Fix premature BUILD_BUG_ON() found with randconfig"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
  arm64: Check for errata before evaluating cpu features
2018-07-27 10:26:02 -07:00
Rafał Miłecki
d5ea019f8a
Revert "MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum"
This reverts commit 2a027b47dba6 ("MIPS: BCM47XX: Enable 74K Core
ExternalSync for PCIe erratum").

Enabling ExternalSync caused a regression for BCM4718A1 (used e.g. in
Netgear E3000 and ASUS RT-N16): it simply hangs during PCIe
initialization. It's likely that BCM4717A1 is also affected.

I didn't notice that earlier as the only BCM47XX devices with PCIe I
own are:
1) BCM4706 with 2 x 14e4:4331
2) BCM4706 with 14e4:4360 and 14e4:4331
it appears that BCM4706 is unaffected.

While BCM5300X-ES300-RDS.pdf seems to document that erratum and its
workarounds (according to quotes provided by Tokunori) it seems not even
Broadcom follows them.

According to the provided info Broadcom should define CONF7_ES in their
SDK's mipsinc.h and implement workaround in the si_mips_init(). Checking
both didn't reveal such code. It *could* mean Broadcom also had some
problems with the given workaround.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Michael Marley <michael@michaelmarley.com>
Patchwork: https://patchwork.linux-mips.org/patch/20032/
URL: https://bugs.openwrt.org/index.php?do=details&task_id=1688
Cc: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-07-27 10:07:32 -07:00
Shakeel Butt
d97e5e6160 kvm, mm: account shadow page tables to kmemcg
The size of kvm's shadow page tables corresponds to the size of the
guest virtual machines on the system.  Large VMs can spend a significant
amount of memory as shadow page tables which can not be left as system
memory overhead.  So, account shadow page tables to the kmemcg.

[shakeelb@google.com: replace (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT]
  Link: http://lkml.kernel.org/r/20180629140224.205849-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627181349.149778-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-26 19:38:03 -07:00
Kirill A. Shutemov
2c4541e24c mm: use vma_init() to initialize VMAs on stack and data segments
Make sure to initialize all VMAs properly, not only those which come
from vm_area_cachep.

Link: http://lkml.kernel.org/r/20180724121139.62570-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-26 19:38:03 -07:00
Wang YanQing
9e4e5b5c86 bpf, x32: Fix regression caused by commit 24dea04767e6
Commit 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind")
removed the 4 /* Extra space for skb_copy_bits buffer */
from _STACK_SIZE, but it didn't fix the concerned code
in emit_prologue and emit_epilogue, and this error will
bring very strange kernel runtime errors. This patch
fixes it.

Fixes: 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind")
Reported-by: Meelis Roos <mroos@linux.ee>
Bisected-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-26 02:51:12 +02:00
Johannes Weiner
7b0eb6b41a arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
Arnd reports the following arm64 randconfig build error with the PSI
patches that add another page flag:

  /git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
  /git/arm-soc/include/linux/compiler.h:357:38: error: call to
  '__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON
  failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)

The additional page flag causes other information stored in
page->flags to get bumped into their own struct page member:

  #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <=
  BITS_PER_LONG - NR_PAGEFLAGS
  #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
  #else
  #define LAST_CPUPID_WIDTH 0
  #endif

  #if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
  #define LAST_CPUPID_NOT_IN_PAGE_FLAGS
  #endif

which in turn causes the struct page size to exceed the size set in
STRUCT_PAGE_MAX_SHIFT. This value is an an estimate used to size the
VMEMMAP page array according to address space and struct page size.

However, the check is performed - and triggers here - on a !VMEMMAP
config, which consumes an additional 22 page bits for the sparse
section id. When VMEMMAP is enabled, those bits are returned, cpupid
doesn't need its own member, and the page passes the VMEMMAP check.

Restrict that check to the situation it was meant to check: that we
are sizing the VMEMMAP page array correctly.

Says Arnd:

    Further experiments show that the build error already existed before,
    but was only triggered with larger values of CONFIG_NR_CPU and/or
    CONFIG_NODES_SHIFT that might be used in actual configurations but
    not in randconfig builds.

    With longer CPU and node masks, I could recreate the problem with
    kernels as old as linux-4.7 when arm64 NUMA support got added.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
Fixes: 3e1907d5bf5a ("arm64: mm: move vmemmap region right below the linear region")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-25 13:32:30 +01:00
Dirk Mueller
dc0e36581e arm64: Check for errata before evaluating cpu features
Since commit d3aec8a28be3b8 ("arm64: capabilities: Restrict KPTI
detection to boot-time CPUs") we rely on errata flags being already
populated during feature enumeration. The order of errata and
features was flipped as part of commit ed478b3f9e4a ("arm64:
capabilities: Group handling of features and errata workarounds").

Return to the orginal order of errata and feature evaluation to
ensure errata flags are present during feature evaluation.

Fixes: ed478b3f9e4a ("arm64: capabilities: Group handling of
    features and errata workarounds")
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
CC: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-25 13:30:04 +01:00
Kees Cook
92a4728608 x86/boot: Fix if_changed build flip/flop bug
Dirk Gouders reported that two consecutive "make" invocations on an
already compiled tree will show alternating behaviors:

$ make
  CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  DATAREL arch/x86/boot/compressed/vmlinux
Kernel: arch/x86/boot/bzImage is ready  (#48)
  Building modules, stage 2.
  MODPOST 165 modules

$ make
  CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  LD      arch/x86/boot/compressed/vmlinux
  ZOFFSET arch/x86/boot/zoffset.h
  AS      arch/x86/boot/header.o
  LD      arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  OBJCOPY arch/x86/boot/vmlinux.bin
  BUILD   arch/x86/boot/bzImage
Setup is 15644 bytes (padded to 15872 bytes).
System is 6663 kB
CRC 3eb90f40
Kernel: arch/x86/boot/bzImage is ready  (#48)
  Building modules, stage 2.
  MODPOST 165 modules

He bisected it back to:

    commit 98f78525371b ("x86/boot: Refuse to build with data relocations")

The root cause was the use of the "if_changed" kbuild function multiple
times for the same target. It was designed to only be used once per
target, otherwise it will effectively always trigger, flipping back and
forth between the two commands getting recorded by "if_changed". Instead,
this patch merges the two commands into a single function to get stable
build artifacts (i.e. .vmlinux.cmd), and a single build behavior.

Bisected-and-Reported-by: Dirk Gouders <dirk@gouders.net>
Fix-Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 12:00:08 +02:00
Peter Zijlstra
6cbc304f2f perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)
Vince reported the perf_fuzzer giving various unwinder warnings and
Josh reported:

> Deja vu.  Most of these are related to perf PEBS, similar to the
> following issue:
>
>   b8000586c90b ("perf/x86/intel: Cure bogus unwind from PEBS entries")
>
> This is basically the ORC version of that.  setup_pebs_sample_data() is
> assembling a franken-pt_regs which ORC isn't happy about.  RIP is
> inconsistent with some of the other registers (like RSP and RBP).

And where the previous unwinder only needed BP,SP ORC also requires
IP. But we cannot spoof IP because then the sample will get displaced,
entirely negating the point of PEBS.

So cure the whole thing differently by doing the unwind early; this
does however require a means to communicate we did the unwind early.
We (ab)use an unused sample_type bit for this, which we set on events
that fill out the data->callchain before the normal
perf_prepare_sample().

Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:46:21 +02:00
Waiman Long
c0dc373a78 locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code
The LOCK_PREFIX macro should be used in the __raw_callee_save___pv_queued_spin_unlock()
assembly code, so that the lock prefix can be patched out on UP systems.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joe Mario <jmario@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1531858560-21547-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:22:20 +02:00
Linus Torvalds
9981b4fb86 A couple more MIPS fixes for 4.18:
- Fix an off-by-one in reporting PCI resource sizes to userland which
     regressed in v3.12.
 
   - Fix writes to DDR controller registers used to flush write buffers,
     which regressed with some refactoring in v4.2.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW1eaYhUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN3KEQD/R35j0nU28kbNtIG4bDABZmjN42qy
 KRwLdryf3eRLVYkA/3QUGskbEKzJjvasM2Ia+hyx0CMmgigjtrr6iK5B+8sJ
 =0pT0
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
 "A couple more MIPS fixes for 4.18:

   - Fix an off-by-one in reporting PCI resource sizes to userland which
     regressed in v3.12.

   - Fix writes to DDR controller registers used to flush write buffers,
     which regressed with some refactoring in v4.2"

* tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: ath79: fix register address in ath79_ddr_wb_flush()
  MIPS: Fix off-by-one in pci_resource_to_user()
2018-07-24 18:11:15 -07:00
Linus Torvalds
0723090656 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle stations tied to AP_VLANs properly during mac80211 hw
    reconfig. From Manikanta Pubbisetty.

 2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.

 3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
    Ben Elisha.

 4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.

 5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.

 6) Fix cached netdev name leak in nf_tables, from Florian Westphal.

 7) Fix memory leaks on chain rename, also from Florian Westphal.

 8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
    Cheng.

 9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.

10) Fix link local address handling with VRF, from David Ahern.

11) Don't clobber 'err' on a successful call to __skb_linearize() in
    skb_segment(). From Eric Dumazet.

12) Fix vxlan fdb notification races, from Roopa Prabhu.

13) Hash UDP fragments consistently, from Paolo Abeni.

14) If TCP receives lots of out of order tiny packets, we do really
    silly stuff. Make the out-of-order queue ending more robust to this
    kind of behavior, from Eric Dumazet.

15) Don't leak netlink dump state in nf_tables, from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: axienet: Fix double deregister of mdio
  qmi_wwan: fix interface number for DW5821e production firmware
  ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
  bnx2x: Fix invalid memory access in rss hash config path.
  net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
  r8169: restore previous behavior to accept BIOS WoL settings
  cfg80211: never ignore user regulatory hint
  sock: fix sg page frag coalescing in sk_alloc_sg
  netfilter: nf_tables: move dumper state allocation into ->start
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  ip: hash fragments consistently
  ipv6: use fib6_info_hold_safe() when necessary
  can: xilinx_can: fix power management handling
  can: xilinx_can: fix incorrect clear of non-processed interrupts
  can: xilinx_can: fix RX overflow interrupt not being enabled
  can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
  ...
2018-07-24 17:31:47 -07:00
Andy Lutomirski
b3681dd548 x86/entry/64: Remove %ebx handling from error_entry/exit
error_entry and error_exit communicate the user vs. kernel status of
the frame using %ebx.  This is unnecessary -- the information is in
regs->cs.  Just use regs->cs.

This makes error_entry simpler and makes error_exit more robust.

It also fixes a nasty bug.  Before all the Spectre nonsense, the
xen_failsafe_callback entry point returned like this:

        ALLOC_PT_GPREGS_ON_STACK
        SAVE_C_REGS
        SAVE_EXTRA_REGS
        ENCODE_FRAME_POINTER
        jmp     error_exit

And it did not go through error_entry.  This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.

Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used.  As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks.  Now,
depending on kernel configuration, RBX got zeroed and, when running
some Wine workloads, the kernel crashes.  This was introduced by:

    commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")

With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.

I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.

[ Historical note: I wrote this patch as a cleanup before I was aware
  of the bug it fixed. ]

[ Note to stable maintainers: this should probably get applied to all
  kernels.  If you're nervous about that, a more conservative fix to
  add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should
  also fix the problem. ]

Reported-and-tested-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-24 10:07:36 +02:00
Len Brown
d9e6dbcf28 x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
All SKX with stepping higher than 4 support the TSC_DEADLINE,
no matter the microcode version.

Without this patch, upcoming SKX steppings will not be able to use
their TSC_DEADLINE timer.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: <stable@kernel.org> # v4.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 616dd5872e ("x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping")
Link: http://lkml.kernel.org/r/d0c7129e509660be9ec6b233284b8d42d90659e8.1532207856.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-24 10:05:13 +02:00
Thomas Gleixner
d2753e6b48 perf/x86/amd/ibs: Don't access non-started event
Paul Menzel reported the following bug:

> Enabling the undefined behavior sanitizer and building GNU/Linux 4.18-rc5+
> (with some unrelated commits) with GCC 8.1.0 from Debian Sid/unstable, the
> warning below is shown.
>
> > [    2.111913]
> > ================================================================================
> > [    2.111917] UBSAN: Undefined behaviour in arch/x86/events/amd/ibs.c:582:24
> > [    2.111919] member access within null pointer of type 'struct perf_event'
> > [    2.111926] CPU: 0 PID: 144 Comm: udevadm Not tainted 4.18.0-rc5-00316-g4864b68cedf2 #104
> > [    2.111928] Hardware name: ASROCK E350M1/E350M1, BIOS TIMELESS 01/01/1970
> > [    2.111930] Call Trace:
> > [    2.111943]  dump_stack+0x55/0x89
> > [    2.111949]  ubsan_epilogue+0xb/0x33
> > [    2.111953]  handle_null_ptr_deref+0x7f/0x90
> > [    2.111958]  __ubsan_handle_type_mismatch_v1+0x55/0x60
> > [    2.111964]  perf_ibs_handle_irq+0x596/0x620

The code dereferences event before checking the STARTED bit. Patch
below should cure the issue.

The warning should not trigger, if I analyzed the thing correctly.
(And Paul's testing confirms this.)

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Menzel <pmenzel+linux-x86@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807200958390.1580@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-24 09:51:10 +02:00
Martin Schwidefsky
2fba3573f1 s390: disable gcc plugins
The s390 build currently fails with the latent entropy plugin:

arch/s390/kernel/als.o: In function `verify_facilities':
als.c:(.init.text+0x24): undefined reference to `latent_entropy'
als.c:(.init.text+0xae): undefined reference to `latent_entropy'
make[3]: *** [arch/s390/boot/compressed/vmlinux] Error 1
make[2]: *** [arch/s390/boot/compressed/vmlinux] Error 2
make[1]: *** [bzImage] Error 2

This will be fixed with the early boot rework from Vasily, which
is planned for the 4.19 merge window.

For 4.18 the simplest solution is to disable the gcc plugins and
reenable them after the early boot rework is upstream.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-24 08:10:52 +02:00
Al Viro
f88a333b44 alpha: fix osf_wait4() breakage
kernel_wait4() expects a userland address for status - it's only
rusage that goes as a kernel one (and needs a copyout afterwards)

[ Also, fix the prototype of kernel_wait4() to have that __user
  annotation   - Linus ]

Fixes: 92ebce5ac55d ("osf_wait4: switch to kernel_wait4()")
Cc: stable@kernel.org # v4.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-22 11:51:30 -07:00
Brijesh Singh
9b788f32be x86/efi: Access EFI MMIO data as unencrypted when SEV is active
SEV guest fails to update the UEFI runtime variables stored in the
flash.

The following commit:

  1379edd59673 ("x86/efi: Access EFI data as encrypted when SEV is active")

unconditionally maps all the UEFI runtime data as 'encrypted' (C=1).

When SEV is active the UEFI runtime data marked as EFI_MEMORY_MAPPED_IO
should be mapped as 'unencrypted' so that both guest and hypervisor can
access the data.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org> # 4.15.x
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 1379edd59673 ("x86/efi: Access EFI data as encrypted ...")
Link: http://lkml.kernel.org/r/20180720012846.23560-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-22 14:10:38 +02:00
Linus Torvalds
45ae4df922 ARM: SoC fixes for 4.18-rc
- Fix interrupt type on ethernet switch for i.MX-based RDU2
  - GPC on i.MX exposed too large a register window which resulted in
    userspace being able to crash the machine.
  - Fixup of bad merge resolution moving GPIO DT nodes under pinctrl
    on droid4.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAltToboPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx3gP4P/3Mntu351wZpaczDQZFQxSvxnmT2Ocr9YKFR
 u5UOoE1hTCxOHcrZO7C/EbIKaqD1QhpxMPevcqpOid0glAiEj6D0c0qewAML2vwH
 gGENHX5z5phwrK7RDJZhBiH2jKCg8ttOn0QSoHxGGZNSPAL2nimMwD0IbiqTI5dx
 SkqecCPBwmizpfltdOCRRhN9RCiIvzcqoyLz0HjZ/sff1Y+t3U+alq227rZkQOki
 bj9uD+XkKYZzgiECd6HfMtPHUSUusSXcpF/TyfdnHeyHpF1E3InPVC7dbTASFnxb
 C6zrX99c2Fu11TV7Kkkn1LTwA0rRuXQmSV7ZWZMOqQBrONqGpy0CPIY+LA1xYGCd
 8VtgP7qj0m7XKkPyEriwNDSKKE+c7cCYn9VpR6Kg5xmw0DUCTohMQmeZRo2sMylT
 UlYMjNKQ53IuPullwRaJVM63kA3CuFo3fyStg18SYcx2lRFO8lcGJYqjqd91KkDF
 ZW/tG9V6v7lz/3J3XUOFTJNWwi1CUKEIMM3ObtfDAZToyS1zbe5kX+kiTcUnvGty
 wv3aWCknQnru++vYhtIYLsqwu/NwoJLTWppEmX4YxoV8fW4Yw95e3zjwzWn7rczQ
 dNv7b4Hz/gpDZk7o3dpBpajTCzhh549bDfY9yxBpkd+otUKvgjKkKvsiqCkFV+j7
 dcs5FMT5
 =VFnX
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:

 - Fix interrupt type on ethernet switch for i.MX-based RDU2

 - GPC on i.MX exposed too large a register window which resulted in
   userspace being able to crash the machine.

 - Fixup of bad merge resolution moving GPIO DT nodes under pinctrl on
   droid4.

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: imx6: RDU2: fix irq type for mv88e6xxx switch
  soc: imx: gpc: restrict register range for regmap access
  ARM: dts: omap4-droid4: fix dts w.r.t. pwm
2018-07-21 17:27:42 -07:00
Linus Torvalds
ef81e63e17 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "A single fix for a MCE-polling regression, which prevented the
  disabling of polling"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE: Remove min interval polling limitation
2018-07-21 17:25:49 -07:00
Linus Torvalds
43227e098c Merge branch 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti fixes from Ingo Molnar:
 "An APM fix, and a BTS hardware-tracing fix related to PTI changes"

* 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apm: Don't access __preempt_count with zeroed fs
  x86/events/intel/ds: Fix bts_interrupt_threshold alignment
2018-07-21 17:23:58 -07:00
Linus Torvalds
ea75a2c715 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core kernel fixes from Ingo Molnar:
 "This is mostly the copy_to_user_mcsafe() related fixes from Dan
  Williams, and an ORC fix for Clang"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling
  lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
  lib/iov_iter: Document _copy_to_iter_flushcache()
  lib/iov_iter: Document _copy_to_iter_mcsafe()
  objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang
2018-07-21 16:52:08 -07:00
Linus Torvalds
ffb48e7924 powerpc fixes for 4.18 #4
Two regression fixes, one for xmon disassembly formatting and the other to fix
 the E500 build.
 
 Two commits to fix a potential security issue in the VFIO code under obscure
 circumstances.
 
 And finally a fix to the Power9 idle code to restore SPRG3, which is user
 visible and used for sched_getcpu().
 
 Thanks to:
   Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy, James Clarke.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJbUrFqExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYCW
 mQ//eYpaYIkEthXH0uHUN2wpWZlhbg0wUtjclsT5RUonDwMC2PM9BUuLk61RtIlD
 jbi39GrUISHf13U1Ydhb3e1I5B+TpDe6hgb/dGUMAXPe6rt+jFREogZR+vgIU0ep
 Q8ta1GIgbI6La0zXn5o3apUtqR7bAQ3cWD2T8vN4tQmAQZPw1fV13cZZ2kFs5JFO
 aYX8pD76wkAUz8Im+bweziRSRYdAIi8Oxt1Cdyg9Oti5y4fp4LuQKa/qbfkswkkk
 2ycG3TWr4Ln8RM/GaUPW1UPh1Zd8b6et7vUnhxO1g/JdEXlnm9A+DifJFrUk/+y8
 DsyofdXUj+u+LXX/H8uqqse7ysfvkC53R15Jo8irISsIgYZcv4yKsZKGJR27wHGV
 h9KBDA0ZK+czU2jK4gZAdMTs1IICgXGUVIL+nI8U1sRBep3CI3HguqBVoC/MGes3
 WR2+8i2diL1m2jAHR5Nd3+ArBpI876ZalhDmM1Mv3GRUAvjo9nsehk32/nfBDYUM
 nPQd1vi3F6w1MSlj7aqA3quTeGANwlnOcdC4QB++tz7oO1sY6ZYZtii3jsGSluXL
 vP/Mxvz2AG3v9KPCDQJqzx0n3vMAmGphetddxYl1X1dSwbwLNx0GggKNe0xw6sjp
 6lbin0w/Ot79VoSEncuRdm1hrtIktwubsH5CGB7Gxke5Kgk=
 =Sfmf
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Two regression fixes, one for xmon disassembly formatting and the
  other to fix the E500 build.

  Two commits to fix a potential security issue in the VFIO code under
  obscure circumstances.

  And finally a fix to the Power9 idle code to restore SPRG3, which is
  user visible and used for sched_getcpu().

  Thanks to: Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy,
  James Clarke"

* tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
  powerpc/Makefile: Assemble with -me500 when building for E500
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  powerpc/xmon: Fix disassembly since printf changes
2018-07-21 16:46:53 -07:00
Linus Torvalds
490fc05386 mm: make vm_area_alloc() initialize core fields
Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.

The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 15:24:03 -07:00
Linus Torvalds
3928d4f5ee mm: use helper functions for allocating and freeing vm_area structs
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.

We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.

Right now those functions are literally just wrappers around the
kmem_cache_*() calls.  This is a purely mechanical conversion:

    # new vma:
    kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()

    # copy old vma
    kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)

    # free vma
    kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)

to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 13:48:51 -07:00
Olof Johansson
5858610f0d i.MX fixes for 4.18, round 4:
- A fix for i.MX6 RDU2 board on the wrong IRQ type of Marvell switch,
    which might result in a race condition in the interrupt handler and
    cause the OS to miss all future events.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJbUVhyAAoJEFBXWFqHsHzOss4H/3nHBKfbjC0twTK3J4ou3jDO
 3JboghAt6bxKb/aS1zi8h3d7HDchV5FRkp87TX0qWss6RpS/cMPvQv2DCtgJIYMr
 M/M59oxJJsZpen105tMiUFermrPEGz7vmy4FkmG8t2giSQj78XZYQnZsp77AcTyC
 IP2wNcVBYwfis3GvDuKgBduZlAV42tqL0U02HsaOvmHjhGcqLzJxlwDAa2es6/zU
 KmbBatTR78oP2xf68BXQVB+x8WEjLxNI9J3c4uuLjYTxDxCKU+QNi57XS1VXp13q
 72x0lxhe9uTOC+tipvTvj449RigOIfqhlyg7IIE/5xOIKZFUfZZSYZmQ00lx1O4=
 =grcI
 -----END PGP SIGNATURE-----

Merge tag 'imx-fixes-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes

i.MX fixes for 4.18, round 4:
 - A fix for i.MX6 RDU2 board on the wrong IRQ type of Marvell switch,
   which might result in a race condition in the interrupt handler and
   cause the OS to miss all future events.

* tag 'imx-fixes-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: imx6: RDU2: fix irq type for mv88e6xxx switch

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-20 14:22:11 -07:00
Linus Torvalds
2a0ea7df1f ARC fixes for 4.18-rc6
- Fix CONFIG_SWAP [Alexey]
 
  - Robistify cmpxchg emulation for systems w/o atomics [Alexey / PeterZ]
 
  - Allow mprotext(PROT_EXEC) for stack mappings [Vineet]
 
  - HSDK platform enable PCIe, APG GPIO [Gustavo]
 
  - miscll other fixes, config updates etc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbUimaAAoJEGnX8d3iisJeBRIP/23CVvREA/TzEd+F5s+5aI3E
 ZnC87eyibuzYmZdLNq65V7WVhjjpP+ijSEss4xoTgMjdrsFylVZTxdwZcSKjhZp1
 kvNKmpxYOWfrvYAqKeMEL/LhzDxQB7Q3sHMulYMAHQBz1vN4lZndMwzlwS/dHXMM
 bm8b4Z3IhxVfQM2IbfTXa2NapEFx5U1K06sF35a5wTJ9acu4Jp4nr/iREmwplPy9
 eKjN6UUv7RpRpLdF/bzi3rLZ6rQtuu+SP6e11t5YO+Yrdq6Qg/M0hiGvnxhHDljP
 lDYed4k1+daq0Cogg8nOX06vAi7JYy+YGgYAVjXAMHNibwF+Cqk/KKZoS4M6C4co
 3QulXi9hbrcGszUXvbJKbwQztmFa1wuLt3wLsZKJJ+E/B78t1TpXlXuQSZbtjVrv
 r5CRAi/q1EeeRP3d3IsKYL9LEQkpksCv3fAhNmbNwKy0TmXEhyerO55umMnJ0Pc+
 fd86QIu+91aNL8LH6IesuFFFccrS65vIFu2neNQcV0q2neSVoFKQrYrQ/g14nVw/
 ycXOjVKJsTTosKFJhWFDOptUTRiySa0k6VEy92jGtSb/TQRfWLNoTGNVkU20oNrn
 a4ckeUnYyaG+63Y0YW3yCUkMHBVtx4i0PPYv8BZt4CXB8dYaPKiRTKQE5uundgkO
 e09YDZW8qPB9Y/T5WK88
 =qvRf
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
 "ARC is back after radio silence in 4.17:

   - Fix CONFIG_SWAP [Alexey]

   - Robustify cmpxchg emulation for systems w/o atomics [Alexey /
     PeterZ]

   - Allow mprotext(PROT_EXEC) for stack mappings [Vineet]

   - HSDK platform enable PCIe, APG GPIO [Gustavo]

   - miscll other fixes, config updates etc"

* tag 'arc-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARCv2: [plat-hsdk]: Save accl reg pair by default
  ARC: mm: allow mprotect to make stack mappings executable
  ARC: Fix CONFIG_SWAP
  ARC: [arcompact] entry.S: minor code movement
  ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
  ARC: configs: remove no longer needed CONFIG_DEVPTS_MULTIPLE_INSTANCES
  ARC: Improve cmpxchg syscall implementation
  ARC: [plat-hsdk]: Configure APB GPIO controller on ARC HSDK platform
  ARC: [plat-hsdk] Add PCIe support
  ARC: Enable machine_desc->init_per_cpu for !CONFIG_SMP
  ARC: Explicitly add -mmedium-calls to CFLAGS
2018-07-20 11:33:22 -07:00
Linus Torvalds
293bccc5b2 nds32 patches for 4.18
Here is the nds32 patch set based on 4.18-rc1.
 Contained in here are the bug fixes and building error fixes for nds32.
 
 These are the LTP20170427 testing results.
 
 Total Tests: 1902
 Total Skipped Tests: 593
 Total Failures: 418
 Kernel Version: 4.18.0-rc1-00006-g987553894f0c-dirty
 Machine Architecture: nds32
 
 Signed-off-by: Greentime Hu <greentime@andestech.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.17 (GNU/Linux)
 
 iQIcBAABAgAGBQJbQxfIAAoJEHfB0l0b2JxE5kYP/A3jMMfWMxaW6DqYjszxT1ew
 TFfTKX6FcRTiUqLchMJumgdFKq5CKjESq9Ir2tp1QNvF19LH1Z6dfwQja6NQzJlO
 tfJ1t3O1lEM9Qqvvu/Oe8nkXZoGmnVsdPW3hCRkuJmIgcW0oePQBALdNDRnhXAuw
 riwgzavvF9SkRE0uSuGz//0MGB/6KrnJw6jEdCdxuWVHdrQBQ79BD9KfK7ZQFQg+
 wCAeN8bMbP2j7vr1SpclbUvdzj21/Y6mI3OMBHYcXnb9VWkk+Hl0ph2AyYZHTXVh
 nRjkxU1WcCL3S3yWHJYjwxN0SBKBCFhv8Yitt3r0WnW9PWz3DdQGdMPghC2VDikU
 xTmOX/3511fl99aDM0vLDn3nZDnQjLHsb9kF3EEIn7om/Q20WHPpgvZ6zfEVgQKZ
 BPghp3pJilBsB/rKmWqw3RDrVHIBRlzLz70KJOn/S5y1e0jFcKEIs5/f945hDjsz
 18LifgAywPI7GwbKzIT6GvI242yxC+gA7SHw+pRGVa1HRkL7ujiPMAJ+emkUchLN
 Il7jONxN328zLNSEGCdVfJf7uqUN6i5tfM488hpZXPQZ/QP0uTuvtHjruDCNJVWg
 rKaz3eo4YwdMDlYpaFYLfS7/nIVLGKY6wkdPaFvmhgw6xnWfVEATWP+VgEh2056v
 kIzuxUJjjR8Ocwv/ooMG
 =9cq3
 -----END PGP SIGNATURE-----

Merge tag 'nds32-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux

Pull nds32 updates from Greentime Hu:
 "Bug fixes and build ixes for nds32"

* tag 'nds32-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
  nds32: fix build error "relocation truncated to fit: R_NDS32_25_PCREL_RELA" when make allyesconfig
  nds32: To simplify the implementation of update_mmu_cache()
  nds32: Fix the dts pointer is not passed correctly issue.
  nds32: To implement these icache invalidation APIs since nds32 cores don't snoop data cache. This issue is found by Guo Ren. Based on the Documentation/core-api/cachetlb.rst and it says:
  nds32: Fix build error caused by configuration flag rename
  nds32: define __NDS32_E[BL]__ for sparse
2018-07-20 11:18:33 -07:00
Felix Fietkau
bc88ad2efd
MIPS: ath79: fix register address in ath79_ddr_wb_flush()
ath79_ddr_wb_flush_base has the type void __iomem *, so register offsets
need to be a multiple of 4 in order to access the intended register.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 24b0e3e84fbf ("MIPS: ath79: Improve the DDR controller interface")
Patchwork: https://patchwork.linux-mips.org/patch/19912/
Cc: Alban Bedel <albeu@free.fr>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # 4.2+
2018-07-20 10:17:04 -07:00
Uwe Kleine-König
e01a06c808 ARM: dts: imx6: RDU2: fix irq type for mv88e6xxx switch
The Marvell switches report their interrupts in a level sensitive way.
When using edge sensitive detection a race condition in the interrupt
handler of the swich might result in the OS to miss all future events
which might make the switch non-functional.

The problem is that both mv88e6xxx_g2_irq_thread_fn() and
mv88e6xxx_g1_irq_thread_work() sample the irq cause register
(MV88E6XXX_G2_INT_SRC and MV88E6XXX_G1_STS respectively) once and then
handle the observed sources. If after sampling but before all observed
irq sources are handled a new irq source gets active this is not noticed
by the handler which returns unsuspecting, but the interrupt line stays
active which prevents the edge detector to kick in.

All device trees but imx6qdl-zii-rdu2 get this right (most of them by
not specifying an interrupt parent). So fix imx6qdl-zii-rdu2
accordingly.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: f64992d1a916 ("ARM: dts: imx6: RDU2: Add Switch interrupts")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2018-07-20 10:50:44 +08:00
Daniel Borkmann
b9c1e60e7b bpf, ppc64: fix unexpected r0=0 exit path inside bpf_xadd
None of the JITs is allowed to implement exit paths from the BPF
insn mappings other than BPF_JMP | BPF_EXIT. In the BPF core code
we have a couple of rewrites in eBPF (e.g. LD_ABS / LD_IND) and
in eBPF to cBPF translation to retain old existing behavior where
exceptions may occur; they are also tightly controlled by the
verifier where it disallows some of the features such as BPF to
BPF calls when legacy LD_ABS / LD_IND ops are present in the BPF
program. During recent review of all BPF_XADD JIT implementations
I noticed that the ppc64 one is buggy in that it contains two
jumps to exit paths. This is problematic as this can bypass verifier
expectations e.g. pointed out in commit f6b1b3bf0d5f ("bpf: fix
subprog verifier bypass by div/mod by 0 exception"). The first
exit path is obsoleted by the fix in ca36960211eb ("bpf: allow xadd
only on aligned memory") anyway, and for the second one we need to
do a fetch, add and store loop if the reservation from lwarx/ldarx
was lost in the meantime.

Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-07-19 16:08:06 -07:00
Olof Johansson
1f9f163500 One omap dts mismerge fix
The dts patch for droid4 PWM vibrator has added gpio6 entries to the wrong
 node. Let's fix it with a note that there seems to be also other GPIO PWM
 issues to fix still to get the PWM vibrator working. So this can wait for
 v4.19 merge cycle if necessary.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAltQPwERHHRvbnlAYXRv
 bWlkZS5jb20ACgkQG9Q+yVyrpXMvERAAqzZoTW6Qd6WnxRQIrIpNHTPpyuiu7uGU
 DdaJUvHCn93JfJyuG6VCwSph8SI8o+YVQGV67Rr7TO7GDW8WGrcuMnfB+0z6Xm6Y
 HbTlXVP3sImykowoAEFF4VJmSclj5qzYk2n72KlpT7pJQnJqe0DmigCV/IkFn9A6
 q1fSKcQH7A0qjPOdefgF/zPVgNbxy1JkO/dzKyXZPc0LmjpwWfNWntbPyqNWi7om
 oazVpRwM+8640S7wBhYBn2T0KoQZus8pU/Oy03UAgrUbL4yhcOhZhYJKrmvXmXcW
 e/V8k071zrLc9XVL1DYVfFmHA4mzve6efmQnem2krpaki5n6jF5YEkayhiXqC0B2
 RF24oKFsuztRTgkkJfGumiuassF3wGdPpAPL2DwFIX5qJwqvcg96AhzdAMc5Af4m
 FTFMhtcYgf5G1/yDI8IyVQNoqvBci/n7/zUd7NBluizcZm5YIrcShzAh/WERgxuO
 3x+fDhCAP1zzeyFAJlOMhNCYWesRTuT/9kQrP95saXjfcbaXZt8hsARJgXA8x2V/
 40ONNcvrAMbC10jqAPzXdTgSdHTFO2cZs3DdCwHSvkFFhfczH/E/2OOKzfxZCxHL
 2OCcCOWuZD3WyUlYs01vZOMWPzN/2gJYi4upAMSxyUDN67ios8TKOBM+VU5Ec2Vh
 5TqC8lQfmew=
 =1jV8
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v4.18/fixes-rc5-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

One omap dts mismerge fix

The dts patch for droid4 PWM vibrator has added gpio6 entries to the wrong
node. Let's fix it with a note that there seems to be also other GPIO PWM
issues to fix still to get the PWM vibrator working. So this can wait for
v4.19 merge cycle if necessary.

* tag 'omap-for-v4.18/fixes-rc5-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: omap4-droid4: fix dts w.r.t. pwm

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-19 15:07:12 -07:00
Vineet Gupta
af1fc5baa7 ARCv2: [plat-hsdk]: Save accl reg pair by default
This manifsted as strace segfaulting on HSDK because gcc was targetting
the accumulator registers as GPRs, which kernek was not saving/restoring
by default.

Cc: stable@vger.kernel.org   #4.14+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-07-19 10:36:45 -07:00
Linus Torvalds
47f7dc4b84 Miscellaneous bugfixes, plus a small patchlet related to Spectre v2.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJbTwvXAAoJEL/70l94x66D068H/0lNKsk33AHZGsVOr3qZJNpE
 6NI746ZXurRNNZ6d64hVIBDfTI4P3lurjQmb9/GUSwvoHW0S2zMug0F59TKYQ3EO
 kcX+b9LRmBkUq2h2R8XXTVkmaZ1SqwvXVVzx80T2cXAD3J3kuX6Yj+z1RO7MrXWI
 ZChA3ZT/eqsGEzle+yu/YExAgbv+7xzuBNBaas7QvJE8CHZzPKYjVBEY6DAWx53L
 LMq8C3NsHpJhXD6Rcq9DIyrktbDSi+xRBbYsJrhSEe0MfzmgBkkysl86uImQWZxk
 /2uHUVz+85IYy3C+ZbagmlSmHm1Civb6VyVNu9K3nRxooVtmmgudsA9VYJRRVx4=
 =M0K/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Miscellaneous bugfixes, plus a small patchlet related to Spectre v2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvmclock: fix TSC calibration for nested guests
  KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled
  KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
  KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
  x86/kvmclock: set pvti_cpu0_va after enabling kvmclock
  x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
  kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading
  x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks
  KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
2018-07-18 11:08:44 -07:00
Gautham R. Shenoy
b03897cf31 powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
On 64-bit servers, SPRN_SPRG3 and its userspace read-only mirror
SPRN_USPRG3 are used as userspace VDSO write and read registers
respectively.

SPRN_SPRG3 is lost when we enter stop4 and above, and is currently not
restored.  As a result, any read from SPRN_USPRG3 returns zero on an
exit from stop4 (Power9 only) and above.

Thus in this situation, on POWER9, any call from sched_getcpu() always
returns zero, as on powerpc, we call __kernel_getcpu() which relies
upon SPRN_USPRG3 to report the CPU and NUMA node information.

Fix this by restoring SPRN_SPRG3 on wake up from a deep stop state
with the sprg_vdso value that is cached in PACA.

Fixes: e1c1cfed5432 ("powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-18 20:40:17 +10:00
James Clarke
4e4a4b75cc powerpc/Makefile: Assemble with -me500 when building for E500
Some of the assembly files use instructions specific to BookE or E500,
which are rejected with the now-default -mcpu=powerpc, so we must pass
-me500 to the assembler just as we pass -me200 for E200.

Fixes: 4bf4f42a2feb ("powerpc/kbuild: Set default generic machine type for 32-bit compile")
Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-18 20:34:42 +10:00
Peng Hao
e10f780503 kvmclock: fix TSC calibration for nested guests
Inside a nested guest, access to hardware can be slow enough that
tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
to be called periodically and the nested guest to spend a lot of time
reading the ACPI timer.

However, if the TSC frequency is available from the pvclock page,
we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
'refine' operation.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
[Commit message rewritten. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-18 11:43:17 +02:00
Liran Alon
2307af1c4b KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled
When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
by MSR_IA32_VMX_BASIC.

However, even though not explictly documented by TLFS, VMXArea passed
as VMXON argument should still be marked with revision_id reported by
physical CPU.

This issue was found by the following setup:
* L0 = KVM which expose eVMCS to it's L1 guest.
* L1 = KVM which consume eVMCS reported by L0.
This setup caused the following to occur:
1) L1 execute hardware_enable().
2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
3) L0 intercept L1 VMXON and execute handle_vmon() which notes
vmxarea->revision_id != VMCS12_REVISION and therefore fails with
nested_vmx_failInvalid() which sets RFLAGS.CF.
4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore
hardware_enable() continues as usual.
5) L1 hardware_enable() then calls ept_sync_global() which executes
INVEPT.
6) L0 intercept INVEPT and execute handle_invept() which notes
!vmx->nested.vmxon and thus raise a #UD to L1.
7) Raised #UD caused L1 to panic.

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 773e8a0425c923bc02668a2d6534a5ef5a43cc69
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-18 11:31:28 +02:00