28268 Commits

Author SHA1 Message Date
Ricardo Neri
e27c310af5 ptrace,x86: Make user_64bit_mode() available to 32-bit builds
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64
is selected. This implies that code built with CONFIG_X86_64=n cannot use
it. If a piece of code needs to be built for both CONFIG_X86_64=y and
CONFIG_X86_64=n and wants to use this function, it needs to wrap it in
an #ifdef/#endif; potentially, in multiple places.

This can be easily avoided with a single #ifdef/#endif pair within
user_64bit_mode() itself.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: ricardo.neri@intel.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Garnier <thgarnie@google.com>
Link: https://lkml.kernel.org/r/1509135945-13762-4-git-send-email-ricardo.neri-calderon@linux.intel.com
2017-11-01 21:50:08 +01:00
Ricardo Neri
b15d70df6e x86/mpx: Simplify handling of errors when computing linear addresses
When errors occur in the computation of the linear address, -1L is
returned. Rather than having a separate return path for errors, the
variable used to return the computed linear address can be initialized
with the error value. Hence, only one return path is needed. This makes
the function easier to read.

While here, ensure that the error value is -1L, a 64-bit value, rather
than -1, a 32-bit value.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: ricardo.neri@intel.com
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/1509135945-13762-6-git-send-email-ricardo.neri-calderon@linux.intel.com
2017-11-01 21:50:08 +01:00
Ricardo Neri
ed40a10431 uprobes/x86: Use existing definitions for segment override prefixes
Rather than using hard-coded values of the segment override prefixes,
leverage the existing definitions provided in inat.h.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: ricardo.neri@intel.com
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/1509135945-13762-5-git-send-email-ricardo.neri-calderon@linux.intel.com
2017-11-01 21:50:08 +01:00
Ricardo Neri
b0ce5b8c95 x86/boot: Relocate definition of the initial state of CR0
Both head_32.S and head_64.S utilize the same value to initialize the
control register CR0. Also, other parts of the kernel might want to access
this initial definition (e.g., emulation code for User-Mode Instruction
Prevention uses this state to provide a sane dummy value for CR0 when
emulating the smsw instruction). Thus, relocate this definition to a
header file from which it can be conveniently accessed.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: ricardo.neri@intel.com
Cc: linux-mm@kvack.org
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-arch@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/1509135945-13762-3-git-send-email-ricardo.neri-calderon@linux.intel.com
2017-11-01 21:50:07 +01:00
Ricardo Neri
1067f03099 x86/mm: Relocate page fault error codes to traps.h
Up to this point, only fault.c used the definitions of the page fault error
codes. Thus, it made sense to keep them within such file. Other portions of
code might be interested in those definitions too. For instance, the User-
Mode Instruction Prevention emulation code will use such definitions to
emulate a page fault when it is unable to successfully copy the results
of the emulated instructions to user space.

While relocating the error code enumeration, the prefix X86_ is used to
make it consistent with the rest of the definitions in traps.h. Of course,
code using the enumeration had to be updated as well. No functional changes
were performed.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: ricardo.neri@intel.com
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com
2017-11-01 21:50:07 +01:00
Borislav Petkov
7298f08ea8 x86/mcelog: Get rid of RCU remnants
Jeremy reported a suspicious RCU usage warning in mcelog.

/dev/mcelog is called in process context now as part of the notifier
chain and doesn't need any of the fancy RCU and lockless accesses which
it did in atomic context.

Axe it all in favor of a simple mutex synchronization which cures the
problem reported.

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-and-tested-by: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: linux-edac@vger.kernel.org
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171101164754.xzzmskl4ngrqc5br@pd.tnic
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1498969
2017-11-01 21:24:36 +01:00
Vlastimil Babka
cb0631fd3c x86/mm: fix use-after-free of vma during userfaultfd fault
Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
__do_page_fault() with the following reproducer:

  mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
  mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
  r0 = userfaultfd(0x0)
  ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
  ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
  r1 = gettid()
  syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
  tkill(r1, 0x7)

The vma should be pinned by mmap_sem, but handle_userfault() might (in a
return to userspace scenario) release it and then acquire again, so when
we return to __do_page_fault() (with other result than VM_FAULT_RETRY),
the vma might be gone.

Specifically, per Andrea the scenario is
 "A return to userland to repeat the page fault later with a
  VM_FAULT_NOPAGE retval (potentially after handling any pending signal
  during the return to userland). The return to userland is identified
  whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
  vmf->flags"

However, since commit a3c4fb7c9c2e ("x86/mm: Fix fault error path using
unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
that point, which can thus become use-after-free.  Fix this by moving
the read before calling handle_mm_fault().

Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Fixes: 3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer")
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-01 08:09:58 -07:00
Kees Cook
e4dca7b7aa treewide: Fix function prototypes for module_param_call()
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:

@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@

 module_param_call(_name, _set_func, _get_func, _arg, _mode);

@fix_set_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _set_func(
-_val_type _val
+const char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

@fix_get_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _get_func(
-_val_type _val
+char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:

	drivers/platform/x86/thinkpad_acpi.c
	fs/lockd/svc.c

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2017-10-31 15:30:37 +01:00
Juergen Gross
6f0e8bf167 xen: support 52 bit physical addresses in pv guests
Physical addresses on processors supporting 5 level paging can be up to
52 bits wide. For a Xen pv guest running on such a machine those
physical addresses have to be supported in order to be able to use any
memory on the machine even if the guest itself does not support 5 level
paging.

So when reading/writing a MFN from/to a pte don't use the kernel's
PTE_PFN_MASK but a new XEN_PTE_MFN_MASK allowing full 40 bit wide MFNs.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31 09:06:48 -04:00
K. Y. Srinivasan
7ed4325a44 Drivers: hv: vmbus: Make panic reporting to be more useful
Hyper-V allows the guest to report panic and the guest can pass additional
information. All this is logged on the host. Currently Linux is passing back
information that is not particularly useful. Make the following changes:

1. Windows uses crash MSR P0 to report bugcheck code. Follow the same
convention for Linux as well.
2. It will be useful to know the gust ID of the Linux guest that has
paniced. Pass back this information.

These changes will help in better supporting Linux on Hyper-V

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-31 13:40:29 +01:00
Gayatri Kammela
c128dbfa0f x86/cpufeatures: Enable new SSE/AVX/AVX512 CPU features
Add a few new SSE/AVX/AVX512 instruction groups/features for enumeration
in /proc/cpuinfo: AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ, AVX512_VNNI,
AVX512_BITALG.

 CPUID.(EAX=7,ECX=0):ECX[bit 6]  AVX512_VBMI2
 CPUID.(EAX=7,ECX=0):ECX[bit 8]  GFNI
 CPUID.(EAX=7,ECX=0):ECX[bit 9]  VAES
 CPUID.(EAX=7,ECX=0):ECX[bit 10] VPCLMULQDQ
 CPUID.(EAX=7,ECX=0):ECX[bit 11] AVX512_VNNI
 CPUID.(EAX=7,ECX=0):ECX[bit 12] AVX512_BITALG

Detailed information of CPUID bits for these features can be found
in the Intel Architecture Instruction Set Extensions and Future Features
Programming Interface document (refer to Table 1-1. and Table 1-2.).
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=197239

Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Yang Zhong <yang.zhong@intel.com>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1509412829-23380-1-git-send-email-gayatri.kammela@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-31 11:02:26 +01:00
Matthias Kaehlcke
6c3b56b197 x86/boot: Disable Clang warnings about GNU extensions
The kernel makes use of several GCC extensions, disable Clang warnings
about that in the boot code, as we already do for the rest of the kernel.

This suppresses the following warning when building with clang:

  ./include/linux/cgroup-defs.h:391:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
        struct cgroup cgrp;

Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171030194351.122090-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-31 10:54:30 +01:00
Linus Torvalds
1960e8eabc Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes an objtool regression"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/chacha20 - satisfy stack validation 2.0
2017-10-30 09:31:15 -07:00
Baoquan He
15670bfe19 x86/mm/64: Rename the register_page_bootmem_memmap() 'size' parameter to 'nr_pages'
register_page_bootmem_memmap()'s 3rd 'size' parameter is named
in a somewhat misleading fashion - rename it to 'nr_pages' which
makes the units of it much clearer.

Meanwhile rename the existing local variable 'nr_pages' to
'nr_pmd_pages', a more expressive name, to avoid conflict with
new function parameter 'nr_pages'.

(Also clean up the unnecessary parentheses in which get_order() is called.)

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/1509154238-23250-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-30 10:30:23 +01:00
Ingo Molnar
e17bae3266 Linux 4.14-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZ9kEFAAoJEHm+PkMAQRiGw6wH/0j197qyGd0hkVFMJO6LAgN3
 KQWS4nZ5BkVDocwv0RVnUJTtXqU1eozFgdVEtSoaFXpzlHGuptR2Tau9efDCJ7w3
 /utZxqvhGebZd2T+j+/o/LE8BRQxhADBNJq2D/o0WNt8ecxuG0GIkhkEYt/o3z1v
 /sxlwVwzXB7Dc/h1WcgGJG7cS6L9KzzAzGAS/iNvdFrPOygHBv8c0MxVZIiBIeeK
 1nZdyvbyM8uenSyG+prGt9ENrqXZxxfwUxIchi2V7A9m1WmD5zijNkf1JCWji/O+
 UsA1auxna7MwoxjxqZuGm4MlKOwZ+8xutk4JGgc+aP/ulndJbJYu+4op/3vaFBM=
 =Mhx+
 -----END PGP SIGNATURE-----

Merge tag 'v4.14-rc7' into x86/mm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-30 10:30:09 +01:00
Dou Liyang
ca5d376e17 x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
Commit:

  9043442b43b1 ("locking/paravirt: Use new static key for controlling call of virt_spin_lock()")

sets the static virt_spin_lock_key to a value before jump_label_init()
has been called, which will result in a WARN().

Reorder the initialization sequence:

 - Move the native_pv_lock_init() into native_smp_prepare_cpus()
 - set the value in xen_init_lock_cpu()

to avoid calling into the not yet initialized static keys subsystem.

Suggested-by: Juergen Gross <jgross@suse.com>
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: bp@suse.de
Cc: luto@kernel.org
Cc: vkuznets@redhat.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1509170804-3813-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-30 09:17:29 +01:00
Linus Torvalds
d3eab75a7f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - revert a /dev/mem restriction change that crashes with certain boot
     parameters

   - an AMD erratum fix for cases where the BIOS doesn't apply it

   - fix unwinder debuginfo

   - improve ORC unwinder warning printouts"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses"
  x86/unwind: Show function name+offset in ORC error messages
  x86/entry: Fix idtentry unwind hint
  x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't
2017-10-27 17:19:39 -07:00
Ingo Molnar
6856b8e536 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 10:31:44 +02:00
Masahiro Yamada
af8e947079 x86/build: Beautify build log of syscall headers
This makes the build log look nicer.

Before:
  SYSTBL  arch/x86/entry/syscalls/../../include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/entry/syscalls/../../include/generated/asm/syscalls_64.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_x32.h

After:
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/1509077470-2735-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 10:22:45 +02:00
Ingo Molnar
90edaac627 Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses"
This reverts commit ce56a86e2ade45d052b3228cdfebe913a1ae7381.

There's unanticipated interaction with some boot parameters like 'mem=',
which now cause the new checks via valid_mmap_phys_addr_range() to be too
restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.

So while the motivation of the change is still entirely valid, we
need a few more rounds of testing to get it right - it's way too late
after -rc6, so revert it for now.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Craig Bergstrom <craigb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dsafonov@virtuozzo.com
Cc: kirill.shutemov@linux.intel.com
Cc: mhocko@suse.com
Cc: oleg@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 10:06:49 +02:00
Christian König
fa564ad963 x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)
Manually enable a 64GB 64-bit BAR so we have enough room for graphics
devices with large framebuffers.

Most BIOSes don't enable this for compatibility reasons.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2017-10-25 16:07:35 -05:00
Mark Rutland
6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Alexander Shishkin
2eece390bf perf/x86/intel/bts: Fix exclusive event reference leak
Commit:

  d2878d642a4ed ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")

... adds a privilege check in the exactly wrong place in the event init path:
after the 'LBR exclusive' reference has been taken, and doesn't release it
in the case of insufficient privileges. After this, nobody in the system
gets to use PT or LBR afterwards.

This patch moves the privilege check to where it should have been in the
first place.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d2878d642a4ed ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")
Link: http://lkml.kernel.org/r/20171023123533.16973-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 13:19:27 +02:00
Will Deacon
506458efaf locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 13:17:33 +02:00
Ingo Molnar
9babb091e0 Linux 4.14-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZ7clWAAoJEHm+PkMAQRiG07AH/iKcej+AsurISHx6i/LUEDC1
 a9wo5HAR5kEj+ohdE3JSkD9BHLcyhcCXaqIk9yOrwi9xv1DrPv8U/nGkKzZJzFi2
 mGWK09Zgi+vgSpA+YSErgl05IVGtgaryQQPqQdawpyRpqTUwP0+2pLnKEnJe0f05
 fpv+S4bDKUCuE8GcVNjF9gxXDg8j60fFa+oAcn7QPS6dCun/H6TbDRue5oeky0Y+
 50ZYjjioy9S9DIm2VF7pktMCP/mK/fgb+Q+4Up09VJGHGhq+891SRJ27yDulxo47
 /gq22SRIGBX2PGNllSwhYslgaCRRlYTMBYOIWrBreanA4NpGD662dp+GgWhD154=
 =TAMw
 -----END PGP SIGNATURE-----

Merge tag 'v4.14-rc6' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 13:17:20 +02:00
mike.travis@hpe.com
b3270a5210 x86/platform/UV: Mark tsc_check_sync as an init function
Fix build problem:

>> WARNING: vmlinux.o(.text+0x4223a): Section mismatch in
   reference from the function uv_tsc_check_sync() to the function
   .init.text:uv_early_read_mmr() The function uv_tsc_check_sync()
   references the function __init uv_early_read_mmr().  This is often
   because uv_tsc_check_sync lacks a __init

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: https://lkml.kernel.org/r/20171023191841.985614692@stormcage.americas.sgi.com
2017-10-24 09:10:51 +02:00
Josh Poimboeuf
82c62fa0c4 x86/asm: Don't use the confusing '.ifeq' directive
I find the '.ifeq <expression>' directive to be confusing.  Reading it
quickly seems to suggest its opposite meaning, or that it's missing an
argument.

Improve readability by replacing all of its x86 uses with
'.if <expression> == 0'.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/757da028e802c7e98d23fbab8d234b1063e161cf.1508516398.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-23 13:31:34 +02:00
Ingo Molnar
bae9525531 Merge branch 'core/objtool' into x86/asm, to pick up dependent changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-23 13:31:17 +02:00
Ingo Molnar
f95b23a112 Merge branch 'x86/urgent' into x86/asm, to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-23 13:30:47 +02:00
Josh Poimboeuf
58c3862b52 x86/unwind: Show function name+offset in ORC error messages
Improve the warning messages to show the relevant function name+offset.
This makes it much easier to diagnose problems with the ORC metadata.

Before:

  WARNING: can't dereference iret registers at ffff8801c5f17fe0 for ip ffffffff95f0d94b

After:

  WARNING: can't dereference iret registers at ffff880178f5ffe0 for ip int3+0x5b/0x60

Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Link: http://lkml.kernel.org/r/6bada6b9eac86017e16bd79e1e77877935cb50bb.1508516398.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-23 13:30:36 +02:00
Josh Poimboeuf
98990a33b7 x86/entry: Fix idtentry unwind hint
This fixes the following ORC warning in the 'int3' entry code:

  WARNING: can't dereference iret registers at ffff8801c5f17fe0 for ip ffffffff95f0d94b

The ORC metadata had the wrong stack offset for the iret registers.

Their location on the stack is dependent on whether the exception has an
error code.

Reported-and-tested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations")
Link: http://lkml.kernel.org/r/931d57f0551ed7979d5e7e05370d445c8e5137f8.1508516398.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-23 13:30:35 +02:00
Borislav Petkov
bfc1168de9 x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't
Some F14h machines have an erratum which, "under a highly specific
and detailed set of internal timing conditions" can lead to skipping
instructions and RIP corruption.

Add the fix for those machines when their BIOS doesn't apply it or
there simply isn't BIOS update for them.

Tested-by: <mirh@protonmail.ch>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: http://lkml.kernel.org/r/20171022104731.28249-1-bp@alien8.de
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197285
[ Added pr_info() that we activated the workaround. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-22 13:06:02 +02:00
Reinette Chatre
87943db7df x86/intel_rdt: Fix potential deadlock during resctrl mount
Sai reported a warning during some MBA tests:

[  236.755559] ======================================================
[  236.762443] WARNING: possible circular locking dependency detected
[  236.769328] 4.14.0-rc4-yocto-standard #8 Not tainted
[  236.774857] ------------------------------------------------------
[  236.781738] mount/10091 is trying to acquire lock:
[  236.787071]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8117f892>] static_key_enable+0x12/0x30
[  236.797058]
               but task is already holding lock:
[  236.803552]  (&type->s_umount_key#37/1){+.+.}, at: [<ffffffff81208b2f>] sget_userns+0x32f/0x520
[  236.813247]
               which lock already depends on the new lock.

[  236.822353]
               the existing dependency chain (in reverse order) is:
[  236.830686]
               -> #4 (&type->s_umount_key#37/1){+.+.}:
[  236.837756]        __lock_acquire+0x1100/0x11a0
[  236.842799]        lock_acquire+0xdf/0x1d0
[  236.847363]        down_write_nested+0x46/0x80
[  236.852310]        sget_userns+0x32f/0x520
[  236.856873]        kernfs_mount_ns+0x7e/0x1f0
[  236.861728]        rdt_mount+0x30c/0x440
[  236.866096]        mount_fs+0x38/0x150
[  236.870262]        vfs_kern_mount+0x67/0x150
[  236.875015]        do_mount+0x1df/0xd50
[  236.879286]        SyS_mount+0x95/0xe0
[  236.883464]        entry_SYSCALL_64_fastpath+0x18/0xad
[  236.889183]
               -> #3 (rdtgroup_mutex){+.+.}:
[  236.895292]        __lock_acquire+0x1100/0x11a0
[  236.900337]        lock_acquire+0xdf/0x1d0
[  236.904899]        __mutex_lock+0x80/0x8f0
[  236.909459]        mutex_lock_nested+0x1b/0x20
[  236.914407]        intel_rdt_online_cpu+0x3b/0x4a0
[  236.919745]        cpuhp_invoke_callback+0xce/0xb80
[  236.925177]        cpuhp_thread_fun+0x1c5/0x230
[  236.930222]        smpboot_thread_fn+0x11a/0x1e0
[  236.935362]        kthread+0x152/0x190
[  236.939536]        ret_from_fork+0x27/0x40
[  236.944097]
               -> #2 (cpuhp_state-up){+.+.}:
[  236.950199]        __lock_acquire+0x1100/0x11a0
[  236.955241]        lock_acquire+0xdf/0x1d0
[  236.959800]        cpuhp_issue_call+0x12e/0x1c0
[  236.964845]        __cpuhp_setup_state_cpuslocked+0x13b/0x2f0
[  236.971242]        __cpuhp_setup_state+0xa7/0x120
[  236.976483]        page_writeback_init+0x43/0x67
[  236.981623]        pagecache_init+0x38/0x3b
[  236.986281]        start_kernel+0x3c6/0x41a
[  236.990931]        x86_64_start_reservations+0x2a/0x2c
[  236.996650]        x86_64_start_kernel+0x72/0x75
[  237.001793]        verify_cpu+0x0/0xfb
[  237.005966]
               -> #1 (cpuhp_state_mutex){+.+.}:
[  237.012364]        __lock_acquire+0x1100/0x11a0
[  237.017408]        lock_acquire+0xdf/0x1d0
[  237.021969]        __mutex_lock+0x80/0x8f0
[  237.026527]        mutex_lock_nested+0x1b/0x20
[  237.031475]        __cpuhp_setup_state_cpuslocked+0x54/0x2f0
[  237.037777]        __cpuhp_setup_state+0xa7/0x120
[  237.043013]        page_alloc_init+0x28/0x30
[  237.047769]        start_kernel+0x148/0x41a
[  237.052425]        x86_64_start_reservations+0x2a/0x2c
[  237.058145]        x86_64_start_kernel+0x72/0x75
[  237.063284]        verify_cpu+0x0/0xfb
[  237.067456]
               -> #0 (cpu_hotplug_lock.rw_sem){++++}:
[  237.074436]        check_prev_add+0x401/0x800
[  237.079286]        __lock_acquire+0x1100/0x11a0
[  237.084330]        lock_acquire+0xdf/0x1d0
[  237.088890]        cpus_read_lock+0x42/0x90
[  237.093548]        static_key_enable+0x12/0x30
[  237.098496]        rdt_mount+0x406/0x440
[  237.102862]        mount_fs+0x38/0x150
[  237.107035]        vfs_kern_mount+0x67/0x150
[  237.111787]        do_mount+0x1df/0xd50
[  237.116058]        SyS_mount+0x95/0xe0
[  237.120233]        entry_SYSCALL_64_fastpath+0x18/0xad
[  237.125952]
               other info that might help us debug this:

[  237.134867] Chain exists of:
                 cpu_hotplug_lock.rw_sem --> rdtgroup_mutex --> &type->s_umount_key#37/1

[  237.148425]  Possible unsafe locking scenario:

[  237.155015]        CPU0                    CPU1
[  237.160057]        ----                    ----
[  237.165100]   lock(&type->s_umount_key#37/1);
[  237.169952]                                lock(rdtgroup_mutex);
[  237.176641]
lock(&type->s_umount_key#37/1);
[  237.184287]   lock(cpu_hotplug_lock.rw_sem);
[  237.189041]
                *** DEADLOCK ***

When the resctrl filesystem is mounted the locks must be acquired in the
same order as was done when the cpus came online:

     cpu_hotplug_lock before rdtgroup_mutex.

This also requires to switch the static_branch_enable() calls to the
_cpulocked variant because now cpu hotplug lock is held already.

[ tglx: Switched to cpus_read_[un]lock ]

Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Link: https://lkml.kernel.org/r/9c41b91bc2f47d9e95b62b213ecdb45623c47a9f.1508490116.git.reinette.chatre@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-10-21 17:12:22 +02:00
Reinette Chatre
36b6f9fcb8 x86/intel_rdt: Fix potential deadlock during resctrl unmount
Lockdep warns about a potential deadlock:

[   66.782842] ======================================================
[   66.782888] WARNING: possible circular locking dependency detected
[   66.782937] 4.14.0-rc2-test-test+ #48 Not tainted
[   66.782983] ------------------------------------------------------
[   66.783052] umount/336 is trying to acquire lock:
[   66.783117]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff81032395>] rdt_kill_sb+0x215/0x390
[   66.783193]
               but task is already holding lock:
[   66.783244]  (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390
[   66.783305]
               which lock already depends on the new lock.

[   66.783364]
               the existing dependency chain (in reverse order) is:
[   66.783419]
               -> #3 (rdtgroup_mutex){+.+.}:
[   66.783467]        __lock_acquire+0x1293/0x13f0
[   66.783509]        lock_acquire+0xaf/0x220
[   66.783543]        __mutex_lock+0x71/0x9b0
[   66.783575]        mutex_lock_nested+0x1b/0x20
[   66.783610]        intel_rdt_online_cpu+0x3b/0x430
[   66.783649]        cpuhp_invoke_callback+0xab/0x8e0
[   66.783687]        cpuhp_thread_fun+0x7a/0x150
[   66.783722]        smpboot_thread_fn+0x1cc/0x270
[   66.783764]        kthread+0x16e/0x190
[   66.783794]        ret_from_fork+0x27/0x40
[   66.783825]
               -> #2 (cpuhp_state){+.+.}:
[   66.783870]        __lock_acquire+0x1293/0x13f0
[   66.783906]        lock_acquire+0xaf/0x220
[   66.783938]        cpuhp_issue_call+0x102/0x170
[   66.783974]        __cpuhp_setup_state_cpuslocked+0x154/0x2a0
[   66.784023]        __cpuhp_setup_state+0xc7/0x170
[   66.784061]        page_writeback_init+0x43/0x67
[   66.784097]        pagecache_init+0x43/0x4a
[   66.784131]        start_kernel+0x3ad/0x3f7
[   66.784165]        x86_64_start_reservations+0x2a/0x2c
[   66.784204]        x86_64_start_kernel+0x72/0x75
[   66.784241]        verify_cpu+0x0/0xfb
[   66.784270]
               -> #1 (cpuhp_state_mutex){+.+.}:
[   66.784319]        __lock_acquire+0x1293/0x13f0
[   66.784355]        lock_acquire+0xaf/0x220
[   66.784387]        __mutex_lock+0x71/0x9b0
[   66.784419]        mutex_lock_nested+0x1b/0x20
[   66.784454]        __cpuhp_setup_state_cpuslocked+0x52/0x2a0
[   66.784497]        __cpuhp_setup_state+0xc7/0x170
[   66.784535]        page_alloc_init+0x28/0x30
[   66.784569]        start_kernel+0x148/0x3f7
[   66.784602]        x86_64_start_reservations+0x2a/0x2c
[   66.784642]        x86_64_start_kernel+0x72/0x75
[   66.784678]        verify_cpu+0x0/0xfb
[   66.784707]
               -> #0 (cpu_hotplug_lock.rw_sem){++++}:
[   66.784759]        check_prev_add+0x32f/0x6e0
[   66.784794]        __lock_acquire+0x1293/0x13f0
[   66.784830]        lock_acquire+0xaf/0x220
[   66.784863]        cpus_read_lock+0x3d/0xb0
[   66.784896]        rdt_kill_sb+0x215/0x390
[   66.784930]        deactivate_locked_super+0x3e/0x70
[   66.784968]        deactivate_super+0x40/0x60
[   66.785003]        cleanup_mnt+0x3f/0x80
[   66.785034]        __cleanup_mnt+0x12/0x20
[   66.785070]        task_work_run+0x8b/0xc0
[   66.785103]        exit_to_usermode_loop+0x94/0xa0
[   66.786804]        syscall_return_slowpath+0xe8/0x150
[   66.788502]        entry_SYSCALL_64_fastpath+0xab/0xad
[   66.790194]
               other info that might help us debug this:

[   66.795139] Chain exists of:
                 cpu_hotplug_lock.rw_sem --> cpuhp_state --> rdtgroup_mutex

[   66.800035]  Possible unsafe locking scenario:

[   66.803267]        CPU0                    CPU1
[   66.804867]        ----                    ----
[   66.806443]   lock(rdtgroup_mutex);
[   66.808002]                                lock(cpuhp_state);
[   66.809565]                                lock(rdtgroup_mutex);
[   66.811110]   lock(cpu_hotplug_lock.rw_sem);
[   66.812608]
                *** DEADLOCK ***

[   66.816983] 2 locks held by umount/336:
[   66.818418]  #0:  (&type->s_umount_key#35){+.+.}, at: [<ffffffff81229738>] deactivate_super+0x38/0x60
[   66.819922]  #1:  (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390

When the resctrl filesystem is unmounted the locks should be obtain in the
locks in the same order as was done when the cpus came online:

      cpu_hotplug_lock before rdtgroup_mutex.

This also requires to switch the static_branch_disable() calls to the
_cpulocked variant because now cpu hotplug lock is held already.

[ tglx: Switched to cpus_read_[un]lock ]

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/cc292e76be073f7260604651711c47b09fd0dc81.1508490116.git.reinette.chatre@intel.com
2017-10-21 17:12:21 +02:00
Reinette Chatre
95953034fb x86/intel_rdt: Initialize bitmask of shareable resource if CDP enabled
The platform informs via CPUID.(EAX=0x10, ECX=res#):EBX[31:0] (valid res#
are only 1 for L3 and 2 for L2) which unit of the allocation may be used by
other entities in the platform. This information is valid whether CDP (Code
and Data Prioritization) is enabled or not.

Ensure that the bitmask of shareable resource is initialized when CDP is
enabled.

Fixes: 0dd2d7494cd8 ("x86/intel_rdt: Show bitmask of shareable resource with other executing units"
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/815747bddc820ca221a8924edaf4d1a7324547e4.1508490116.git.reinette.chatre@intel.com
2017-10-21 17:12:21 +02:00
Wanpeng Li
9ffd986c6e KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0
Both Intel SDM and AMD APM mentioned that MCi_STATUS, when the register is
implemented, this register can be cleared by explicitly writing 0s to this
register. Writing 1s to this register will cause a general-protection
exception.

The mce is emulated in qemu, so just the guest attempts to write 1 to this
register should cause a #GP, this patch does it.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-20 18:21:15 +02:00
Wanpeng Li
61f1dd9099 KVM: VMX: Fix VPID capability detection
In my setup, EPT is not exposed to L1, the VPID capability is exposed and
can be observed by vmxcap tool in L1:
INVVPID supported                        yes
Individual-address INVVPID               yes
Single-context INVVPID                   yes
All-context INVVPID                      yes
Single-context-retaining-globals INVVPID yes

However, the module parameter of VPID observed in L1 is always N, the
cpu_has_vmx_invvpid() check in L1 KVM fails since vmx_capability.vpid
is 0 and it is not read from MSR due to EPT is not exposed.

The VPID can be used to tag linear mappings when EPT is not enabled. However,
current logic just detects VPID capability if EPT is enabled, this patch
fixes it.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-20 18:05:01 +02:00
Wanpeng Li
575b3a2cb4 KVM: nVMX: Fix EPT switching advertising
I can use vmxcap tool to observe "EPTP Switching   yes" even if EPT is not
exposed to L1.

EPT switching is advertised unconditionally since it is emulated, however,
it can be treated as an extended feature for EPT and it should not be
advertised if EPT itself is not exposed. This patch fixes it.

Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-20 18:05:01 +02:00
Kirill A. Shutemov
773dd2fca5 x86/xen: Drop 5-level paging support code from the XEN_PV code
It was decided 5-level paging is not going to be supported in XEN_PV.

Let's drop the dead code from the XEN_PV code.

Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170929140821.37654-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 13:07:10 +02:00
Kirill A. Shutemov
4375c29985 x86/xen: Provide pre-built page tables only for CONFIG_XEN_PV=y and CONFIG_XEN_PVH=y
Looks like we only need pre-built page tables in the CONFIG_XEN_PV=y and
CONFIG_XEN_PVH=y cases.

Let's not provide them for other configurations.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170929140821.37654-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 13:07:10 +02:00
Andrey Ryabinin
12a8cc7fcf x86/kasan: Use the same shadow offset for 4- and 5-level paging
We are going to support boot-time switching between 4- and 5-level
paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET
for different paging modes: the constant is passed to gcc to generate
code and cannot be changed at runtime.

This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset
for both 4- and 5-level paging.

For 5-level paging it means that shadow memory region is not aligned to
PGD boundary anymore and we have to handle unaligned parts of the region
properly.

In addition, we have to exclude paravirt code from KASAN instrumentation
as we now use set_pgd() before KASAN is fully ready.

[kirill.shutemov@linux.intel.com: clenaup, changelog message]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 13:07:09 +02:00
Ingo Molnar
ca4b9c3b74 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 11:02:05 +02:00
Dave Hansen
da20ab3518 x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
We do not have tracepoints for sys_modify_ldt() because we define
it directly instead of using the normal SYSCALL_DEFINEx() macros.

However, there is a reason sys_modify_ldt() does not use the macros:
it has an 'int' return type instead of 'unsigned long'.  This is
a bug, but it's a bug cemented in the ABI.

What does this mean?  If we return -EINVAL from a function that
returns 'int', we have 0x00000000ffffffea in %rax.  But, if we
return -EINVAL from a function returning 'unsigned long', we end
up with 0xffffffffffffffea in %rax, which is wrong.

To work around this and maintain the 'int' behavior while using
the SYSCALL_DEFINEx() macros, so we add a cast to 'unsigned int'
in both implementations of sys_modify_ldt().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171018172107.1A79C532@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 10:37:33 +02:00
Craig Bergstrom
ce56a86e2a x86/mm: Limit mmap() of /dev/mem to valid physical addresses
Currently, it is possible to mmap() any offset from /dev/mem.  If a
program mmaps() /dev/mem offsets outside of the addressable limits
of a system, the page table can be corrupted by setting reserved bits.

For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
x86_64 system with a 48-bit bus, the page fault handler will be called
with error_code set to RSVD.  The kernel then crashes with a page table
corruption error.

This change prevents this page table corruption on x86 by refusing
to mmap offsets higher than the highest valid address in the system.

Signed-off-by: Craig Bergstrom <craigb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dsafonov@virtuozzo.com
Cc: kirill.shutemov@linux.intel.com
Cc: mhocko@suse.com
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 09:48:00 +02:00
Christoph Hellwig
c9eb6172c3 dma-mapping: turn dma_cache_sync into a dma_map_ops method
After we removed all the dead wood it turns out only two architectures
actually implement dma_cache_sync as a real op: mips and parisc.  Add
a cache_sync method to struct dma_map_ops and implement it for the
mips defualt DMA ops, and the parisc pa11 ops.

Note that arm, arc and openrisc support DMA_ATTR_NON_CONSISTENT, but
never provided a functional dma_cache_sync implementations, which
seems somewhat odd.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2017-10-19 16:37:49 +02:00
Christoph Hellwig
95e499fc7f x86: make dma_cache_sync a no-op
x86 does not implement DMA_ATTR_NON_CONSISTENT allocations, so it doesn't
make any sense to do any work in dma_cache_sync given that it must be a
no-op when dma_alloc_attrs returns coherent memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2017-10-19 16:37:13 +02:00
Bhumika Goyal
642e641cbe x86/events/amd/iommu: Make iommu_pmu const and __initconst
iommu_pmu is only used as source for a copy operation in the init code
path.

Mark it const and __initconst.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: julia.lawall@lip6.fr
Link: https://lkml.kernel.org/r/1505819443-670-1-git-send-email-bhumirks@gmail.com
2017-10-19 16:15:47 +02:00
Jérémy Lefaure
0cfe5b5fc0 x86: Use ARRAY_SIZE
Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-video@atrey.karlin.mff.cuni.cz
Cc: Martin Mares <mj@ucw.cz>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20171001193101.8898-13-jeremy.lefaure@lse.epita.fr
2017-10-19 16:15:47 +02:00
Ladi Prosek
cc3d967f7e KVM: SVM: detect opening of SMI window using STGI intercept
Commit 05cade71cf3b ("KVM: nSVM: fix SMI injection in guest mode") made
KVM mask SMI if GIF=0 but it didn't do anything to unmask it when GIF is
enabled.

The issue manifests for me as a significantly longer boot time of Windows
guests when running with SMM-enabled OVMF.

This commit fixes it by intercepting STGI instead of requesting immediate
exit if the reason why SMM was masked is GIF.

Fixes: 05cade71cf3b ("KVM: nSVM: fix SMI injection in guest mode")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-18 21:21:22 +02:00
Thomas Gleixner
57b8b1a185 x86/cpuid: Prevent out of bound access in do_clear_cpu_cap()
do_clear_cpu_cap() allocates a bitmap to keep track of disabled feature
dependencies. That bitmap is sized NCAPINTS * BITS_PER_INIT. The possible
'features' which can be handed in are larger than this, because after the
capabilities the bug 'feature' bits occupy another 32bit. Not really
obvious...

So clearing any of the misfeature bits, as 32bit does for the F00F bug,
accesses that bitmap out of bounds thereby corrupting the stack.

Size the bitmap proper and add a sanity check to catch accidental out of
bound access.

Fixes: 0b00de857a64 ("x86/cpuid: Add generic table for CPUID dependencies")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171018022023.GA12058@yexl-desktop
2017-10-18 20:03:34 +02:00