12953 Commits

Author SHA1 Message Date
Josh Poimboeuf
262fa734a0 x86/unwind: Dump all stacks in unwind_dump()
Currently unwind_dump() dumps only the most recently accessed stack.
But it has a few issues.

In some cases, 'first_sp' can get out of sync with 'stack_info', causing
unwind_dump() to start from the wrong address, flood the printk buffer,
and eventually read a bad address.

In other cases, dumping only the most recently accessed stack doesn't
give enough data to diagnose the error.

Fix both issues by dumping *all* stacks involved in the trace, not just
the last one.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8b5e99f02264 ("x86/unwind: Dump stack data on warnings")
Link: http://lkml.kernel.org/r/016d6a9810d7d1bfc87ef8c0e6ee041c6744c909.1493171120.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 08:19:05 +02:00
Josh Poimboeuf
b0d50c7b5d x86/unwind: Silence more entry-code related warnings
Borislav Petkov reported the following unwinder warning:

  WARNING: kernel stack regs at ffffc9000024fea8 in udevadm:92 has bad 'bp' value 00007fffc4614d30
  unwind stack type:0 next_sp:          (null) mask:0x6 graph_idx:0
  ffffc9000024fea8: 000055a6100e9b38 (0x55a6100e9b38)
  ffffc9000024feb0: 000055a6100e9b35 (0x55a6100e9b35)
  ffffc9000024feb8: 000055a6100e9f68 (0x55a6100e9f68)
  ffffc9000024fec0: 000055a6100e9f50 (0x55a6100e9f50)
  ffffc9000024fec8: 00007fffc4614d30 (0x7fffc4614d30)
  ffffc9000024fed0: 000055a6100eaf50 (0x55a6100eaf50)
  ffffc9000024fed8: 0000000000000000 ...
  ffffc9000024fee0: 0000000000000100 (0x100)
  ffffc9000024fee8: ffff8801187df488 (0xffff8801187df488)
  ffffc9000024fef0: 00007ffffffff000 (0x7ffffffff000)
  ffffc9000024fef8: 0000000000000000 ...
  ffffc9000024ff10: ffffc9000024fe98 (0xffffc9000024fe98)
  ffffc9000024ff18: 00007fffc4614d00 (0x7fffc4614d00)
  ffffc9000024ff20: ffffffffffffff10 (0xffffffffffffff10)
  ffffc9000024ff28: ffffffff811c6c1f (SyS_newlstat+0xf/0x10)
  ffffc9000024ff30: 0000000000000010 (0x10)
  ffffc9000024ff38: 0000000000000296 (0x296)
  ffffc9000024ff40: ffffc9000024ff50 (0xffffc9000024ff50)
  ffffc9000024ff48: 0000000000000018 (0x18)
  ffffc9000024ff50: ffffffff816b2e6a (entry_SYSCALL_64_fastpath+0x18/0xa8)
  ...

It unwinded from an interrupt which came in right after entry code
called into a C syscall handler, before it had a chance to set up the
frame pointer, so regs->bp still had its user space value.

Add a check to silence warnings in such a case, where an interrupt
has occurred and regs->sp is almost at the end of the stack.

Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/c695f0d0d4c2cfe6542b90e2d0520e11eb901eb5.1493171120.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 08:19:05 +02:00
Steven Rostedt (VMware)
dc912c3035 x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder
Fengguang Wu's zero day bot triggered a stack unwinder dump. This can
be easily triggered when CONFIG_FRAME_POINTERS is enabled and -mfentry
is in use on x86_32.

 ># cd /sys/kernel/debug/tracing
 ># echo 'p:schedule schedule' > kprobe_events
 ># echo stacktrace > events/kprobes/schedule/trigger

This is because the code that implemented fentry in the ftrace_regs_caller
tried to use the least amount of #ifdefs, and modified ebp when
CC_USE_FENTRY was defined to point to the parent ip as it does when
CC_USE_FENTRY is not defined. But when CONFIG_FRAME_POINTERS is set, it
corrupts the ebp register for this frame while doing the tracing.

NOTE, it does not corrupt ebp in any other way. It is just a bad frame
pointer when calling into the tracing infrastructure. The original ebp is
restored before returning from the fentry call. But if a stack trace is
performed inside the tracing, the unwinder will notice the bad ebp.

Instead of toying with ebp with CC_USING_FENTRY, just slap the parent ip
into the second parameter (%edx), and have an #else that does it the
original way.

The unwinder will unfortunately miss the function being traced, as the
stack frame is not set up yet for it, as it is for x86_64. But fixing that
is a bit more complex and did not work before anyway.

This has been tested with and without FRAME_POINTERS being set while using
-mfentry, as well as using an older compiler that uses mcount.

Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Fixes: 644e0e8dc76b ("x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lists.01.org/pipermail/lkp/2017-April/006165.html
Link: http://lkml.kernel.org/r/20170420172236.7af7f6e5@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-21 09:48:16 +02:00
Josh Poimboeuf
aa4f853461 x86/unwind: Remove unused 'sp' parameter in unwind_dump()
The 'sp' parameter to unwind_dump() is unused.  Remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/08cb36b004629f6bbcf44c267ae4a609242ebd0b.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:59:47 +02:00
Josh Poimboeuf
4ea3d7410c x86/unwind: Prepend hex mask value with '0x' in unwind_dump()
In unwind_dump(), the stack mask value is printed in hex, but is
confusingly not prepended with '0x'.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e7fe41be19d73c9f99f53082486473febfe08ffa.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:59:47 +02:00
Josh Poimboeuf
9b135b2346 x86/unwind: Properly zero-pad 32-bit values in unwind_dump()
On x86-32, 32-bit stack values printed by unwind_dump() are confusingly
zero-padded to 16 characters (64 bits):

  unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
  f50cdebc: 00000000f50cdec4 (0xf50cdec4)
  f50cdec0: 00000000c40489b7 (irq_exit+0x87/0xa0)
  ...

Instead, base the field width on the size of a long integer so that it
looks right on both x86-32 and x86-64.

x86-32:

  unwind stack type:1 next_sp:  (null) mask:0x2 graph_idx:0
  c0ee9d98: c0ee9de0 (init_thread_union+0x1de0/0x2000)
  c0ee9d9c: c043fd90 (__save_stack_trace+0x50/0xe0)
  ...

x86-64:

  unwind stack type:1 next_sp:          (null) mask:0x2 graph_idx:0
  ffffffff81e03b88: ffffffff81e03c10 (init_thread_union+0x3c10/0x4000)
  ffffffff81e03b90: ffffffff81048f8e (__save_stack_trace+0x5e/0x100)
  ...

Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/36b743812e7eb291d74af4e5067736736622daad.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:59:47 +02:00
Josh Poimboeuf
e335bb51cc x86/unwind: Ensure stack pointer is aligned
With frame pointers disabled, on some older versions of GCC (like
4.8.3), it's possible for the stack pointer to get aligned at a
half-word boundary:

  00000000000004d0 <fib_table_lookup>:
       4d0:       41 57                   push   %r15
       4d2:       41 56                   push   %r14
       4d4:       41 55                   push   %r13
       4d6:       41 54                   push   %r12
       4d8:       55                      push   %rbp
       4d9:       53                      push   %rbx
       4da:       48 83 ec 24             sub    $0x24,%rsp

In such a case, the unwinder ends up reading the entire stack at the
wrong alignment.  Then the last read goes past the end of the stack,
hitting the stack guard page:

  BUG: stack guard page was hit at ffffc900217c4000 (stack is ffffc900217c0000..ffffc900217c3fff)
  kernel stack overflow (page fault): 0000 [#1] SMP
  ...

Fix it by ensuring the stack pointer is properly aligned before
unwinding.

Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/cff33847cc9b02fa548625aa23268ac574460d8d.1492436590.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18 10:30:23 +02:00
Josh Poimboeuf
a8b7a92318 x86/unwind: Silence entry-related warnings
A few people have reported unwinder warnings like the following:

  WARNING: kernel stack frame pointer at ffffc90000fe7ff0 in rsync:1157 has bad value           (null)
  unwind stack type:0 next_sp:          (null) mask:2 graph_idx:0
  ffffc90000fe7f98: ffffc90000fe7ff0 (0xffffc90000fe7ff0)
  ffffc90000fe7fa0: ffffffffb7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c)
  ffffc90000fe7fa8: 0000000000000246 (0x246)
  ffffc90000fe7fb0: 0000000000000000 ...
  ffffc90000fe7fc0: 00007ffe3af639bc (0x7ffe3af639bc)
  ffffc90000fe7fc8: 0000000000000006 (0x6)
  ffffc90000fe7fd0: 00007f80af433fc5 (0x7f80af433fc5)
  ffffc90000fe7fd8: 00007ffe3af638e0 (0x7ffe3af638e0)
  ffffc90000fe7fe0: 00007ffe3af638e0 (0x7ffe3af638e0)
  ffffc90000fe7fe8: 00007ffe3af63970 (0x7ffe3af63970)
  ffffc90000fe7ff0: 0000000000000000 ...
  ffffc90000fe7ff8: ffffffffb7b74b9a (entry_SYSCALL_64_after_swapgs+0x17/0x4f)

This warning can happen when unwinding a code path where an interrupt
occurred in x86 entry code before it set up the first stack frame.
Silently ignore any warnings for this case.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/dbd6838826466a60dc23a52098185bc973ce2f1e.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:20:06 +02:00
Josh Poimboeuf
6bcdf9d51b x86/unwind: Read stack return address in update_stack_state()
Instead of reading the return address when unwind_get_return_address()
is called, read it from update_stack_state() and store it in the unwind
state.  This enables the next patch to check the return address from
unwind_next_frame() so it can detect an entry code frame.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/af0c5e4560c49c0343dca486ea26c4fa92bc4e35.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:19:59 +02:00
Josh Poimboeuf
5ed8d8bb38 x86/unwind: Move common code into update_stack_state()
The __unwind_start() and unwind_next_frame() functions have some
duplicated functionality.  They both call decode_frame_pointer() and set
state->regs and state->bp accordingly.  Move that functionality to a
common place in update_stack_state().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a2ee4801113f6d2300d58f08f6b69f85edf4eb43.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:19:49 +02:00
Peter Zijlstra
b5effd3815 debug: Fix __bug_table[] in arch linker scripts
The kbuild test robot reported this build failure on a number
of architectures:

 >         make.cross ARCH=arm
 >    lib/lib.a(bug.o): In function `find_bug':
 > >> lib/bug.c:135: undefined reference to `__start___bug_table'
 > >> lib/bug.c:135: undefined reference to `__stop___bug_table'

Caused by:

  19d436268dde ("debug: Add _ONCE() logic to report_bug()")

Which moved the BUG_TABLE from RO_DATA_SECTION() to RW_DATA_SECTION(),
but a number of architectures don't use RW_DATA_SECTION(), so they
ended up with no __bug_table[] ...

Ideally all those would use RW_DATA_SECTION() in their linker scripts,
but that's for another day.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Link: http://lkml.kernel.org/r/20170330154927.o6qmgfp4bdhrajbm@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03 10:22:40 +02:00
Peter Zijlstra
9a93848fe7 x86/debug: Implement __WARN() using UD0
By using "UD0" for WARN()s we remove the function call and its possible
__FILE__ and __LINE__ immediate arguments from the instruction stream.

Total image size will not change much, what we win in the instruction
stream we'll lose because of the __bug_table entries. Still, saves on
I$ footprint and the total image size does go down a bit.

      text    data       filename
  10702123    4530992    defconfig-build/vmlinux.orig
  10682460    4530992    defconfig-build/vmlinux.patched

(UML didn't seem to use GENERIC_BUG at all, so remove it)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 10:20:28 +02:00
Steven Rostedt (VMware)
1fa9d67a2f x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o
Currently ftrace_32.S and ftrace_64.S are compiled even when
CONFIG_FUNCTION_TRACER is not set. This means there's an unnecessary #ifdef
to protect the code. Instead of using preprocessor directives, only compile
those files when FUNCTION_TRACER is defined.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170316210043.peycxdxktwwn6cid@treble
Link: http://lkml.kernel.org/r/20170323143446.217684991@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:08 +01:00
Steven Rostedt (VMware)
644e0e8dc7 x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set
x86_64 has had fentry support for some time. I did not add support to x86_32
as I was unsure if it will be used much in the future. It is still very much
used, and there's issues with function graph tracing with gcc playing around
with the mcount frames, causing function graph to panic. The fentry code
does not have this issue, and is able to cope as there is no frame to mess
up.

Note, this only adds support for fentry when DYNAMIC_FTRACE is set. There's
really no reason to not have that set, because the performance of the
machine drops significantly when it's not enabled.

Keep !DYNAMIC_FTRACE around to test it off, as there's still some archs
that have FTRACE but not DYNAMIC_FTRACE.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143446.052202377@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)
ff04b440d2 x86/ftrace: Clean up ftrace_regs_caller
When ftrace_regs_caller was created, it was designed to preserve flags as
much as possible as it needed to act just like a breakpoint triggered on the
same location. But the design is over complicated as it treated all
operations as modifying flags. But push, mov and lea do not modify flags.
This means the code can become more simplified by allowing flags to be
stored further down.

Making ftrace_regs_caller simpler will also be useful in implementing fentry
logic.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170316135328.36123c3e@gandalf.local.home
Link: http://lkml.kernel.org/r/20170323143445.917292592@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)
e6928e58d4 x86/ftrace: Add stack frame pointer to ftrace_caller
The function hook ftrace_caller does not create its own stack frame, and
this causes the ftrace stack trace to miss the first function when doing
stack traces.

 # echo schedule:stacktrace > /sys/kernel/tracing/set_ftrace_filter

Before:
         <idle>-0     [002] .N..    29.865807: <stack trace>
 => cpu_startup_entry
 => start_secondary
 => startup_32_smp
           <...>-7     [001] ....    29.866509: <stack trace>
 => kthread
 => ret_from_fork
           <...>-1     [000] ....    29.865377: <stack trace>
 => poll_schedule_timeout
 => do_select
 => core_sys_select
 => SyS_select
 => do_fast_syscall_32
 => entry_SYSENTER_32

After:
          <idle>-0     [002] .N..    31.234853: <stack trace>
 => do_idle
 => cpu_startup_entry
 => start_secondary
 => startup_32_smp
           <...>-7     [003] ....    31.235140: <stack trace>
 => rcu_gp_kthread
 => kthread
 => ret_from_fork
           <...>-1819  [000] ....    31.264172: <stack trace>
 => schedule_hrtimeout_range
 => poll_schedule_timeout
 => do_sys_poll
 => SyS_ppoll
 => do_fast_syscall_32
 => entry_SYSENTER_32

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143445.771707773@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)
3d82c59c6e x86/ftrace: Move the ftrace specific code out of entry_32.S
The function tracing hook code for ftrace is not an entry point from
userspace and does not belong in the entry_*.S files. It has already been
moved out of entry_64.S.

Move it out of entry_32.S into its own ftrace_32.S file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143445.645218946@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)
db65d7b6dc x86/ftrace: Rename mcount_64.S to ftrace_64.S
With the advent of -mfentry that uses the new "fentry" hook over mcount,
the mcount name is obsolete. Having the code file that ftrace hooks into
called "mcount*.S" is rather misleading. Rename it to ftrace_64.S and
remove the file name reference.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143445.490601451@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:06 +01:00
Linus Torvalds
eab60d4e5b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "An assorted pile of fixes along with some hardware enablement:

   - a fix for a KASAN / branch profiling related boot failure

   - some more fallout of the PUD rework

   - a fix for the Always Running Timer which is not initialized when
     the TSC frequency is known at boot time (via MSR/CPUID)

   - a resource leak fix for the RDT filesystem

   - another unwinder corner case fixup

   - removal of the warning for duplicate NMI handlers because there are
     legitimate cases where more than one handler can be registered at
     the last level

   - make a function static - found by sparse

   - a set of updates for the Intel MID platform which got delayed due
     to merge ordering constraints. It's hardware enablement for a non
     mainstream platform, so there is no risk"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mpx: Make unnecessarily global function static
  x86/intel_rdt: Put group node in rdtgroup_kn_unlock
  x86/unwind: Fix last frame check for aligned function stacks
  mm, x86: Fix native_pud_clear build error
  x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
  x86/platform/intel-mid: Add power button support for Merrifield
  x86/platform/intel-mid: Use common power off sequence
  x86/platform: Remove warning message for duplicate NMI handlers
  x86/tsc: Fix ART for TSC_KNOWN_FREQ
  x86/platform/intel-mid: Correct MSI IRQ line for watchdog device
2017-03-17 14:05:03 -07:00
Linus Torvalds
ae13373319 Merge branch 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 acpi fixes from Thomas Gleixner:
 "This update deals with the fallout of the recent work to make
  cpuid/node mappings persistent.

  It turned out that the boot time ACPI based mapping tripped over ACPI
  inconsistencies and caused regressions. It's partially reverted and
  the fragile part replaced by an implementation which makes the mapping
  persistent when a CPU goes online for the first time"

* 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  acpi/processor: Check for duplicate processor ids at hotplug time
  acpi/processor: Implement DEVICE operator for processor enumeration
  x86/acpi: Restore the order of CPU IDs
  Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
  Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
2017-03-17 14:01:40 -07:00
Jiri Olsa
49ec8f5b6a x86/intel_rdt: Put group node in rdtgroup_kn_unlock
The rdtgroup_kn_unlock waits for the last user to release and put its
node. But it's calling kernfs_put on the node which calls the
rdtgroup_kn_unlock, which might not be the group's directory node, but
another group's file node.

This race could be easily reproduced by running 2 instances
of following script:

  mount -t resctrl resctrl /sys/fs/resctrl/
  pushd /sys/fs/resctrl/
  mkdir krava
  echo "krava" > krava/schemata
  rmdir krava
  popd
  umount  /sys/fs/resctrl

It triggers the slub debug error message with following command
line config: slub_debug=,kernfs_node_cache.

Call kernfs_put on the group's node to fix it.

Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-14 21:51:58 +01:00
Josh Poimboeuf
87a6b2975f x86/unwind: Fix last frame check for aligned function stacks
Pavel Machek reported the following warning on x86-32:

  WARNING: kernel stack frame pointer at f50cdf98 in swapper/2:0 has bad value   (null)

The warning is caused by the unwinder not realizing that it reached the
end of the stack, due to an unusual prologue which gcc sometimes
generates for aligned stacks.  The prologue is based on a gcc feature
called the Dynamic Realign Argument Pointer (DRAP).  It's almost always
enabled for aligned stacks when -maccumulate-outgoing-args isn't set.

This issue is similar to the one fixed by the following commit:

  8023e0e2a48d ("x86/unwind: Adjust last frame check for aligned function stacks")

... but that fix was specific to x86-64.

Make the fix more generic to cover x86-32 as well, and also ensure that
the return address referred to by the frame pointer is a copy of the
original return address.

Fixes: acb4608ad186 ("x86/unwind: Create stack frames for saved syscall registers")
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/50d4924db716c264b14f1633037385ec80bf89d2.1489465609.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-14 21:51:57 +01:00
Andrey Ryabinin
be3606ff73 x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y
options selected. With branch profiling enabled we end up calling
ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is
built with KASAN instrumentation, so calling it before kasan has been
initialized leads to crash.

Use DISABLE_BRANCH_PROFILING define to make sure that we don't call
ftrace_likely_update() from early code before kasan_early_init().

Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: lkp@01.org
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-14 00:00:55 +01:00
Mike Travis
0d443b70cc x86/platform: Remove warning message for duplicate NMI handlers
Remove the WARNING message associated with multiple NMI handlers as
there are at least two that are legitimate.  These are the KGDB and the
UV handlers and both want to be called if the NMI has not been claimed
by any other NMI handler.

Use of the UNKNOWN NMI call chain dramatically lowers the NMI call rate
when high frequency NMI tools are in use, notably the perf tools.  It is
required on systems that cannot sustain a high NMI call rate without
adversely affecting the system operation.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Frank Ramsay <frank.ramsay@hpe.com>
Cc: Tony Ernst <tony.ernst@hpe.com>
Link: http://lkml.kernel.org/r/20170307210841.730959611@asylum.americas.sgi.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13 20:45:18 +01:00
Peter Zijlstra
44fee88cea x86/tsc: Fix ART for TSC_KNOWN_FREQ
Subhransu reported that convert_art_to_tsc() isn't working for him.

The ART to TSC relation is only set up for systems which use the refined
TSC calibration. Systems with known TSC frequency (available via CPUID 15)
are not using the refined calibration and therefor the ART to TSC relation
is never established.

Add the setup to the known frequency init path which skips ART
calibration. The init code needs to be duplicated as for systems which use
refined calibration the ART setup must be delayed until calibration has
been done.

The problem has been there since the ART support was introdduced, but only
detected now because Subhransu tested the first time on hardware which has
TSC frequency enumerated via CPUID 15.

Note for stable: The conditional has changed from TSC_RELIABLE to
     	 	 TSC_KNOWN_FREQUENCY.

[ tglx: Rewrote changelog and identified the proper 'Fixes' commit ]

Fixes: f9677e0f8308 ("x86/tsc: Always Running Timer (ART) correlated clocksource")
Reported-by: "Prusty, Subhransu S" <subhransu.s.prusty@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: christopher.s.hall@intel.com
Cc: kevin.b.stanton@intel.com
Cc: john.stultz@linaro.org
Cc: akataria@vmware.com
Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13 19:50:23 +01:00
Linus Torvalds
5a45a5a881 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - a fix for the kexec/purgatory regression which was introduced in the
   merge window via an innocent sparse fix. We could have reverted that
   commit, but on deeper inspection it turned out that the whole
   machinery is neither documented nor robust. So a proper cleanup was
   done instead

 - the fix for the TLB flush issue which was discovered recently

 - a simple typo fix for a reboot quirk

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tlb: Fix tlb flushing when lguest clears PGE
  kexec, x86/purgatory: Unbreak it and clean it up
  x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
2017-03-12 14:18:49 -07:00
Dou Liyang
2b85b3d229 x86/acpi: Restore the order of CPU IDs
The following commits:

  f7c28833c2 ("x86/acpi: Enable acpi to register all possible cpus at
boot time") and 8f54969dc8 ("x86/acpi: Introduce persistent storage
for cpuid <-> apicid mapping")

... registered all the possible CPUs at boot time via ACPI tables to
make the mapping of cpuid <-> apicid fixed. Both enabled and disabled
CPUs could have a logical CPU ID after boot time.

But, ACPI tables are unreliable. the number amd order of Local APIC
entries which depends on the firmware is often inconsistent with the
physical devices. Even if they are consistent, The disabled CPUs which
take up some logical CPU IDs will also make the order discontinuous.

Revert the part of disabled CPUs registration, keep the allocation
logic of logical CPU IDs and also keep some code location changes.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: guzheng1@huawei.com
Cc: izumi.taku@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1488528147-2279-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-11 14:41:19 +01:00
Dou Liyang
c962cff17d Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
Revert: dc6db24d2476 ("x86/acpi: Set persistent cpuid <-> nodeid mapping when booting")

The mapping of "cpuid <-> nodeid" is established at boot time via ACPI
tables to keep associations of workqueues and other node related items
consistent across cpu hotplug.

But, ACPI tables are unreliable and failures with that boot time mapping
have been reported on machines where the ACPI table and the physical
information which is retrieved at actual hotplug is inconsistent.

Revert the mapping implementation so it can be replaced with a less error
prone approach.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: guzheng1@huawei.com
Cc: izumi.taku@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1488528147-2279-2-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-11 14:41:18 +01:00
Thomas Gleixner
40c50c1fec kexec, x86/purgatory: Unbreak it and clean it up
The purgatory code defines global variables which are referenced via a
symbol lookup in the kexec code (core and arch).

A recent commit addressing sparse warnings made these static and thereby
broke kexec_file.

Why did this happen? Simply because the whole machinery is undocumented and
lacks any form of forward declarations. The variable names are unspecific
and lack a prefix, so adding forward declarations creates shadow variables
in the core code. Aside of that the code relies on magic constants and
duplicate struct definitions with no way to ensure that these things stay
in sync. The section placement of the purgatory variables happened by
chance and not by design.

Unbreak kexec and cleanup the mess:

 - Add proper forward declarations and document the usage
 - Use common struct definition
 - Use the proper common defines instead of magic constants
 - Add a purgatory_ prefix to have a proper name space
 - Use ARRAY_SIZE() instead of a homebrewn reimplementation
 - Add proper sections to the purgatory variables [ From Mike ]

Fixes: 72042a8c7b01 ("x86/purgatory: Make functions and variables static")
Reported-by: Mike Galbraith <<efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Tobin C. Harding" <me@tobin.cc>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-10 20:55:09 +01:00
Matjaz Hegedic
bba8376aea x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
The reboot quirk for ASUS EeeBook X205TA contains a typo in
DMI_PRODUCT_NAME, improperly referring to X205TAW instead of
X205TA, which prevents the quirk from being triggered. The
model X205TAW already has a reboot quirk of its own.

This fix simply removes the inappropriate final letter W.

Fixes: 90b28ded88dd ("x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk")
Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com>
Link: http://lkml.kernel.org/r/1489064417-7445-1-git-send-email-matjaz.hegedic@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-10 11:58:33 +01:00
Masahiro Yamada
8a1115ff6b scripts/spelling.txt: add "disble(d)" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  disble||disable
  disbled||disabled

I kept the TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c
untouched.  The macro is not referenced at all, but this commit is
touching only comment blocks just in case.

Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:09 -08:00
Linus Torvalds
ec3b93ae0b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes and minor updates all over the place:

   - an SGI/UV fix
   - a defconfig update
   - a build warning fix
   - move the boot_params file to the arch location in debugfs
   - a pkeys fix
   - selftests fix
   - boot message fixes
   - sparse fixes
   - a resume warning fix
   - ioapic hotplug fixes
   - reboot quirks

  ... plus various minor cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build/x86_64_defconfig: Enable CONFIG_R8169
  x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk
  x86/hpet: Prevent might sleep splat on resume
  x86/boot: Correct setup_header.start_sys name
  x86/purgatory: Fix sparse warning, symbol not declared
  x86/purgatory: Make functions and variables static
  x86/events: Remove last remnants of old filenames
  x86/pkeys: Check against max pkey to avoid overflows
  x86/ioapic: Split IOAPIC hot-removal into two steps
  x86/PCI: Implement pcibios_release_device to release IRQ from IOAPIC
  x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h
  x86/vmware: Remove duplicate inclusion of asm/timer.h
  x86/hyperv: Hide unused label
  x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk
  x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
  x86/selftests: Add clobbers for int80 on x86_64
  x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR()
  x86/apic: Fix a warning message in logical CPU IDs allocation
  x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/
2017-03-07 14:47:24 -08:00
Linus Torvalds
609b07b72d Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A fix for KVM's scheduler clock which (erroneously) was always marked
  unstable, a fix for RT/DL load balancing, plus latency fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
  sched/core: Fix pick_next_task() for RT,DL
  sched/fair: Make select_idle_cpu() more aggressive
2017-03-07 14:42:34 -08:00
Linus Torvalds
c3abcabe81 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This includes a fix for a crash if certain special addresses are
  kprobed, plus does a rename of two Kconfig variables that were a minor
  misnomer"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS
  kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed
2017-03-07 14:38:16 -08:00
Matjaz Hegedic
3b3e78552d x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk
Without the parameter reboot=a, ASUS EeeBook X205TA/W will hang
when it should reboot. This adds the appropriate quirk, thus
fixing the problem.

Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com>
Link: http://lkml.kernel.org/r/1488737804-20681-1-git-send-email-matjaz.hegedic@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-06 11:47:43 +01:00
Linus Torvalds
2d62e0768d Second batch of KVM changes for 4.11 merge window
PPC:
  * correct assumption about ASDR on POWER9
  * fix MMIO emulation on POWER9
 
 x86:
  * add a simple test for ioperm
  * cleanup TSS
    (going through KVM tree as the whole undertaking was caused by VMX's
     use of TSS)
  * fix nVMX interrupt delivery
  * fix some performance counters in the guest
 
 And two cleanup patches.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJYuu5qAAoJEED/6hsPKofoRAUH/jkx/KFDcw3FggixysWVgRai
 iLSbbAZemnSLFSOkOU/t7Bz0fXCUgB0tAcMJd9ow01Dg1zObiTpuUIo6qEPaYHdX
 gqtUzlHuyECZEcgK0RXS9kDYLrvw7EFocxnDWQfV91qCZSS6nBSSLF3ST1rNV69W
 mUvcZG+MciDcZUe1lTexoswVTh1m7avvozEnQ5OHnZR9yicoXiadBQjzL6yqWoqf
 Ml/29zRk5+MvloTudxjkAKm3mh7psW88jNMh37TXbAA7i+Xwl9cU6GLR9mFWstoP
 7Ot7ecq9mNAUO3lTIQh7lqvB60LMFznS4IlYK7MbplC3kvJLkfzhTWaN1aGvh90=
 =cqHo
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Radim Krčmář:
 "Second batch of KVM changes for the 4.11 merge window:

  PPC:
   - correct assumption about ASDR on POWER9
   - fix MMIO emulation on POWER9

  x86:
   - add a simple test for ioperm
   - cleanup TSS (going through KVM tree as the whole undertaking was
     caused by VMX's use of TSS)
   - fix nVMX interrupt delivery
   - fix some performance counters in the guest

  ... and two cleanup patches"

* tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: Fix pending events injection
  x86/kvm/vmx: remove unused variable in segment_base()
  selftests/x86: Add a basic selftest for ioperm
  x86/asm: Tidy up TSS limit code
  kvm: convert kvm.users_count from atomic_t to refcount_t
  KVM: x86: never specify a sample period for virtualized in_tx_cp counters
  KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
  KVM: PPC: Book3S HV: Fix software walk of guest process page tables
2017-03-04 11:36:19 -08:00
Ingo Molnar
68e21be291 sched/headers: Move task->mm handling methods to <linux/sched/mm.h>
Move the following task->mm helper APIs into a new header file,
<linux/sched/mm.h>, to further reduce the size and complexity
of <linux/sched.h>.

Here are how the APIs are used in various kernel files:

  # mm_alloc():
  arch/arm/mach-rpc/ecard.c
  fs/exec.c
  include/linux/sched/mm.h
  kernel/fork.c

  # __mmdrop():
  arch/arc/include/asm/mmu_context.h
  include/linux/sched/mm.h
  kernel/fork.c

  # mmdrop():
  arch/arm/mach-rpc/ecard.c
  arch/m68k/sun3/mmu_emu.c
  arch/x86/mm/tlb.c
  drivers/gpu/drm/amd/amdkfd/kfd_process.c
  drivers/gpu/drm/i915/i915_gem_userptr.c
  drivers/infiniband/hw/hfi1/file_ops.c
  drivers/vfio/vfio_iommu_spapr_tce.c
  fs/exec.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  fs/proc/task_nommu.c
  fs/userfaultfd.c
  include/linux/mmu_notifier.h
  include/linux/sched/mm.h
  kernel/fork.c
  kernel/futex.c
  kernel/sched/core.c
  mm/khugepaged.c
  mm/ksm.c
  mm/mmu_context.c
  mm/mmu_notifier.c
  mm/oom_kill.c
  virt/kvm/kvm_main.c

  # mmdrop_async_fn():
  include/linux/sched/mm.h

  # mmdrop_async():
  include/linux/sched/mm.h
  kernel/fork.c

  # mmget_not_zero():
  fs/userfaultfd.c
  include/linux/sched/mm.h
  mm/oom_kill.c

  # mmput():
  arch/arc/include/asm/mmu_context.h
  arch/arc/kernel/troubleshoot.c
  arch/frv/mm/mmu-context.c
  arch/powerpc/platforms/cell/spufs/context.c
  arch/sparc/include/asm/mmu_context_32.h
  drivers/android/binder.c
  drivers/gpu/drm/etnaviv/etnaviv_gem.c
  drivers/gpu/drm/i915/i915_gem_userptr.c
  drivers/infiniband/core/umem.c
  drivers/infiniband/core/umem_odp.c
  drivers/infiniband/core/uverbs_main.c
  drivers/infiniband/hw/mlx4/main.c
  drivers/infiniband/hw/mlx5/main.c
  drivers/infiniband/hw/usnic/usnic_uiom.c
  drivers/iommu/amd_iommu_v2.c
  drivers/iommu/intel-svm.c
  drivers/lguest/lguest_user.c
  drivers/misc/cxl/fault.c
  drivers/misc/mic/scif/scif_rma.c
  drivers/oprofile/buffer_sync.c
  drivers/vfio/vfio_iommu_type1.c
  drivers/vhost/vhost.c
  drivers/xen/gntdev.c
  fs/exec.c
  fs/proc/array.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  fs/proc/task_nommu.c
  fs/userfaultfd.c
  include/linux/sched/mm.h
  kernel/cpuset.c
  kernel/events/core.c
  kernel/events/uprobes.c
  kernel/exit.c
  kernel/fork.c
  kernel/ptrace.c
  kernel/sys.c
  kernel/trace/trace_output.c
  kernel/tsacct.c
  mm/memcontrol.c
  mm/memory.c
  mm/mempolicy.c
  mm/migrate.c
  mm/mmu_notifier.c
  mm/nommu.c
  mm/oom_kill.c
  mm/process_vm_access.c
  mm/rmap.c
  mm/swapfile.c
  mm/util.c
  virt/kvm/async_pf.c

  # mmput_async():
  include/linux/sched/mm.h
  kernel/fork.c
  mm/oom_kill.c

  # get_task_mm():
  arch/arc/kernel/troubleshoot.c
  arch/powerpc/platforms/cell/spufs/context.c
  drivers/android/binder.c
  drivers/gpu/drm/etnaviv/etnaviv_gem.c
  drivers/infiniband/core/umem.c
  drivers/infiniband/core/umem_odp.c
  drivers/infiniband/hw/mlx4/main.c
  drivers/infiniband/hw/mlx5/main.c
  drivers/infiniband/hw/usnic/usnic_uiom.c
  drivers/iommu/amd_iommu_v2.c
  drivers/iommu/intel-svm.c
  drivers/lguest/lguest_user.c
  drivers/misc/cxl/fault.c
  drivers/misc/mic/scif/scif_rma.c
  drivers/oprofile/buffer_sync.c
  drivers/vfio/vfio_iommu_type1.c
  drivers/vhost/vhost.c
  drivers/xen/gntdev.c
  fs/proc/array.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  include/linux/sched/mm.h
  kernel/cpuset.c
  kernel/events/core.c
  kernel/exit.c
  kernel/fork.c
  kernel/ptrace.c
  kernel/sys.c
  kernel/trace/trace_output.c
  kernel/tsacct.c
  mm/memcontrol.c
  mm/memory.c
  mm/mempolicy.c
  mm/migrate.c
  mm/mmu_notifier.c
  mm/nommu.c
  mm/util.c

  # mm_access():
  fs/proc/base.c
  include/linux/sched/mm.h
  kernel/fork.c
  mm/process_vm_access.c

  # mm_release():
  arch/arc/include/asm/mmu_context.h
  fs/exec.c
  include/linux/sched/mm.h
  include/uapi/linux/sched.h
  kernel/exit.c
  kernel/fork.c

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-03 01:43:28 +01:00
Thomas Gleixner
bb1a2c2616 x86/hpet: Prevent might sleep splat on resume
Sergey reported a might sleep warning triggered from the hpet resume
path. It's caused by the call to disable_irq() from interrupt disabled
context.

The problem with the low level resume code is that it is not accounted as a
special system_state like we do during the boot process. Calling the same
code during system boot would not trigger the warning. That's inconsistent
at best.

In this particular case it's trivial to replace the disable_irq() with
disable_hardirq() because this particular code path is solely used from
system resume and the involved hpet interrupts can never be force threaded.


Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703012108460.3684@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-02 09:33:47 +01:00
Peter Zijlstra
f94c8d1169 sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
Wanpeng Li reported that since the following commit:

  acb04058de49 ("sched/clock: Fix hotplug crash")

... KVM always runs with unstable sched-clock even though KVM's
kvm_clock _is_ stable.

The problem is that we've tied clear_sched_clock_stable() to the TSC
state, and overlooked that sched_clock() is a paravirt function.

Solve this by doing two things:

 - tie the sched_clock() stable state more clearly to the TSC stable
   state for the normal (!paravirt) case.

 - only call clear_sched_clock_stable() when we mark TSC unstable
   when we use native_sched_clock().

The first means we can actually run with stable sched_clock in more
situations then before, which is good. And since commit:

  12907fbb1a69 ("sched/clock, clocksource: Add optional cs::mark_unstable() method")

... this should be reliable. Since any detection of TSC fail now results
in marking the TSC unstable.

Reported-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: acb04058de49 ("sched/clock: Fix hotplug crash")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:50:49 +01:00
Ingo Molnar
32ef5517c2 sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>
Introduce a trivial, mostly empty <linux/sched/cputime.h> header
to prepare for the moving of cputime functionality out of sched.h.

Update all code that relies on these facilities.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:39 +01:00
Ingo Molnar
9164bb4a18 sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h>
Update all usage sites first.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:38 +01:00
Ingo Molnar
68db0cf106 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:36 +01:00
Ingo Molnar
299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar
ef8bd77f33 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/hotplug.h>
We are going to split <linux/sched/hotplug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/hotplug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar
b17b01533b sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Ingo Molnar
174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar
5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00
Ingo Molnar
010426079e sched/headers: Prepare for new header dependencies before moving more code to <linux/sched/mm.h>
We are going to split more MM APIs out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.

The APIs that we are going to move are:

  arch_pick_mmap_layout()
  arch_get_unmapped_area()
  arch_get_unmapped_area_topdown()
  mm_update_next_owner()

Include the header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:30 +01:00
Ingo Molnar
38b8d208a4 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/nmi.h>
We are going to move softlockup APIs out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

<linux/nmi.h> already includes <linux/sched.h>.

Include the <linux/nmi.h> header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:30 +01:00
Ingo Molnar
3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00