13164 Commits

Author SHA1 Message Date
Linus Torvalds
89d1cf89c8 * An EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov)
* Removal of DRAM error reporting through PCI SERR NMI (Borislav Petkov)
 
 * Misc small fixes (Jan Glauber, Thor Thayer)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlkGtO0ACgkQEsHwGGHe
 VUoXfA/+JLgcHpI04KcvJtTMpNWE3p04xLdzw7hvgvWPLg+JDHF1jXxA4HRy7usI
 BAsEZIcpyk/9tzYjKm4zc/8nhlrjx/ic9cU+hZa8zCy/47uArX9HlrsxAUpgVxcx
 YmWzZ2gyo9Jsi/44wZwnp4dNWibvyG5ECrgis7AFOihT1qyi74YajNfqJWWUbG/H
 W3DkCVs2JVzelue3rI9J8f9MSZk5sL3C9vfFWxk6ifiqr+rlUphoSNFdF+mRnBdr
 dvk555G4Xmmz97ZiBAOM12M1trn+4lCkyfuQuMw0cZYt7F/nS7ZdLqAKK8H1KIoE
 mGl29p85svZRhIM25Cd759LSharAetqpNyxicjAwONwLcKiXVf2UuR5NohVj3y1f
 Dbrh4zRx0OVJctaAKzLEHhW3Re/VA6lU8JUuvjBytKV5fr64jBpqSXFDL8J4y7p2
 RJnKNbPkoXB75LukNqxDgpL+YEnJjzlslqxLqgPVgHFtrsUjpNHAJ9rKDeJQoW3b
 wC2wVBZmwx+4ShyHjJePJC7C6a/gDktbDos2/XW11DHa4w8ZbZ2Q4ep9oYegBKcd
 szliytm0LWlUTUDVNoc9DW/ka0NAh43kjvCqcmUcfC+4lhMO28eajvj35PP7fcic
 hmCAQnJz6M8t1VgxO7xvWi4jAwhvbzXM5IV1O3tIDMYHJQhrLBw=
 =vGf1
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - an EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov)

 - removal of DRAM error reporting through PCI SERR NMI (Borislav
   Petkov)

 - misc small fixes (Jan Glauber, Thor Thayer)

* tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, ghes: Do not enable it by default
  EDAC: Rename report status accessors
  EDAC: Delete edac_stub.c
  EDAC: Update Kconfig help text
  EDAC: Remove EDAC_MM_EDAC
  EDAC: Issue tracepoint only when it is defined
  ACPI/extlog: Add EDAC dependency
  EDAC: Move edac_op_state to edac_mc.c
  EDAC: Remove edac_err_assert
  EDAC: Get rid of edac_handlers
  x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
  EDAC, highbank: Align Makefile directives
  EDAC, thunderx: Remove unused code
  EDAC, thunderx: Change LMC index calculation
  EDAC, altera: Fix peripheral warnings for Cyclone5
  EDAC, thunderx: Fix L2C MCI interrupt disable
  EDAC, thunderx: Add Cavium ThunderX EDAC driver
2017-05-01 11:36:00 -07:00
Ingo Molnar
bfb8c6e495 Merge branch 'x86/microcode' into x86/urgent, to pick up cleanup
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-01 13:40:23 +02:00
Linus Torvalds
97ce89f8a4 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "The final fixes for 4.11:

   - prevent a triple fault with function graph tracing triggered via
     suspend to ram

   - prevent optimizing for size when function graph tracing is enabled
     and the compiler does not support -mfentry

   - prevent mwaitx() being called with a zero timeout as mwaitx() might
     never return. Observed on the new Ryzen CPUs"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Prevent timer value 0 for MWAITX
  x86/build: convert function graph '-Os' error to warning
  ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
2017-04-30 11:44:18 -07:00
Shaohua Li
bfd20f1cc8 x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
IOMMU harms performance signficantly when we run very fast networking
workloads. It's 40GB networking doing XDP test. Software overhead is
almost unaware, but it's the IOTLB miss (based on our analysis) which
kills the performance. We observed the same performance issue even with
software passthrough (identity mapping), only the hardware passthrough
survives. The pps with iommu (with software passthrough) is only about
~30% of that without it. This is a limitation in hardware based on our
observation, so we'd like to disable the IOMMU force on, but we do want
to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I
must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not
eabling IOMMU is totally ok.

So introduce a new boot option to disable the force on. It's kind of
silly we need to run into intel_iommu_init even without force on, but we
need to disable TBOOT PMR registers. For system without the boot option,
nothing is changed.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-26 23:57:53 +02:00
Andy Lutomirski
9ccee2373f x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.

Compile-tested only because I don't know whether software that uses
this mechanism even exists.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 10:02:06 +02:00
Josh Poimboeuf
262fa734a0 x86/unwind: Dump all stacks in unwind_dump()
Currently unwind_dump() dumps only the most recently accessed stack.
But it has a few issues.

In some cases, 'first_sp' can get out of sync with 'stack_info', causing
unwind_dump() to start from the wrong address, flood the printk buffer,
and eventually read a bad address.

In other cases, dumping only the most recently accessed stack doesn't
give enough data to diagnose the error.

Fix both issues by dumping *all* stacks involved in the trace, not just
the last one.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8b5e99f02264 ("x86/unwind: Dump stack data on warnings")
Link: http://lkml.kernel.org/r/016d6a9810d7d1bfc87ef8c0e6ee041c6744c909.1493171120.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 08:19:05 +02:00
Josh Poimboeuf
b0d50c7b5d x86/unwind: Silence more entry-code related warnings
Borislav Petkov reported the following unwinder warning:

  WARNING: kernel stack regs at ffffc9000024fea8 in udevadm:92 has bad 'bp' value 00007fffc4614d30
  unwind stack type:0 next_sp:          (null) mask:0x6 graph_idx:0
  ffffc9000024fea8: 000055a6100e9b38 (0x55a6100e9b38)
  ffffc9000024feb0: 000055a6100e9b35 (0x55a6100e9b35)
  ffffc9000024feb8: 000055a6100e9f68 (0x55a6100e9f68)
  ffffc9000024fec0: 000055a6100e9f50 (0x55a6100e9f50)
  ffffc9000024fec8: 00007fffc4614d30 (0x7fffc4614d30)
  ffffc9000024fed0: 000055a6100eaf50 (0x55a6100eaf50)
  ffffc9000024fed8: 0000000000000000 ...
  ffffc9000024fee0: 0000000000000100 (0x100)
  ffffc9000024fee8: ffff8801187df488 (0xffff8801187df488)
  ffffc9000024fef0: 00007ffffffff000 (0x7ffffffff000)
  ffffc9000024fef8: 0000000000000000 ...
  ffffc9000024ff10: ffffc9000024fe98 (0xffffc9000024fe98)
  ffffc9000024ff18: 00007fffc4614d00 (0x7fffc4614d00)
  ffffc9000024ff20: ffffffffffffff10 (0xffffffffffffff10)
  ffffc9000024ff28: ffffffff811c6c1f (SyS_newlstat+0xf/0x10)
  ffffc9000024ff30: 0000000000000010 (0x10)
  ffffc9000024ff38: 0000000000000296 (0x296)
  ffffc9000024ff40: ffffc9000024ff50 (0xffffc9000024ff50)
  ffffc9000024ff48: 0000000000000018 (0x18)
  ffffc9000024ff50: ffffffff816b2e6a (entry_SYSCALL_64_fastpath+0x18/0xa8)
  ...

It unwinded from an interrupt which came in right after entry code
called into a C syscall handler, before it had a chance to set up the
frame pointer, so regs->bp still had its user space value.

Add a check to silence warnings in such a case, where an interrupt
has occurred and regs->sp is almost at the end of the stack.

Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/c695f0d0d4c2cfe6542b90e2d0520e11eb901eb5.1493171120.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 08:19:05 +02:00
Paolo Bonzini
8afd74c296 Merge branch 'x86/process' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD
Required for KVM support of the CPUID faulting feature.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-21 11:55:06 +02:00
Steven Rostedt (VMware)
dc912c3035 x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder
Fengguang Wu's zero day bot triggered a stack unwinder dump. This can
be easily triggered when CONFIG_FRAME_POINTERS is enabled and -mfentry
is in use on x86_32.

 ># cd /sys/kernel/debug/tracing
 ># echo 'p:schedule schedule' > kprobe_events
 ># echo stacktrace > events/kprobes/schedule/trigger

This is because the code that implemented fentry in the ftrace_regs_caller
tried to use the least amount of #ifdefs, and modified ebp when
CC_USE_FENTRY was defined to point to the parent ip as it does when
CC_USE_FENTRY is not defined. But when CONFIG_FRAME_POINTERS is set, it
corrupts the ebp register for this frame while doing the tracing.

NOTE, it does not corrupt ebp in any other way. It is just a bad frame
pointer when calling into the tracing infrastructure. The original ebp is
restored before returning from the fentry call. But if a stack trace is
performed inside the tracing, the unwinder will notice the bad ebp.

Instead of toying with ebp with CC_USING_FENTRY, just slap the parent ip
into the second parameter (%edx), and have an #else that does it the
original way.

The unwinder will unfortunately miss the function being traced, as the
stack frame is not set up yet for it, as it is for x86_64. But fixing that
is a bit more complex and did not work before anyway.

This has been tested with and without FRAME_POINTERS being set while using
-mfentry, as well as using an older compiler that uses mcount.

Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Fixes: 644e0e8dc76b ("x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lists.01.org/pipermail/lkp/2017-April/006165.html
Link: http://lkml.kernel.org/r/20170420172236.7af7f6e5@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-21 09:48:16 +02:00
Vikas Shivappa
4797b7dfdf x86/intel_rdt: Return error for incorrect resource names in schemata
When schemata parses the resource names it does not return an error if it
detects incorrect resource names and fails quietly.

This happens because for_each_enabled_rdt_resource(r) leaves "r" pointing
beyond the end of the rdt_resources_all[] array, and the check for !r->name
results in an out of bounds access.

Split the resource parsing part into a helper function to avoid the issue.

[ tglx: Made it readable by splitting the parser loop out into a function ]

Reported-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1492645804-17465-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-20 15:57:59 +02:00
Vikas Shivappa
634b0e0491 x86/intel_rdt: Trim whitespace while parsing schemata input
Schemata is displayed in tabular format which introduces some whitespace
to show data in a tabular format.

Writing back the same data fails as the parser does not handle the
whitespace.

Trim the leading and trailing whitespace before parsing.

Reported-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1492645804-17465-3-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-20 15:57:59 +02:00
Vikas Shivappa
adcbdd7030 x86/intel_rdt: Fix padding when resource is enabled via mount
Currently max width of 'resource name' and 'resource data' is being
initialized based on 'enabled resources' during boot. But the mount can
enable different capable resources at a later time which upsets the
tabular format of schemata. Fix this to be based on 'all capable'
resources.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1492645804-17465-2-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-20 15:57:59 +02:00
Chen Yu
c0edbd4a16 x86/irq: Optimize free vector check in the CPU offline path
Before offlining a CPU its required to check whether there are enough free
irq vectors available so interrupts can be migrated away from the CPU.

This check is executed whether its required or not and neither stops
searching when the number of required free vectors are reached.

Bypass the free vector check if the current CPU has no irq to migrate and
leave the for_each_online_cpu() loop when the free vector count reaches the
number of required vectors.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Len Brown <lenb@kernel.orq>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1492357410-510-1-git-send-email-yu.c.chen@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-20 15:25:09 +02:00
Prarit Bhargava
21173d0b4d sched/x86: Update reschedule warning text
Modify the reschedule warning to output the offline CPU number and
use a better debug message.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1492518305-3808-1-git-send-email-prarit@redhat.com
[ Tweaked the warning message. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-20 10:14:30 +02:00
Ingo Molnar
afa7a17f3a Merge branch 'WIP.x86/process' into perf/core 2017-04-20 10:07:12 +02:00
Tiantian Feng
fba4f472b3 x86/reboot: Turn off KVM when halting a CPU
A CPU in VMX root mode will ignore INIT signals and will fail to bring
up the APs after reboot.  Therefore, on a panic we disable VMX on all
CPUs before rebooting or triggering kdump.

Do this when halting the machine as well, in case a firmware-level reboot
does not perform a cold reset for all processors.  Without doing this,
rebooting the host may hang.

Signed-off-by: Tiantian Feng <fengtiantian@huawei.com>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
[ Rewritten commit message. ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20170419161839.30550-1-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-20 10:06:57 +02:00
Borislav Petkov
c6a9583fb4 x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only
mce_usable_address() does a bunch of basic sanity checks to verify
whether the address reported with the error is usable for further
processing. However, we do check MCi_STATUS[MISCV] and that is not
needed on AMD as that bit says that there's additional information about
the logged error in the MCi_MISCj banks.

But we don't need that to know whether the address is usable - we only
need to know whether the physical address is valid - i.e., ADDRV.

On Intel the MISCV bit is needed to perform additional checks to determine
whether the reported address is a physical one, etc.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170418183924.6agjkebilwqj26or@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-19 12:04:46 +02:00
Josh Poimboeuf
aa4f853461 x86/unwind: Remove unused 'sp' parameter in unwind_dump()
The 'sp' parameter to unwind_dump() is unused.  Remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/08cb36b004629f6bbcf44c267ae4a609242ebd0b.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:59:47 +02:00
Josh Poimboeuf
4ea3d7410c x86/unwind: Prepend hex mask value with '0x' in unwind_dump()
In unwind_dump(), the stack mask value is printed in hex, but is
confusingly not prepended with '0x'.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e7fe41be19d73c9f99f53082486473febfe08ffa.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:59:47 +02:00
Josh Poimboeuf
9b135b2346 x86/unwind: Properly zero-pad 32-bit values in unwind_dump()
On x86-32, 32-bit stack values printed by unwind_dump() are confusingly
zero-padded to 16 characters (64 bits):

  unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
  f50cdebc: 00000000f50cdec4 (0xf50cdec4)
  f50cdec0: 00000000c40489b7 (irq_exit+0x87/0xa0)
  ...

Instead, base the field width on the size of a long integer so that it
looks right on both x86-32 and x86-64.

x86-32:

  unwind stack type:1 next_sp:  (null) mask:0x2 graph_idx:0
  c0ee9d98: c0ee9de0 (init_thread_union+0x1de0/0x2000)
  c0ee9d9c: c043fd90 (__save_stack_trace+0x50/0xe0)
  ...

x86-64:

  unwind stack type:1 next_sp:          (null) mask:0x2 graph_idx:0
  ffffffff81e03b88: ffffffff81e03c10 (init_thread_union+0x3c10/0x4000)
  ffffffff81e03b90: ffffffff81048f8e (__save_stack_trace+0x5e/0x100)
  ...

Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/36b743812e7eb291d74af4e5067736736622daad.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:59:47 +02:00
Josh Poimboeuf
a5859c6d7b x86/build: convert function graph '-Os' error to warning
For pre-4.6.0 versions of GCC, which don't have '-mfentry', the
'-maccumulate-outgoing-args' option is required for function graph
tracing in order to avoid GCC bug 42109.

However, GCC ignores '-maccumulate-outgoing-args' when '-Os' is
also set.

Currently we force a build error to prevent that scenario, but that
breaks randconfigs.  So change the error to a warning which also
disables CONFIG_CC_OPTIMIZE_FOR_SIZE.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Link: http://lkml.kernel.org/r/20170418214429.o7fbwbmf4nqosezy@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:57:23 +02:00
Dave Airlie
856ee92e86 Linux 4.11-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY881cAAoJEHm+PkMAQRiGG4UH+wa2z6Qet36Uc4nXFZuSMYrO
 ErUWs1QpTDDv4a+LE4fgyMvM3j9XqtpfQLy1n70jfD14IqPBhHe4gytasAf+8lg1
 YvddFx0Yl3sygVu3dDBNigWeVDbfwepW59coN0vI5nrMo+wrei8aVIWcFKOxdMuO
 n72u9vuhrkEnLJuQk7SF+t4OQob9McXE3s7QgyRopmlKhKo7mh8On7K2BRI5uluL
 t0j5kZM0a43EUT5rq9xR8f5pgtyfTMG/FO2MuzZn43MJcZcyfmnOP/cTSIvAKA5U
 1i12lxlokYhURNUe+S6jm8A47TrqSRSJxaQJZRlfGJksZ0LJa8eUaLDCviBQEoE=
 =6QWZ
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc7' into drm-next

Backmerge Linux 4.11-rc7 from Linus tree, to fix some
conflicts that were causing problems with the rerere cache
in drm-tip.
2017-04-19 11:07:14 +10:00
Vishal Verma
0dc9c639e6 x86/mce: Make the MCE notifier a blocking one
The NFIT MCE handler callback (for handling media errors on NVDIMMs)
takes a mutex to add the location of a memory error to a list. But since
the notifier call chain for machine checks (x86_mce_decoder_chain) is
atomic, we get a lockdep splat like:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
  in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
  [..]
  Call Trace:
   dump_stack
   ___might_sleep
   __might_sleep
   mutex_lock_nested
   ? __lock_acquire
   nfit_handle_mce
   notifier_call_chain
   atomic_notifier_call_chain
   ? atomic_notifier_call_chain
   mce_gen_pool_process

Convert the notifier to a blocking one which gets to run only in process
context.

Boris: remove the notifier call in atomic context in print_mce(). For
now, let's print the MCE on the atomic path so that we can make sure
they go out and get logged at least.

Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-18 22:23:48 +02:00
Josh Poimboeuf
e335bb51cc x86/unwind: Ensure stack pointer is aligned
With frame pointers disabled, on some older versions of GCC (like
4.8.3), it's possible for the stack pointer to get aligned at a
half-word boundary:

  00000000000004d0 <fib_table_lookup>:
       4d0:       41 57                   push   %r15
       4d2:       41 56                   push   %r14
       4d4:       41 55                   push   %r13
       4d6:       41 54                   push   %r12
       4d8:       55                      push   %rbp
       4d9:       53                      push   %rbx
       4da:       48 83 ec 24             sub    $0x24,%rsp

In such a case, the unwinder ends up reading the entire stack at the
wrong alignment.  Then the last read goes past the end of the stack,
hitting the stack guard page:

  BUG: stack guard page was hit at ffffc900217c4000 (stack is ffffc900217c0000..ffffc900217c3fff)
  kernel stack overflow (page fault): 0000 [#1] SMP
  ...

Fix it by ensuring the stack pointer is properly aligned before
unwinding.

Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/cff33847cc9b02fa548625aa23268ac574460d8d.1492436590.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18 10:30:23 +02:00
Borislav Petkov
415601b191 x86/mce: Update notifier priority check
Update the check which enforces the registration of MCE decoder notifier
callbacks with valid priority only, to include mcelog's priority.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20170418073820.i6kl5tggcntwlisa@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18 10:27:52 +02:00
Dou Liyang
0ccecd95e7 x86/irq: Remove a redundant #ifdef directive
The call to irq_ctx_init() is wrapped in #ifdef CONFIG_X86_32.

The declaration of irq_ctx_init in irq.h provides already a stub inline for
the X86_32=n case.

Remove the redundant #ifdef in the code.

[ tglx: Massaged changelog ]

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/1491811500-30307-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 22:43:01 +02:00
Nicolai Stange
747d04b30e x86/apic/timer: Set ->min_delta_ticks and ->max_delta_ticks
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Make the x86 arch's apic clockevent driver initialize these fields
properly.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
CC: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14 13:11:17 -07:00
Vikas Shivappa
64e8ed3d4a x86/intel_rdt/mba: Add schemata file support for MBA
Add support to update the MBA bandwidth values for the domains via the
schemata file.

 - Verify that the bandwidth value is valid

 - Round to the next control step depending on the bandwidth granularity of
   the hardware

 - Convert the bandwidth to delay values and write the delay values to
   the corresponding domain PQOS_MSRs.

[ tglx: Massaged changelog ]

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-9-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:09 +02:00
Vikas Shivappa
c6ea67de52 x86/intel_rdt: Make schemata file parsers resource specific
The schemata files are the user space interface to update resource
controls. The parser is hardwired to support only cache resources, which do
not fit the requirements of memory resources.

Add a function pointer for a parser to the struct rdt_resource and switch
the cache parsing over.

[ tglx: Massaged changelog ]

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-8-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:09 +02:00
Vikas Shivappa
db69ef6563 x86/intel_rdt/mba: Add info directory files for Memory Bandwidth Allocation
The files in the info directory for MBA are as follows:

 num_closids
 	The maximum number of CLOSids available for MBA

 min_bandwidth
 	The minimum memory bandwidth percentage value

 bandwidth_gran
 	The granularity of the bandwidth control in percent for the
	particular CPU SKU. Intermediate values entered are rounded off
	to the previous control step available. Available bandwidth
	control steps are minimum_bandwidth + N * bandwidth_gran.

 delay_linear
 	When set, the OS writes a linear percentage based value to the
	control MSRs ranging from minimum_bandwidth to 100 percent.

	This value is informational and has no influence on the values
	written to the schemata files. The values written to the
	schemata are always bandwidth percentage that is requested.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-7-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:08 +02:00
Vikas Shivappa
6a507a6ad8 x86/intel_rdt: Make information files resource specific
Cache allocation and memory bandwidth allocation require different
information files in the resctrl/info directory, but the current
implementation does not allow to have files per resource.

Add the necessary fields to the resource struct and assign the files
dynamically depending on the resource type.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-6-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:08 +02:00
Vikas Shivappa
05b93417ce x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)
The MBA feature details like minimum bandwidth supported, bandwidth
granularity etc are obtained via executing CPUID with EAX=10H ,ECX=3.

Setup and initialize the MBA specific extensions to data structures like
global list of RDT resources, RDT resource structure and RDT domain
structure.

[ tglx: Split out the seperate structure and the CBM related parts ]

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-5-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:08 +02:00
Vikas Shivappa
ab66a33b03 x86/intel_rdt/mba: Memory bandwith allocation feature detect
Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1.
Add supporting data structures to detect feature details which is done
in later patch using CPUID with EAX=10H, ECX= 3.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:07 +02:00
Thomas Gleixner
0921c54769 x86/intel_rdt: Add resource specific msr update function
Updating of Cache and Memory bandwidth QOS MSRs is different.

Add a function pointer to struct rdt_resource and convert the cache part
over.

Based on Vikas all in one patch^Wmess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:07 +02:00
Thomas Gleixner
d3e11b4d6f x86/intel_rdt: Move CBM specific data into a struct
Memory bandwidth allocation requires different information than cache
allocation.

To avoid a lump of data in struct rdt_resource, move all cache related
information into a seperate structure and add that to struct rdt_resource.

Sanitize the data types while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:07 +02:00
Vikas Shivappa
2545e9f51e x86/intel_rdt: Cleanup namespace to support multiple resource types
Lot of data structures and functions are named after cache specific
resources(named after cbm, cache etc). In many cases other non cache
resources may need to share the same data structures/functions.

Generalize such naming to prepare to add more resources like memory
bandwidth.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-3-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:07 +02:00
Thomas Gleixner
70a1ee9256 x86/intel_rdt: Organize code properly
Having init functions at random places in the middle of the code is
unintuitive.

Move them close to the init routine and mark them __init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:06 +02:00
Thomas Gleixner
06b57e4550 x86/intel_rdt: Init padding only if a device exists
If no device exists it's pointless to calculate the padding data for the
schemata files.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:06 +02:00
Hans de Goede
7ee06cb2f8 x86: i8259: export legacy_pic symbol
The classic PC rtc-coms driver has a workaround for broken ACPI device
nodes for it which lack an irq resource. This workaround used to
unconditionally hardcode the irq to 8 in these cases.

This was causing irq conflict problems on systems without a legacy-pic
so a recent patch added an if (nr_legacy_irqs()) guard to the
workaround to avoid this irq conflict.

nr_legacy_irqs() uses the legacy_pic symbol under the hood causing
an undefined symbol error if the rtc-cmos code is build as a module.

This commit exports the legacy_pic symbol to fix this.

Cc: rtc-linux@googlegroups.com
Cc: alexandre.belloni@free-electrons.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2017-04-14 12:08:51 +02:00
Josh Poimboeuf
34a477e529 ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function
graph tracing and then suspend to RAM, it will triple fault and reboot when
it resumes.

The first fault happens when booting a secondary CPU:

startup_32_smp()
  load_ucode_ap()
    prepare_ftrace_return()
      ftrace_graph_is_dead()
        (accesses 'kill_ftrace_graph')

The early head_32.S code calls into load_ucode_ap(), which has an an
ftrace hook, so it calls prepare_ftrace_return(), which calls
ftrace_graph_is_dead(), which tries to access the global
'kill_ftrace_graph' variable with a virtual address, causing a fault
because the CPU is still in real mode.

The fix is to add a check in prepare_ftrace_return() to make sure it's
running in protected mode before continuing.  The check makes sure the
stack pointer is a virtual kernel address.  It's a bit of a hack, but
it's not very intrusive and it works well enough.

For reference, here are a few other (more difficult) ways this could
have potentially been fixed:

- Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
  is enabled.  (No idea what that would break.)

- Track down load_ucode_ap()'s entire callee tree and mark all the
  functions 'notrace'.  (Probably not realistic.)

- Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
  or __cpu_up(), and ensure that the pause facility can be queried from
  real mode.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@kernel.org
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 11:48:51 +02:00
Colin King
ace2fb5a8b x86/boot/e820: Remove a redundant self assignment
Remove a redundant self assignment of table->nr_entries, it does
nothing and is an artifact of code simplification re-work.

Detected by CoverityScan, CID#1428450 ("Self assignment")

Fixes: 441ac2f33dd7 ("x86/boot/e820: Simplify e820__update_table()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: kernel-janitors@vger.kernel.org
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/20170413155912.12078-1-colin.king@canonical.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 11:43:21 +02:00
Piotr Luc
9ea74f7c70 x86/mce: Enable PPIN for Knights Landing/Mill
Intel Xeon Phi processors (KNL and KNM) support PPIN as well, so add their
CPUIDs to the whitelist of supported processors.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170408172004.8463-1-piotr.luc@intel.com
Link: http://lkml.kernel.org/r/20170413201056.10525-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 10:46:12 +02:00
Josh Poimboeuf
a8b7a92318 x86/unwind: Silence entry-related warnings
A few people have reported unwinder warnings like the following:

  WARNING: kernel stack frame pointer at ffffc90000fe7ff0 in rsync:1157 has bad value           (null)
  unwind stack type:0 next_sp:          (null) mask:2 graph_idx:0
  ffffc90000fe7f98: ffffc90000fe7ff0 (0xffffc90000fe7ff0)
  ffffc90000fe7fa0: ffffffffb7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c)
  ffffc90000fe7fa8: 0000000000000246 (0x246)
  ffffc90000fe7fb0: 0000000000000000 ...
  ffffc90000fe7fc0: 00007ffe3af639bc (0x7ffe3af639bc)
  ffffc90000fe7fc8: 0000000000000006 (0x6)
  ffffc90000fe7fd0: 00007f80af433fc5 (0x7f80af433fc5)
  ffffc90000fe7fd8: 00007ffe3af638e0 (0x7ffe3af638e0)
  ffffc90000fe7fe0: 00007ffe3af638e0 (0x7ffe3af638e0)
  ffffc90000fe7fe8: 00007ffe3af63970 (0x7ffe3af63970)
  ffffc90000fe7ff0: 0000000000000000 ...
  ffffc90000fe7ff8: ffffffffb7b74b9a (entry_SYSCALL_64_after_swapgs+0x17/0x4f)

This warning can happen when unwinding a code path where an interrupt
occurred in x86 entry code before it set up the first stack frame.
Silently ignore any warnings for this case.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/dbd6838826466a60dc23a52098185bc973ce2f1e.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:20:06 +02:00
Josh Poimboeuf
6bcdf9d51b x86/unwind: Read stack return address in update_stack_state()
Instead of reading the return address when unwind_get_return_address()
is called, read it from update_stack_state() and store it in the unwind
state.  This enables the next patch to check the return address from
unwind_next_frame() so it can detect an entry code frame.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/af0c5e4560c49c0343dca486ea26c4fa92bc4e35.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:19:59 +02:00
Josh Poimboeuf
5ed8d8bb38 x86/unwind: Move common code into update_stack_state()
The __unwind_start() and unwind_next_frame() functions have some
duplicated functionality.  They both call decode_frame_pointer() and set
state->regs and state->bp accordingly.  Move that functionality to a
common place in update_stack_state().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a2ee4801113f6d2300d58f08f6b69f85edf4eb43.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:19:49 +02:00
Wanpeng Li
5a48a62271 x86/kvm: virt_xxx memory barriers instead of mandatory barriers
virt_xxx memory barriers are implemented trivially using the low-level
__smp_xxx macros, __smp_xxx is equal to a compiler barrier for strong
TSO memory model, however, mandatory barriers will unconditional add
memory barriers, this patch replaces the rmb() in kvm_steal_clock() by
virt_rmb().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12 20:17:38 +02:00
Masami Hiramatsu
a8d11cd071 kprobes/x86: Consolidate insn decoder users for copying code
Consolidate x86 instruction decoder users on the path of
copying original code for kprobes.

Kprobes decodes the same instruction a maximum of 3 times when
preparing the instruction buffer:

 - The first time for getting the length of the instruction,
 - the 2nd for adjusting displacement,
 - and the 3rd for checking whether the instruction is boostable or not.

For each time, the actual decoding target address is slightly
different (1st is original address or recovered instruction buffer,
2nd and 3rd are pointing to the copied buffer), but all have
the same instruction.

Thus, this patch also changes the target address to the copied
buffer at first and reuses the decoded "insn" for displacement
adjusting and checking boostability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076389643.22469.13151892839998777373.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:47 +02:00
Masami Hiramatsu
ea1e34fc36 kprobes/x86: Use probe_kernel_read() instead of memcpy()
Use probe_kernel_read() for avoiding unexpected faults while
copying kernel text in __recover_probed_insn(),
__recover_optprobed_insn() and __copy_instruction().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076382624.22469.10091613887942958518.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:47 +02:00
Masami Hiramatsu
d0381c81c2 kprobes/x86: Set kprobes pages read-only
Set the pages which is used for kprobes' singlestep buffer
and optprobe's trampoline instruction buffer to readonly.
This can prevent unexpected (or unintended) instruction
modification.

This also passes rodata_test as below.

Without this patch, rodata_test shows a warning:

  WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x7a9/0xa20
  x86/mm: Found insecure W+X mapping at address ffffffffa0000000/0xffffffffa0000000

With this fix, no W+X pages are found:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.
  rodata_test: all tests were successful

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076375592.22469.14174394514338612247.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:46 +02:00
Masami Hiramatsu
490154bc68 kprobes/x86: Make boostable flag boolean
Make arch_specific_insn.boostable to boolean, since it has
only 2 states, boostable or not. So it is better to use
boolean from the viewpoint of code readability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076368566.22469.6322906866458231844.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:46 +02:00