9561 Commits

Author SHA1 Message Date
Ingo Molnar
fafd883f67 Merge branch 'perf/urgent' into perf/core
Pick up the latest fixes before applying new patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 09:59:13 +02:00
Peter Zijlstra
d8b11a0cbd perf/x86: Clean up cap_user_time* setting
Currently the cap_user_time_zero capability has different tests than
cap_user_time; even though they expose the exact same data.

Switch from CONSTANT && NONSTOP to sched_clock_stable to also deal
with multi cabinet machines and drop the tsc_disabled() check.. non of
this will work sanely without tsc anyway.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-nmgn0j0muo1r4c94vlfh23xy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 09:58:55 +02:00
David Herrmann
29d274b8d3 x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning
IORESOURCE_BUSY is used to mark temporary driver mem-resources
instead of global regions. This suppresses warnings if regions
overlap with a region marked as BUSY.

This was always the case for VESA/VGA/EFI framebuffer regions so
do the same for simplefb regions. The reason we do this is to
allow device handover to real GPU drivers like
i915/radeon/nouveau which get the same regions via PCI BARs.

Maybe at some point we will be able to unregister platform
devices properly during the handover. In this case the simplefb
region would get removed before the new region is created.
However, this is currently not the case and would require rather
huge changes in remove_conflicting_framebuffers(). Add the BUSY
marker now and try to eventually rewrite the handover for a next release.

Also see kernel/resource.c for more information:

  /*
   * if a resource is "BUSY", it's not a hardware resource
   * but a driver mapping of such a resource; we don't want
   * to warn for those; some drivers legitimately map only
   * partial hardware resources. (example: vesafb)
   */

This suppresses warnings like:

  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
  Info: mapping multiple BARs. Your kernel is fine.
  Call Trace:
    dump_stack+0x54/0x8d
    warn_slowpath_common+0x7d/0xa0
    warn_slowpath_fmt+0x4c/0x50
    iomem_map_sanity_check+0xac/0xe0
    __ioremap_caller+0x2e3/0x390
    ioremap_wc+0x32/0x40
    i915_driver_load+0x670/0xf50 [i915]
    ...

Reported-by: Tom Gundersen <teg@jklm.no>
Tested-by: Tom Gundersen <teg@jklm.no>
Tested-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-03 07:51:11 +02:00
Dave Jones
e56e57f661 x86/reboot: Sort reboot DMI quirks by vendor
Grouping them by vendor should make it easier to spot duplicates.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20131001203655.GA10719@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 08:02:24 +02:00
Ingo Molnar
b8d490c3de Merge branch 'irq/core-v6' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/core
Pull hardirq and softirq nesting updates from Frederic Weisbecker,
which fix nesting related stack overruns such as:

  http://lkml.kernel.org/r/1378330796.4321.50.camel%40pasglop

Beyond being a fix, this series also optimizes and reorganizes arch
hardirq/softirq stack processing to be faster and more robust.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 07:57:37 +02:00
Ingo Molnar
8a60d42d26 Linux 3.12-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSSKOHAAoJEHm+PkMAQRiGeREH/3EqHmJPBzmVoJwR9/ykDoLg
 u+TJTkuxZG220WhgXS7W/0ECyBX0U7yA0bY9PZbqgcdiLjY0veR18/pOhEq5RzHq
 ub8Q+AJdiORF/sq268q7gnNmy3rSCgnrAyHA/bzBtkbisYODwZPYvWQVUjgNZ2dW
 qtW/TE9rjANcUrk8WdOu9oWcwsq4cyG3cscbfHE/JLFy/8tB5GoD158gxKLZsLXk
 uTCeUHMmvFRT56fZwfyvNstA8ozxXcHBmuu6+Ttceky2zeGzp6dOrd+d2SU1Ps3O
 P91x4e/Af4RFEwDczGP6TpSBEf/J/JaqrM1drjhnQHho0hrNRZVUXhADFVADCXY=
 =dOjB
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc3' into irq/core

Merge Linux v3.12-rc3, to refresh the tree from a v3.11 base to a v3.12 base.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 07:55:54 +02:00
Tom Gundersen
e33a29a5ae x86/simplefb: Fix overflow causing bogus fall-back
On my MacBook Air lfb_size is 4M, which makes the bitshit
overflow (to 256GB - larger than 32 bits), meaning we fall
back to efifb unnecessarily.

Cast to u64 to avoid the overflow.

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Link: http://lkml.kernel.org/r/1380644320-1026-1-git-send-email-teg@jklm.no
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 07:50:40 +02:00
Frederic Weisbecker
7d65f4a655 irq: Consolidate do_softirq() arch overriden implementations
All arch overriden implementations of do_softirq() share the following
common code: disable irqs (to avoid races with the pending check),
check if there are softirqs pending, then execute __do_softirq() on
a specific stack.

Consolidate the common parts such that archs only worry about the
stack switch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
2013-10-01 12:53:25 +02:00
Borislav Petkov
a17bce4d1d x86/boot: Further compress CPUs bootup message
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-01 10:52:30 +02:00
Bartlomiej Zolnierkiewicz
dd96dc32d5 x86 / ACPI: fix incorrect placement of __initdata tag
__initdata tag should not be placed between "struct" and "resource"
because it prevents the variable from being placed in the intended
.init.data section. Fix it.

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30 20:30:18 +02:00
Toshi Kani
6dedcca610 hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
cpu_hotplug_driver_lock() serializes CPU online/offline operations
when ARCH_CPU_PROBE_RELEASE is set.  This lock interface is no longer
necessary with the following reason:

 - lock_device_hotplug() now protects CPU online/offline operations,
   including the probe & release interfaces enabled by
   ARCH_CPU_PROBE_RELEASE.  The use of cpu_hotplug_driver_lock() is
   redundant.
 - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE
   is defined, which is misleading and is only enabled on powerpc.

This patch removes the cpu_hotplug_driver_lock() interface.  As
a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
probe & release interface as intended.  There is no functional change
in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30 19:55:51 +02:00
Linus Torvalds
669fc2f0c7 Merge branches 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler, timer and x86 fixes from Ingo Molnar:
 - A context tracking ARM build and functional fix
 - A handful of ARM clocksource/clockevent driver fixes
 - An AMD microcode patch level sysfs reporting fixlet

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm: Fix build error with context tracking calls

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast
  clocksource: of: Respect device tree node status
  clocksource: exynos_mct: Set IRQ affinity when the CPU goes online
  arm: clocksource: mvebu: Use the main timer as clock source from DT

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix patch level reporting for family 15h
2013-09-28 14:22:17 -07:00
Linus Torvalds
9b565a8051 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A couple of tooling fixlets and a PMU detection printout fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix PMU detection printout when no PMU is detected
  perf symbols: Demangle cloned functions
  perf machine: Fix path unpopulated in machine__create_modules()
  perf tools: Explicitly add libdl dependency
  perf probe: Fix probing symbols with optimization suffix
  perf trace: Add mmap2 handler
  perf kmem: Make it work again on non NUMA machines
2013-09-28 14:21:13 -07:00
Ingo Molnar
8a3da6c7d0 perf/x86: Fix PMU detection printout when no PMU is detected
Ran into this cryptic PMU bootup log recently:

[    0.124047] Performance Events:
[    0.125000] smpboot: ...

Turns out we print this if no PMU is detected. Fall back to
the right condition so that the following is printed:

[    0.122381] Performance Events: no PMU driver, software events only.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-u2fwaUffakjp0qkpRfqljgsn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 15:48:48 +02:00
Borislav Petkov
646e29a178 x86: Improve the printout of the SMP bootup CPU table
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:10:26 +02:00
Borislav Petkov
2a929242ee lockdep, x86/alternatives: Drop ancient lockdep fixup message
It messes up the output of the nodes/cores bootup table and it
is obsolete anyway, see

  17abecfe651c x86: fix up alternatives with lockdep enabled

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143442.GE4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:09:41 +02:00
Suravee Suthikulpanit
accd1e823e x86/microcode/AMD: Fix patch level reporting for family 15h
On AMD family 14h, applying microcode patch on the a core (core0)
would also affect the other core (core1) in the same compute
unit. The driver would skip applying the patch on core1, but it
still need to update kernel structures to reflect the proper
patch level.

The current logic is not updating the struct
ucode_cpu_info.cpu_sig.rev of the skipped core. This causes the
/sys/devices/system/cpu/cpu1/microcode/version to report
incorrect patch level as shown below:

  $ grep . cpu?/microcode/version
  cpu0/microcode/version:0x600063d
  cpu1/microcode/version:0x6000626
  cpu2/microcode/version:0x600063d
  cpu3/microcode/version:0x6000626
  cpu4/microcode/version:0x600063d

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <bp@alien8.de>
Cc: <jacob.w.shin@gmail.com>
Cc: <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1285806432-1995-1-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-27 09:29:27 +02:00
Masoud Sharbiani
b5eafc6f07 x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
Two entries for the same system type were added, with two different vendor
names: 'Dell' and 'Dell, Inc.'.

Since a prefix match is being used by the DMI parsing code, we can eliminate
the latter as redundant.

Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Cc: holt@sgi.com
Link: http://lkml.kernel.org/r/1380216643-4683-1-git-send-email-masoud.sharbiani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-26 20:52:37 +02:00
Ingo Molnar
474a83b76f Merge branch 'linus'
Merge in the relevant upstream merge point to queue up dependent patch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-26 20:49:53 +02:00
Linus Torvalds
654fdd0412 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "An EFI fix and two reboot-quirk fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
  x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically
  x86, efi: Don't map Boot Services on i386
2013-09-25 13:29:18 -07:00
Linus Torvalds
bdc5663fa1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Assorted standalone fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add model number for Avoton Silvermont
  perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
  perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
  perf: Update ABI comment
  tools lib lk: Uninclude linux/magic.h in debugfs.c
  perf tools: Fix old GCC build error in trace-event-parse.c:parse_proc_kallsyms()
  perf probe: Fix finder to find lines of given function
  perf session: Check for SIGINT in more loops
  perf tools: Fix compile with libelf without get_phdrnum
  perf tools: Fix buildid cache handling of kallsyms with kcore
  perf annotate: Fix objdump line parsing offset validation
  perf tools: Fill in new definitions for madvise()/mmap() flags
  perf tools: Sharpen the libaudit dependencies test
2013-09-25 13:28:08 -07:00
Li Fei
16c21ae5ca reboot: Allow specifying warm/cold reset for CF9 boot type
In current implementation for reboot type CF9 and CF9_COND,
warm and cold reset are not differentiated, and both are
performed by writing 0x06 to port 0xCF9.

This commit will differentiate warm and cold reset:

For warm reset, write 0x06 to port 0xCF9;
For cold reset, write 0x0E to port 0xCF9.

[ hpa: This meaning of "cold" and "warm" reset is different from other
  reboot types use, where "warm" means "bypass BIOS POST".  It is also
  not entirely clear that it actually solves any actual problem.  However,
  it would seem fairly harmless to offer this additional option.

  Also note that we do not mask bit 3 in the "warm reset" case.  This
  preserves the behavior on existing systems, including ones quirked
  to use CF9.  It seems reasonable that on any system where the
  warm/cold distinction actually matters that bit 3 would be read as
  zero. ]

From: Liu Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Li Fei <fei.li@intel.com>
Link: http://lkml.kernel.org/r/1377072837.24556.2.camel@fli24-HP-Compaq-8100-Elite-CMT-PC
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 19:33:22 +02:00
Peter Zijlstra
1a338ac32c sched, x86: Optimize the preempt_schedule() call
Remove the bloat of the C calling convention out of the
preempt_enable() sites by creating an ASM wrapper which allows us to
do an asm("call ___preempt_schedule") instead.

calling.h bits by Andi Kleen

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org
[ Fixed build error. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:23:07 +02:00
Peter Zijlstra
c2daa3bed5 sched, x86: Provide a per-cpu preempt_count implementation
Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

NOTE: this save/restore is required even for !PREEMPT kernels as
cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore
task_struct::state.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:57 +02:00
Peter Zijlstra
bdb4380658 sched: Extract the basic add/sub preempt_count modifiers
Rewrite the preempt_count macros in order to extract the 3 basic
preempt_count value modifiers:

  __preempt_count_add()
  __preempt_count_sub()

and the new:

  __preempt_count_dec_and_test()

And since we're at it anyway, replace the unconventional
$op_preempt_count names with the more conventional preempt_count_$op.

Since these basic operators are equivalent to the previous _notrace()
variants, do away with the _notrace() versions.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:54 +02:00
Peter Zijlstra
ea81174789 sched, idle: Fix the idle polling state logic
Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.

The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).

Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).

Also switch the idle routines to using tif_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.

Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: lenb@kernel.org
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/n/tip-nc03imb0etuefmzybzj7sprf@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 13:53:10 +02:00
Toshi Kani
1cad5e9a39 hotplug / x86: Disable ARCH_CPU_PROBE_RELEASE on x86
Commit d7c53c9e enabled ARCH_CPU_PROBE_RELEASE on x86 in order to
serialize CPU online/offline operations.  Although it is the config
option to enable CPU hotplug test interfaces, probe & release, it is
also the option to enable cpu_hotplug_driver_lock() as well.  Therefore,
this option had to be enabled on x86 with dummy arch_cpu_probe() and
arch_cpu_release().

Since then, lock_device_hotplug() was introduced to serialize CPU
online/offline & hotplug operations.  Therefore, this config option
is no longer required for the serialization.  This patch disables
this config option on x86 and revert the changes made by commit
d7c53c9e.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:10 +02:00
Toshi Kani
574b851e99 hotplug / x86: Add hotplug lock to missing places
lock_device_hotplug[_sysfs]() serializes CPU & Memory online/offline
and hotplug operations.  However, this lock is not held in the debug
interfaces below that initiate CPU online/offline operations.

 - _debug_hotplug_cpu(), cpu0 hotplug test interface enabled by
   CONFIG_DEBUG_HOTPLUG_CPU0.
 - cpu_probe_store() and cpu_release_store(), cpu hotplug test interface
   enabled by CONFIG_ARCH_CPU_PROBE_RELEASE.

This patch changes the above interfaces to hold lock_device_hotplug().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Toshi Kani
f6913f9902 hotplug / x86: Fix online state in cpu0 debug interface
_debug_hotplug_cpu() is a debug interface that puts cpu0 offline during
boot-up when CONFIG_DEBUG_HOTPLUG_CPU0 is set.  After cpu0 is put offline
in this interface, however, /sys/devices/system/cpu/cpu0/online still
shows 1 (online).

This patch fixes _debug_hotplug_cpu() to update dev->offline when CPU
online/offline operation succeeded.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Dave Jones
7a20c2fad6 x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
This seems to have been copied from the Optiplex 990 entry
above, but somoene forgot to change the ident text.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 08:41:10 +02:00
Mike Travis
0d12ef0c90 x86/UV: Update UV support for external NMI signals
The current UV NMI handler has not been updated for the changes
in the system NMI handler and the perf operations.  The UV NMI
handler reads an MMR in the UV Hub to check to see if the NMI
event was caused by the external 'system NMI' that the operator
can initiate on the System Mgmt Controller.

The problem arises when the perf tools are running, causing
millions of perf events per second on very large CPU count
systems.  Previously this was okay because the perf NMI handler
ran at a higher priority on the NMI call chain and if the NMI
was a perf event, it would stop calling other NMI handlers
remaining on the NMI call chain.

Now the system NMI handler calls all the handlers on the NMI
call chain including the UV NMI handler.  This causes the UV NMI
handler to read the MMRs at the same millions per second rate.
This can lead to significant performance loss and possible
system failures.  It also can cause thousands of 'Dazed and
Confused' messages being sent to the system console.  This
effectively makes perf tools unusable on UV systems.

To avoid this excessive overhead when perf tools are running,
this code has been optimized to minimize reading of the MMRs as
much as possible, by moving to the NMI_UNKNOWN notifier chain.
This chain is called only when all the users on the standard
NMI_LOCAL call chain have been called and none of them have
claimed this NMI.

There is an exception where the NMI_LOCAL notifier chain is
used.  When the perf tools are in use, it's possible that the UV
NMI was captured by some other NMI handler and then either
ignored or mistakenly processed as a perf event.  We set a
per_cpu ('ping') flag for those CPUs that ignored the initial
NMI, and then send them an IPI NMI signal.  The NMI_LOCAL
handler on each cpu does not need to read the MMR, but instead
checks the in memory flag indicating it was pinged.  There are
two module variables, 'ping_count' indicating how many requested
NMI events occurred, and 'ping_misses' indicating how many stray
NMI events.  These most likely are perf events so it shows the
overhead of the perf NMI interrupts and how many MMR reads were avoided.

This patch also minimizes the reads of the MMRs by having the
first cpu entering the NMI handler on each node set a per HUB
in-memory atomic value.  (Having a per HUB value avoids sending
lock traffic over NumaLink.)  Both types of UV NMIs from the SMI
layer are supported.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.353547733@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Mike Travis
1e019421bc x86/UV: Move NMI support
This patch moves the UV NMI support from the x2apic file to a
new separate uv_nmi.c file in preparation for the next sequence
of patches. It prevents upcoming bloat of the x2apic file, and
has the added benefit of putting the upcoming /sys/module
parameters under the name 'uv_nmi' instead of 'x2apic_uv_x',
which was obscure.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.183295611@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Jiang Liu
7e1f85f96d x86 / ACPI: simplify _acpi_map_lsapic()
In acpi_register_lapic(), it will generates a new logical cpu
number and maps to the local APIC id, this logical cpu number
can be returned to simplify _acpi_map_lsapic() implementation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:40 +02:00
Jiang Liu
d536bf3dc9 ACPI / processor: use apic_id and remove duplicated _MAT evaluation
Since APIC id is saved in processor struct, just use it and
remove the duplicated _MAT evaluation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:39 +02:00
Masoud Sharbiani
4f0acd31c3 x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically
Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.

Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: Vinson Lee <vlee@twitter.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23 10:26:08 +02:00
Yan, Zheng
cf3b425dd8 perf/x86/intel: Add model number for Avoton Silvermont
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23 10:22:00 +02:00
Peter Zijlstra
92519bbc8a perf/x86/intel: Fix build warning in intel_pmu_drain_pebs_nhm()
Fengguang Wu reported this build warning:

    arch/x86/kernel/cpu/perf_event_intel_ds.c: In function 'intel_pmu_drain_pebs_nhm':
    arch/x86/kernel/cpu/perf_event_intel_ds.c:964:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'int'

Because pointer arithmetics result type is bitness dependent there's no natural
type to use here, cast it to long.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-jbpauwxJqtf24luewcsdFith@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 13:15:27 +02:00
Peter Zijlstra
fa73158710 perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74e9 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 09:45:11 +02:00
Yan, Zheng
73c4427c6c perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
uncore_validate_group() can't call smp_processor_id() because it is
in preemptible context. Pass NUMA_NO_NODE to the allocator instead.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379400493-11505-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:54:36 +02:00
Peter Zijlstra
eb8417aa70 perf/x86/intel: Remove division from the intel_pmu_drain_pebs_nhm() hot path
Only do the division in case we have to print the result out in a warning.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-43nl31erfbajwpfj254f6zji@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:53:44 +02:00
Linus Torvalds
f42bcf1aa8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/lpss: Add pin control support to Intel low power subsystem
  perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
  x86: Remove now-unused save_rest()
  x86/smpboot: Fix announce_cpu() to printk() the last "OK" properly
2013-09-18 11:26:17 -05:00
Linus Torvalds
186844b292 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix UAPI export of PERF_EVENT_IOC_ID
  perf/x86/intel: Fix Silvermont offcore masks
2013-09-18 11:22:53 -05:00
Stephane Eranian
9d8e3f9693 perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
On Intel SNB (SNB, SNB-EP), the event MEM_LOAD_UOPS_MISS_RETIRED
supports PEBS. It was missing for the SNB PEBS event constraint
table thereby preventing any measurement with PEBS for it.

This patch adds the event to the PEBS table for SNB.

WARNING: it should be noted that this event like a few others
are subject to the erratum BT241 for Xeon E5 (SNB-EP). As such,
the event may undercount when used with PEBS unless the
workaround is implemented. But without this patch and just the
workaround, the kernel would not allow precise sampling on this
event. BT241 is documented in:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130913201646.GA23981@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-14 08:00:18 +02:00
Linus Torvalds
75acebf242 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Various fixes.

  The -g perf report lockup you reported is only partially addressed,
  patches that fix the excessive runtime are still being worked on"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix uncore PCI fixed counter handling
  uprobes: Fix utask->depth accounting in handle_trampoline()
  perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING
  perf: Fix up MMAP2 buffer space reservation
  perf tools: Add attr->mmap2 support
  perf kvm: Fix sample_type manipulation
  perf evlist: Fix id pos in perf_evlist__open()
  perf trace: Handle perf.data files with no tracepoints
  perf session: Separate progress bar update when processing events
  perf trace: Check if MAP_32BIT is defined
  perf hists: Fix formatting of long symbol names
  perf evlist: Fix parsing with no sample_id_all bit set
  perf tools: Add test for parsing with no sample_id_all bit
  perf trace: Check control+C more often
2013-09-12 10:44:54 -07:00
Ingo Molnar
7f2ee91f54 perf/x86/intel: Clean up EVENT_ATTR_STR() muck
Make the code a bit more readable by removing stray whitespaces et al.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-lzEnychz1ylqy8zjenxOmeht@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:17:50 +02:00
Peter Zijlstra
d2beea4a34 perf/x86/intel: Clean-up/reduce PEBS code
Get rid of some pointless duplication introduced by the Haswell code.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8q6y4davda9aawwv5yxe7klp@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Peter Zijlstra
2b9e344df3 perf/x86/intel: Clean up checkpoint-interrupt bits
Clean up the weird CP interrupt exception code by keeping a CP mask.

Andi suggested this implementation but weirdly didn't actually
implement it himself, do so now because it removes the conditional in
the interrupt handler and avoids the assumption its only on cnt2.

Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dvb4q0rydkfp00kqat4p5bah@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Andi Kleen
4b2c4f1f1b perf/x86/intel: Add Haswell TSX event aliases
Add TSX event aliases, and export them from the kernel to perf.

These are used by perf stat -T and to allow
more user friendly access to events. The events are designed to
be fairly generic and may also apply to other architectures
implementing HTM.  They all cover common situations that
happens during tuning of transactional code.

For Haswell we have to separate the HLE and RTM events,
as they are separate in the PMU.

This adds the following events:

 tx-start	Count start transaction (used by perf stat -T)
 tx-commit	Count commit of transaction
 tx-abort	Count all aborts
 tx-conflict	Count aborts due to conflict with another CPU.
 tx-capacity	Count capacity aborts (transaction too large)

Then matching el-* events for HLE

 cycles-t	Transactional cycles (used by perf stat -T)
  * also exists on POWER8

 cycles-ct	Transactional cycles commited (used by perf stat -T)
  * according to Michael Ellerman POWER8 has a cycles-transactional-committed,
  * perf stat -T handles both cases

Note for useful abort profiling often precise has to be set,
as Haswell can only report the point inside the transaction
with precise=2.

For some classes of aborts, like conflicts, this is not needed,
as it makes more sense to look at the complete critical section.

This gives a clean set of generalized events to examine transaction
success and aborts. Haswell has additional events for TSX, but those
are more specialized for very specific situations.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:36 +02:00
Andi Kleen
748e86aa90 perf/x86: Report TSX transaction abort cost as weight
Use the existing weight reporting facility to report the transaction
abort cost, that is the number of cycles wasted in aborts.
Haswell reports this in the PEBS record.

This was in fact the original user for weight.

This is a very useful sort key to concentrate on the most
costly aborts and a good metric for TSX tuning.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:34 +02:00
Andi Kleen
2dbf0116aa perf/x86/intel: Avoid checkpointed counters causing excessive TSX aborts
With checkpointed counters there can be a situation where the counter
is overflowing, aborts the transaction, is set back to a non overflowing
checkpoint, causes interupt. The interrupt doesn't see the overflow
because it has been checkpointed.  This is then a spurious PMI, typically with
a ugly NMI message.  It can also lead to excessive aborts.

Avoid this problem by:

- Using the full counter width for counting counters (earlier patch)

- Forbid sampling for checkpointed counters. It's not too useful anyways,
  checkpointing is mainly for counting. The check is approximate
  (to still handle KVM), but should catch the majority of cases.

- On a PMI always set back checkpointed counters to zero.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:33 +02:00