9062 Commits

Author SHA1 Message Date
H. Peter Anvin
7b5c4a65cc Linux 3.8-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJRAuO3AAoJEHm+PkMAQRiGbfAH/1C3QQKB11aBpYLAw7qijAze
 yOui26UCnwRryxsO8zBCQjGoByy5DvY/Q0zyUCWUE6nf/JFSoKGUHzfJ1ATyzGll
 3vENP6Fnmq0Hgc4t8/gXtXrZ1k/c43cYA2XEhDnEsJlFNmNj2wCQQj9njTNn2cl1
 k6XhZ9U1V2hGYpLL5bmsZiLVI6dIpkCVw8d4GZ8BKxSLUacVKMS7ml2kZqxBTMgt
 AF6T2SPagBBxxNq8q87x4b7vyHYchZmk+9tAV8UMs1ecimasLK8vrRAJvkXXaH1t
 xgtR0sfIp5raEjoFYswCK+cf5NEusLZDKOEvoABFfEgL4/RKFZ8w7MMsmG8m0rk=
 =m68Y
 -----END PGP SIGNATURE-----

Merge tag 'v3.8-rc5' into x86/mm

The __pa() fixup series that follows touches KVM code that is not
present in the existing branch based on v3.7-rc5, so merge in the
current upstream from Linus.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:31:21 -08:00
Paul Gortmaker
43720bd601 PM / tracing: remove deprecated power trace API
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-26 00:39:12 +01:00
Chen Gang
349eab6eb0 x86/process: Change %8s to %s for pr_warn() in release_thread()
the length of dead_task->comm[] is 16 (TASK_COMM_LEN)
on pr_warn(), it is not meaningful to use %8s for task->comm[].

So change it to %s, since the line is not solid anyway.

Additional information:

 %8s  limit the width, not for the original string output length
      if name length is more than 8, it still can be fully displayed.
      if name length is less than 8, the ' ' will be filled before name.

 %.8s truly limit the original string output length (precision)

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Link: http://lkml.kernel.org/n/tip-nridm1zvreai1tgfLjuexDmd@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 18:11:03 +01:00
Alan Cox
c903f0456b x86/msr: Add capabilities check
At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.

In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.

Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Horses <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:37:51 +01:00
Maarten Lankhorst
73b664ceb5 x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
I ran out of free entries when I had CONFIG_DMA_API_DEBUG
enabled. Some other archs seem to default to 65536, so increase
this limit for x86 too.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/50A612AA.7040206@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
----
2013-01-24 17:34:18 +01:00
Alexander Gordeev
51906e779f x86/MSI: Support multiple MSIs in presense of IRQ remapping
The MSI specification has several constraints in comparison with
MSI-X, most notable of them is the inability to configure MSIs
independently. As a result, it is impossible to dispatch
interrupts from different queues to different CPUs. This is
largely devalues the support of multiple MSIs in SMP systems.

Also, a necessity to allocate a contiguous block of vector
numbers for devices capable of multiple MSIs might cause a
considerable pressure on x86 interrupt vector allocator and
could lead to fragmentation of the interrupt vectors space.

This patch overcomes both drawbacks in presense of IRQ remapping
and lets devices take advantage of multiple queues and per-IRQ
affinity assignments.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c8bd86ff56b5fc118257436768aaa04489ac0a4c.1353324359.git.agordeev@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:25:12 +01:00
Jan Beulich
9611dc7a8d x86: Convert a few mistaken __cpuinit annotations to __init
The first two are functions serving as initcalls; the SFI one is
only being called from __init code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/50AFB35102000078000AAECA@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:12:19 +01:00
Yuanhan Liu
e3e81aca8d x86: Fix a typo
legact -> legacy

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:22:10 +01:00
Youquan Song
923d8697e2 x86/perf: Add IvyBridge EP support
Running the perf utility on a Ivybridge EP server we encounter
"not supported" events:

   <not supported> L1-dcache-loads
   <not supported> L1-dcache-load-misses
   <not supported> L1-dcache-stores
   <not supported> L1-dcache-store-misses
   <not supported> L1-dcache-prefetches
   <not supported> L1-dcache-prefetch-misses

This patch adds support for this processor.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Youquan Song <youquan.song@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1355851223-27705-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:14:04 +01:00
yangyongqiang
9faec5be3a perf/x86: Fix P6 driver section warning
Fix a compile warning - 'a section type conflict' by removing
__initconst.

Signed-off-by: yangyongqiang <yangyongqiang01@baidu.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:04:56 +01:00
ShuoX Liu
0927b482ae perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
These three chip are based on Atom and have different model id.
So add such three id for perf HW event support.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Cc: yanmin_zhang@intel.linux.com
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1356713324-12442-1-git-send-email-shuox.liu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 15:10:03 +01:00
Bjorn Helgaas
6125bc8b86 x86/time/rtc: Don't print extended CMOS year when reading RTC
We shouldn't print the current century every time we read the
RTC.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130104224146.15189.14874.stgit@bhelgaas.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 14:56:35 +01:00
Ingo Molnar
4913ae3991 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull tracing updates from Steve Rostedt.

This commit:

      tracing: Remove the extra 4 bytes of padding in events

changes the ABI. All involved parties seem to agree that it's safe to
do now, but the devil is in the details ...

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 13:39:31 +01:00
Alok N Kataria
4cca6ea04d x86/apic: Allow x2apic without IR on VMware platform
This patch updates x2apic initializaition code to allow x2apic
on VMware platform even without interrupt remapping support.
The hypervisor_x2apic_available hook was added in x2apic
initialization code and used by KVM and XEN, before this.
I have also cleaned up that code to export this hook through the
hypervisor_x86 structure.

Compile tested for KVM and XEN configs, this patch doesn't have
any functional effect on those two platforms.

On VMware platform, verified that x2apic is used in physical
mode on products that support this.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Reviewed-by: Doug Covelli <dcovelli@vmware.com>
Reviewed-by: Dan Hecht <dhecht@vmware.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Avi Kivity <avi@redhat.com>
Link: http://lkml.kernel.org/r/1358466282.423.60.camel@akataria-dtop.eng.vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 13:11:18 +01:00
Cong Ding
b9975dabe3 x86/apb/timer: Remove unnecessary "if"
adev has no chance to be NULL, so we don't need to check it. It
is also dereferenced just before the check .

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1358199561-15518-1-git-send-email-dinggnu@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 13:03:26 +01:00
Dave Jones
e3f0f36ddf x86/apic: Remove noisy zero-mask warning from default_send_IPI_mask_logical()
Since circa 3.5, we've had dozens of reports of people hitting
this warning. Forwarded reports have been met with silence, so
just remove the warning if no-one cares.

Example reports:

  https://bugzilla.redhat.com/show_bug.cgi?id=797687
  https://bugzilla.redhat.com/show_bug.cgi?id=867174
  https://bugzilla.redhat.com/show_bug.cgi?id=894865

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130118175847.GA27662@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 12:12:42 +01:00
Jan Beulich
444723dccc x86-64: Fix unwind annotations in recent NMI changes
While in one case a plain annotation is necessary, in the other
case the stack adjustment can simply be folded into the
immediately preceding RESTORE_ALL, thus getting the correct
annotation for free.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander van Heukelum <heukelum@mailshack.com>
Link: http://lkml.kernel.org/r/51010C9302000078000B9045@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 10:56:32 +01:00
Linus Torvalds
ed06ef318a perf/urgent fixes:
. revert 20b279 - require exclude_guest to use PEBS - kernel side,
   now older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.
 
 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI, from
   Sebastian Andrzej Siewior
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7yQhAAoJENZQFvNTUqpAjj4P/RR+8WeXpV02yzndhry+Yjav
 L/WQH0CkRRmyUA5akTcpgFrohJCEOi+auHl7ivmDX4XWFavcAkX3H1Yz1FytCOkb
 Cbb9Lv1rGdlTno8fTVUn8mNyTG64AlWrAw3ICixWw6q6I/k6SO7EkKigCxPhmY+2
 BE2EvkZmOY/PEUXgM6HtUdifORatX48p1toS7p3CDQ31cxBN5OVNZUXa1FakJpyH
 7R+1imKLsjuyi/G7Bt061LyWQkOh7L/ITWN+5Rx4RsUwRRT3vm1H9nlqUBsPS0PW
 qfkktkCmn/cFpKbfBipuglnt16jHPMfI/pghKvzx8n2uJMNEGXbfFDJefpzcdih9
 wIRgB6a5bvA8VF6Xpcn0I5JhqLAcnWTer07JgjZevjqYCdZStpbJjvE5131JjTLw
 Dnm7UshE+VFBcA3iXNX64p/X7WDJSk+SIDsJDuNe57dktFVLw76Ibb55XG18Ex7e
 c9QcIEhD1P19VzOniDZQZNEJhqnu5Vjle/eG+JRVRCm/BgoQFyJuD3EooKPN7hHR
 Op4oqf5RhDf7XNH0+Y4rOdRMZRiumdfcEl6kdcQGPJPycxpD7xCJNzBLWK/BvQgT
 Kl0AEkRC0KE2c5LyFttW+g1Byu1rctlMz2TVJDTskTm0XGOQ9mTzsQBP5rgBVt+b
 hQsMMWNVSI+jfm8bTJwx
 =LTjZ
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 . revert 20b279 - require exclude_guest to use PEBS - kernel side, now
   older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.

 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI,
   from Sebastian Andrzej Siewior

[ Pulling directly, Ingo would normally pull but has been unresponsive ]

* tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Fix building from 'make perf-*-src-pkg' tarballs
  perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
2013-01-22 14:32:07 -08:00
Oleg Nesterov
9899d11f65 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it.  Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:00 -08:00
H. Peter Anvin
d29a4a5fe8 This patchset adds support for federated systems where multiple memory
controllers can exist and see each other over multiple PCI domains. This
 basically means that AMD node ids can be more than 8 now and the code
 handling this is taught to incorporate PCI domain into those IDs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7tvQAAoJEBLB8Bhh3lVKAdcP/iiHMUvovmwYL4H/PUUrBmKH
 JZnESlNIxWwRsecIQSqJO+O0SQa489cnpBV59yEMrs7Cu2sOhi7/ubbwzngzannS
 ab/M1GhaXn8UJ8N9NzUnxJ0KW/1cbtN1J3YgyqJ7zx9m058l3KDpDkuIyCejzRGR
 cFQHZUgjIUL7LNaAQmGZnAgsVbVUv+yzZhzeYxXQ2h6445H10pd6JEb6/aZTUFlL
 nYv2ypWdBXr27Mc8FkKdftiFw4dLUEQsNgFbBVJZtKbi4WrSTloP3dWjwo8KCtFz
 1QgvsKlNOIl5DEXl0sSoTc8Jhi05eqPANke2oTu46fSwoHGKP2atmcroSdiHmDDM
 vH6X5K6bs1fNMxqyAjvHUnbUs2aupBMjIxobZaKBLM+arK4sSt+elRGZMZxHtjgh
 tHeMpVvkXDAtNX+C9frDSiFEkj2NNvuCJc8naXdQ1cDNLuanP7dnTfS3ZcHQShoI
 FBzz0RbRkXKXcE1sQRfzT5dmt2gdgDzFjqN4rz4fEOM3+1MFRZ/B/cKLcZ25abvc
 3uweD6e6XSfFnnpz8xkr6eek3CpwHPiNYQWLXq9ick7BW5fjcHiU0Fo2rJO6G8tq
 vRqGU6P81sHlGrE0sLMiaYv9NqTYEERuKO05WSP9ryn3tdGebvpy116DeD5w/olo
 MTcWtwp69J+Eoqr7bb9z
 =ZzvJ
 -----END PGP SIGNATURE-----

Merge tag 'numascale' into x86/platform

This patchset adds support for federated systems where multiple memory
controllers can exist and see each other over multiple PCI domains. This
basically means that AMD node ids can be more than 8 now and the code
handling this is taught to incorporate PCI domain into those IDs.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-22 08:37:34 -08:00
Masami Hiramatsu
f684199f5d kprobes/x86: Move kprobes stuff under arch/x86/kernel/kprobes/
Move arch-dep kprobes stuff under arch/x86/kernel/kprobes.

Link: http://lkml.kernel.org/r/20120928081522.3560.75469.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ fixed whitespace and s/__attribute__((packed))/__packed/ ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:37 -05:00
Masami Hiramatsu
e7dbfe349d kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c
Split ftrace-based kprobes code from kprobes, and introduce
CONFIG_(HAVE_)KPROBES_ON_FTRACE Kconfig flags.
For the cleanup reason, this also moves kprobe_ftrace check
into skip_singlestep.

Link: http://lkml.kernel.org/r/20120928081520.3560.25624.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:36 -05:00
Rusty Russell
373d4d0997 taint: add explicit flag to show whether lock dep is still OK.
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-21 17:17:57 +10:30
H. Peter Anvin
021ef050fc x86-32: Start out cr0 clean, disable paging before modifying cr3/4
Patch

  5a5a51db78e x86-32: Start out eflags and cr4 clean

... made x86-32 match x86-64 in that we initialize %eflags and %cr4
from scratch.  This broke OLPC XO-1.5, because the XO enters the
kernel with paging enabled, which the kernel doesn't expect.

Since we no longer support 386 (the source of most of the variability
in %cr0 configuration), we can simply match further x86-64 and
initialize %cr0 to a fixed value -- the one variable part remaining in
%cr0 is for FPU control, but all that is handled later on in
initialization; in particular, configuring %cr0 as if the FPU is
present until proven otherwise is correct and necessary for the probe
to work.

To deal with the XO case sanely, explicitly disable paging in %cr0
before we muck with %cr3, %cr4 or EFER -- those operations are
inherently unsafe with paging enabled.

NOTE: There is still a lot of 386-related junk in head_32.S which we
can and should get rid of, however, this is intended as a minimal fix
whereas the cleanup can be deferred to the next merge window.

Reported-by: Andres Salomon <dilinger@queued.net>
Tested-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-19 11:01:22 -08:00
Linus Torvalds
5c69bed266 Fixes:
- CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
  - Fix racy vma access spotted by Al Viro
  - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
  - Fix vcpu online/offline BUG:scheduling while atomic..
  - Fix unbound buffer scanning for more than 32 vCPUs.
  - Fix grant table being incorrectly initialized
  - Fix incorrect check in pciback
  - Allow privcmd in backend domains.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJQ+L7qAAoJEFjIrFwIi8fJLNIH/jUsneraEggWeh0L4GGWZvWL
 cNCf0zjQt/pi1Q5drbleW2/6Wv6s6N1QA9pGRsJ+rrliC73HVTqIWFh0TjpwmCVy
 hZal7jDXOuFVIR7GbGEPn004T6mkEnYDb/O2fyojwMVg0NQYwtMYJfTBkKdjKnmV
 z6sWpQPVqO3/nZ17k2DipYRldbeiqS6LLOiUWd72b2W8bV4ySY5iVPVsqFusSEr6
 PNyW33RPs5H0jEPR1uJlLD+l/uIbENykpEPeAS2uHGlch129+xHH5h79dwYJTbw6
 x5nAOveO9VNJscUoqhpE7YbySzJmrUwxnBerZ6YTW6WCknYXrx4uiVAlfWem7uY=
 =26Sq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
 - Fix racy vma access spotted by Al Viro
 - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
 - Fix vcpu online/offline BUG:scheduling while atomic..
 - Fix unbound buffer scanning for more than 32 vCPUs.
 - Fix grant table being incorrectly initialized
 - Fix incorrect check in pciback
 - Allow privcmd in backend domains.

Fix up whitespace conflict due to ugly merge resolution in Xen tree in
arch/arm/xen/enlighten.c

* tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
  Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
  xen/gntdev: remove erronous use of copy_to_user
  xen/gntdev: correctly unmap unlinked maps in mmu notifier
  xen/gntdev: fix unsafe vma access
  xen/privcmd: Fix mmap batch ioctl.
  Xen: properly bound buffer access when parsing cpu/*/availability
  xen/grant-table: correctly initialize grant table version 1
  x86/xen : Fix the wrong check in pciback
  xen/privcmd: Relax access control in privcmd_ioctl_mmap
2013-01-18 12:02:52 -08:00
Jacob Pan
29c6fb7be1 x86/nmi: export local_touch_nmi() symbol for modules
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-01-17 22:25:57 +08:00
Andrew Cooper
9174adbee4 xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
This fixes CVE-2013-0190 / XSA-40

There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path.  This can result in the kernel crashing.

In the classic kernel case, the relevant code looked a little like:

        popl %eax      # Error code from hypervisor
        jz 5f
        addl $16,%esp
        jmp iret_exc   # Hypervisor said iret fault
5:      addl $16,%esp
                       # Hypervisor said segment selector fault

Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.

In the PVOPS case, the code looks like:

        popl_cfi %eax         # Error from the hypervisor
        lea 16(%esp),%esp     # Add $16 before choosing fault path
        CFI_ADJUST_CFA_OFFSET -16
        jz 5f
        addl $16,%esp         # Incorrectly adjust %esp again
        jmp iret_exc

It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present.  At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.

This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-16 16:17:42 -05:00
Linus Torvalds
2409c873be Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is mainly a workaround for a bug in Sandy Bridge graphics which
  causes corruption of certain memory pages."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
  x86/Sandy Bridge: mark arrays in __init functions as __initconst
  x86/Sandy Bridge: reserve pages when integrated graphics is present
  x86, efi: correct precedence of operators in setup_efi_pci
2013-01-16 09:11:50 -08:00
Bernd Faust
2353b47bff Round the calculated scale factor in set_cyc2ns_scale()
During some experiments with an external clock (in a FPGA), we saw that
the TSC clock drifted approx. 2.5ms per second.

This drift was caused by the current way of calculating the scale.
In our case cpu_khz had a value of 3292725. This resulted in a scale
value of 310. But when doing the calculation by hand it shows that the
actual value is 310.9886188491, so a value of 311 would be more precise.

With this change the value is rounded.

Signed-off-by: Bernd Faust <berndfaust@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-15 18:16:07 -08:00
H. Peter Anvin
e43b3cec71 x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
early_pci_allowed() and read_pci_config_16() are only available if
CONFIG_PCI is defined.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2013-01-13 20:58:57 -08:00
H. Peter Anvin
ab3cd8670e x86/Sandy Bridge: mark arrays in __init functions as __initconst
Mark static arrays as __initconst so they get removed when the init
sections are flushed.

Reported-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-13 20:36:39 -08:00
Jesse Barnes
a9acc5365d x86/Sandy Bridge: reserve pages when integrated graphics is present
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table.  So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.

Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.

[ hpa: made a number of stylistic changes, marked arrays as static
  const, and made less verbose; use "memblock=debug" for full
  verbosity. ]

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-11 14:26:38 -08:00
Linus Torvalds
ccae663cd4 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM bugfixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: use dynamic percpu allocations for shared msrs area
  KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV
  powerpc: Corrected include header path in kvm_para.h
  Add rcu user eqs exception hooks for async page fault
2013-01-10 09:05:18 -08:00
Daniel J Blueman
8b84c8df38 x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
Change amd_get_nb_id to return u16 to support >255 memory controllers,
and related consistency fixes.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:58 +01:00
David Ahern
a706d965dc perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
This patch is brought to you by the letter 'H'.

Commit 20b279 breaks compatiblity with older perf binaries when run with
precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
samples) unless host only profiling is requested (:H modifier). The workaround
for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
toggles exclude_guest to 1). This was deemed unacceptable by Linus:

https://lkml.org/lkml/2012/12/12/570

Between family in town and the fresh snow in Breckenridge there is no time left
to be working on the proper fix for this over the holidays. In the New Year I
have more pressing problems to resolve -- like some memory leaks in perf which
are proving to be elusive -- although the aforementioned snow is probably why
they are proving to be elusive. Either way I do not have any spare time to work
on this and from the time I have managed to spend on it the solution is more
difficult than just moving to a new exclude_guest flag (does not work) or
flipping the logic to include_guest (which is not as trivial as one would
think).

So, two options: silently force exclude_guest on as suggested by Gleb which
means no impact to older perf binaries or revert the original patch which
caused the breakage.

This patch does the latter -- reverts the original patch that introduced the
regression. The problem can be revisited in the future as time allows.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-10 09:21:19 -03:00
Greg Kroah-Hartman
a18e3690a5 X86: drivers: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitconst,
and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Drake <dsd@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:04 -08:00
Jorrit Schippers
d82603c6da treewide: Replace incomming with incoming in all comments and strings
Signed-off-by: Jorrit Schippers <jorrit@ncode.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-03 16:15:49 +01:00
Tejun Heo
4d899be584 x86/mce: don't use [delayed_]work_pending()
There's no need to test whether a (delayed) work item in pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from x86/mce.  Only compile tested.

v2: Local var work removed from mce_schedule_work() as suggested by
    Borislav.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
2012-12-28 13:40:16 -08:00
Linus Torvalds
54d46ea993 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
 "sigaltstack infrastructure + conversion for x86, alpha and um,
  COMPAT_SYSCALL_DEFINE infrastructure.

  Note that there are several conflicts between "unify
  SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
  resolution is trivial - just remove definitions of SS_ONSTACK and
  SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
  include/uapi/linux/signal.h contains the unified variant."

Fixed up conflicts as per Al.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  alpha: switch to generic sigaltstack
  new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
  generic compat_sys_sigaltstack()
  introduce generic sys_sigaltstack(), switch x86 and um to it
  new helper: compat_user_stack_pointer()
  new helper: restore_altstack()
  unify SS_ONSTACK/SS_DISABLE definitions
  new helper: current_user_stack_pointer()
  missing user_stack_pointer() instances
  Bury the conditionals from kernel_thread/kernel_execve series
  COMPAT_SYSCALL_DEFINE: infrastructure
2012-12-20 18:05:28 -08:00
Sasha Levin
8f170faeb4 x86, apb_timer: remove unused variable percpu_timer
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1356030701-16284-28-git-send-email-sasha.levin@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-20 11:49:39 -08:00
Al Viro
c40702c49f new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
note that they are relying on access_ok() already checked by caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:41 -05:00
Al Viro
9026843952 generic compat_sys_sigaltstack()
Again, conditional on CONFIG_GENERIC_SIGALTSTACK

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:41 -05:00
Al Viro
6bf9adfc90 introduce generic sys_sigaltstack(), switch x86 and um to it
Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not
select it are completely unaffected

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:40 -05:00
Linus Torvalds
1bd12c91de Merge branch 'x86/nuke386' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull one final 386 removal patch from Peter Anvin.

IRQ 13 FPU error handling is gone.  That was not one of the proudest
moments in PC history.

* 'x86/nuke386' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, 386 removal: Remove support for IRQ 13 FPU error reporting
2012-12-19 13:02:23 -08:00
Li Zhong
9b132fbe54 Add rcu user eqs exception hooks for async page fault
This patch adds user eqs exception hooks for async page fault page not
present code path, to exit the user eqs and re-enter it as necessary.

Async page fault is different from other exceptions that it may be
triggered from idle process, so we still need rcu_irq_enter() and
rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
that needs use rcu.

As Frederic pointed out it would be safest and simplest to protect the
whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
code there deeply for potential RCU uses and ensure it will never be
extended later to use RCU.".

However, We'd better re-enter the cpu idle eqs if we get the exception
in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt().

So the patch does what Frederic suggested for rcu_irq_*() API usage
here, except that I moved the rcu_irq_*() pair originally in
do_async_page_fault() into kvm_async_pf_task_wait().

That's because, I think it's better to have rcu_irq_*() pairs to be in
one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
kvm_async_pf_task_wait() has other callers, which might cause
rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
which is illegal if the cpu happens to be in rcu idle state.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-18 15:15:41 +02:00
H. Peter Anvin
bc3eba6068 x86, 386 removal: Remove support for IRQ 13 FPU error reporting
Remove support for FPU error reporting via IRQ 13, as opposed to
exception 16 (#MF).  One last remnant of i386 gone.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@linux.intel.com>
2012-12-17 11:42:40 -08:00
Linus Torvalds
2a74dbb9a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "A quiet cycle for the security subsystem with just a few maintenance
  updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Smack: create a sysfs mount point for smackfs
  Smack: use select not depends in Kconfig
  Yama: remove locking from delete path
  Yama: add RCU to drop read locking
  drivers/char/tpm: remove tasklet and cleanup
  KEYS: Use keyring_alloc() to create special keyrings
  KEYS: Reduce initial permissions on keys
  KEYS: Make the session and process keyrings per-thread
  seccomp: Make syscall skipping and nr changes more consistent
  key: Fix resource leak
  keys: Fix unreachable code
  KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-16 15:40:50 -08:00
Linus Torvalds
11520e5e7c Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)"
This reverts commit bd52276fa1d4 ("x86-64/efi: Use EFI to deal with
platform wall clock (again)"), and the two supporting commits:

  da5a108d05b4: "x86/kernel: remove tboot 1:1 page table creation code"

  185034e72d59: "x86, efi: 1:1 pagetable mapping for virtual EFI calls")

as they all depend semantically on commit 53b87cf088e2 ("x86, mm:
Include the entire kernel memory map in trampoline_pgd") that got
reverted earlier due to the problems it caused.

This was pointed out by Yinghai Lu, and verified by me on my Macbook Air
that uses EFI.

Pointed-out-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-15 15:20:41 -08:00
Linus Torvalds
d42b3a2906 Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI update from Peter Anvin:
 "EFI tree, from Matt Fleming.  Most of the patches are the new efivarfs
  filesystem by Matt Garrett & co.  The balance are support for EFI
  wallclock in the absence of a hardware-specific driver, and various
  fixes and cleanups."

* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  efivarfs: Make efivarfs_fill_super() static
  x86, efi: Check table header length in efi_bgrt_init()
  efivarfs: Use query_variable_info() to limit kmalloc()
  efivarfs: Fix return value of efivarfs_file_write()
  efivarfs: Return a consistent error when efivarfs_get_inode() fails
  efivarfs: Make 'datasize' unsigned long
  efivarfs: Add unique magic number
  efivarfs: Replace magic number with sizeof(attributes)
  efivarfs: Return an error if we fail to read a variable
  efi: Clarify GUID length calculations
  efivarfs: Implement exclusive access for {get,set}_variable
  efivarfs: efivarfs_fill_super() ensure we clean up correctly on error
  efivarfs: efivarfs_fill_super() ensure we free our temporary name
  efivarfs: efivarfs_fill_super() fix inode reference counts
  efivarfs: efivarfs_create() ensure we drop our reference on inode on error
  efivarfs: efivarfs_file_read ensure we free data in error paths
  x86-64/efi: Use EFI to deal with platform wall clock (again)
  x86/kernel: remove tboot 1:1 page table creation code
  x86, efi: 1:1 pagetable mapping for virtual EFI calls
  x86, mm: Include the entire kernel memory map in trampoline_pgd
  ...
2012-12-14 10:08:40 -08:00
Linus Torvalds
18dd0bf22b Merge branch 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ACPI update from Peter Anvin:
 "This is a patchset which didn't make the last merge window.  It adds a
  debugging capability to feed ACPI tables via the initramfs.

  On a grander scope, it formalizes using the initramfs protocol for
  feeding arbitrary blobs which need to be accessed early to the kernel:
  they are fed first in the initramfs blob (lots of bootloaders can
  concatenate this at boot time, others can use a single file) in an
  uncompressed cpio archive using filenames starting with "kernel/".

  The ACPI maintainers requested that this patchset be fed via the x86
  tree rather than the ACPI tree as the footprint in the general x86
  code is much bigger than in the ACPI code proper."

* 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  X86 ACPI: Use #ifdef not #if for CONFIG_X86 check
  ACPI: Fix build when disabled
  ACPI: Document ACPI table overriding via initrd
  ACPI: Create acpi_table_taint() function to avoid code duplication
  ACPI: Implement physical address table override
  ACPI: Store valid ACPI tables passed via early initrd in reserved memblock areas
  x86, acpi: Introduce x86 arch specific arch_reserve_mem_area() for e820 handling
  lib: Add early cpio decoder
2012-12-14 10:03:23 -08:00