IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The error path of uncore_type_init() frees up any allocations
that were made along the way, but it relies upon type->pmus
being set, which only happens if the function succeeds. As
type->pmus remains null in this case, the call to
uncore_type_exit will do nothing.
Moving the assignment earlier will allow us to actually free
those allocations should something go awry.
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306172028.GA552@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
411cf180fa00 perf/x86/uncore: fix initialization of cpumask
introduced the function uncore_cpumask_init(), which is only
called in __init intel_uncore_init(). But it is not marked
with __init, which produces the following warning:
WARNING: vmlinux.o(.text+0x2464a): Section mismatch in reference from the function uncore_cpumask_init() to the function .init.text:uncore_cpu_setup()
The function uncore_cpumask_init() references
the function __init uncore_cpu_setup().
This is often because uncore_cpumask_init lacks a __init
annotation or the annotation of uncore_cpu_setup is wrong.
This patch marks uncore_cpumask_init() with __init.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1394013516-4964-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch restores the changes of commit dff38e3e93 "x86: Use inline
assembler instead of global register variable to get sp". They got lost
in commit 198d208df4 "x86: Keep thread_info on thread stack in x86_32"
while moving the code to arch/x86/kernel/irq_32.c.
Quoting Andi from commit dff38e3e93:
"""
LTO in gcc 4.6/47. has trouble with global register variables. They were
used to read the stack pointer. Use a simple inline assembler statement
with a mov instead.
This also helps LLVM/clang, which does not support global register
variables.
"""
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/1394178752-18047-1-git-send-email-minipli@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
It's an enum, not a #define, you can't use it in asm files.
Introduced in commit 5fa10196bdb5 ("x86: Ignore NMIs that come in during
early boot"), and sadly I didn't compile-test things like I should have
before pushing out.
My weak excuse is that the x86 tree generally doesn't introduce stupid
things like this (and the ARM pull afterwards doesn't cause me to do a
compile-test either, since I don't cross-compile).
Cc: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don Zickus reports:
A customer generated an external NMI using their iLO to test kdump
worked. Unfortunately, the machine hung. Disabling the nmi_watchdog
made things work.
I speculated the external NMI fired, caused the machine to panic (as
expected) and the perf NMI from the watchdog came in and was latched.
My guess was this somehow caused the hang.
----
It appears that the latched NMI stays latched until the early page
table generation on 64 bits, which causes exceptions to happen which
end in IRET, which re-enable NMI. Therefore, ignore NMIs that come in
during early execution, until we have proper exception handling.
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
Ftrace modifies function calls using Int3 breakpoints on x86.
The breakpoints are handled only when the patching is in progress.
If something goes wrong, there is a recovery code that removes
the breakpoints. If this fails, the system might get silently
rebooted when a remaining break is not handled or an invalid
instruction is proceed.
We should BUG() when the breakpoint could not be removed. Otherwise,
the system silently crashes when the function finishes the Int3
handler is disabled.
Note that we need to modify remove_breakpoint() to return non-zero
value only when there is an error. The return value was ignored before,
so it does not cause any troubles.
Link: http://lkml.kernel.org/r/1393258342-29978-4-git-send-email-pmladek@suse.cz
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As the data parameter is not really used by any ftrace_dyn_arch_init,
remove that from ftrace_dyn_arch_init. This also removes the addr
local variable from ftrace_init which is now unused.
Note the documentation was imprecise as it did not suggest to set
(*data) to 0.
Link: http://lkml.kernel.org/r/1393268401-24379-4-git-send-email-jslaby@suse.cz
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
No architecture uses the "data" parameter in ftrace_dyn_arch_init() in any
way, it just sets the value to 0. And this is used as a return value
in the caller -- ftrace_init, which just checks the retval against
zero.
Note there is also "return 0" in every ftrace_dyn_arch_init. So it is
enough to check the retval and remove all the indirect sets of data on
all archs.
Link: http://lkml.kernel.org/r/1393268401-24379-3-git-send-email-jslaby@suse.cz
Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Having ftrace_write() return -EPERM on failure, as that's what the callers
return, then we can clean up the code a bit. That is, instead of:
if (ftrace_write(...))
return -EPERM;
return 0;
or
if (ftrace_write(...)) {
ret = -EPERM;
goto_out;
}
We can instead have:
return ftrace_write(...);
or
ret = ftrace_write(...);
if (ret)
goto out;
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The dump_trace() function in dumpstack_64.c is hard to follow.
The test for exception stack is processed differently than the
test for irq stack, and the normal stack is outside completely.
By restructuring this code to have all the stacks determined by
a single function that returns an enum of the following:
STACK_IS_NORMAL
STACK_IS_EXCEPTION
STACK_IS_IRQ
STACK_IS_UNKNOWN
and has the logic of each within a switch statement.
This should make the code much easier to read and understand.
Link: http://lkml.kernel.org/r/20110806012354.684598995@goodmis.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144322.086050042@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.
x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception.
Having the two different requires #ifdefs and also the x86_32 way
is a bit of a pain to maintain. By converting x86_32 to the same
method of x86_64, we can remove #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The i386 thread_info contains a previous_esp field that is used
to daisy chain the different stacks for dump_stack()
(ie. irq, softirq, thread stacks).
The goal is to eventual make i386 handling of thread_info the same
as x86_64, which means that the thread_info will not be in the stack
but as a per_cpu variable. We will no longer depend on thread_info
being able to daisy chain different stacks as it will only exist
in one location (the thread stack).
By moving previous_esp to the end of thread_info and referencing
it as an offset instead of using a thread_info field, this becomes
a stepping stone to moving the thread_info.
The offset to get to the previous stack is rather ugly in this
patch, but this is only temporary and the prev_esp will be changed
in the next commit. This commit is more for sanity checks of the
change.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.891757693@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.608754481@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Only CF9_COND is appropriate for inclusion in the default chain, not
CF9; the latter will poke that register unconditionally, whereas
CF9_COND will at least look for PCI configuration method #1 or #2
first (a weak check, but better than nothing.)
CF9 should be used for explicit system configuration (command line or
DMI) only.
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reboot is the last service linux OS provides to the end user. We are
supposed to make this function more robust than today. This patch adds
all of the known reboot methods into the default attempt list. The
machines requiring reboot=efi or reboot=p or reboot=bios get a chance
to reboot automatically now.
If there is a new reboot method emerged, we are supposed to add it to
the default list as well, instead of adding the endless dmidecode entry.
If one method required is in the default list in this patch but the
machine reboot still hangs, that means some methods ahead of the
required method cause the system hangs, then reboot the machine by
passing reboot= arguments and submit the reboot dmidecode table quirk.
We are supposed to remove the reboot dmidecode table from the kernel,
but to be safe, we keep it. This patch prevents us from adding more.
If you happened to have a machine listed in the reboot dmidecode
table and this patch makes reboot work on your machine, please submit
a patch to remove the quirk.
The default reboot order with this patch is now:
ACPI > KBD > ACPI > KBD > EFI > CF9_COND > BIOS
Because BIOS and TRIPLE are mutually exclusive (either will either
work or hang the machine) that method is not included.
[ hpa: as with any changes to the reboot order, this patch will have
to be monitored carefully for regressions. ]
Signed-off-by: Aubrey Li <aubrey.li@intel.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Compiling last minute changes without setting the proper config
options is not really clever.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
causes a crash during boot - Borislav Petkov
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTFmV4AAoJEC84WcCNIz1VeSgP/1gykrBiH3Vr4H4c32la/arZ
ktXRAT+RHdiebEXopt6+A1Pv6iyRYz3fBB1cKKb7q8fhDmVVmefGVkyO4qg4NbgL
Ic1dP6uPgiFidcYML9/c6+UIovbwDD7f5wwJEjXxoZKg7b7P0TIykd8z8YPQ/6A6
fIY5Z2L9A8eVt4k1m6Dg2PUJ2B+XKOLa5BbL7gXB6u9avgAVAoLM9oQStss+V9Pv
JQu+BZxeEwuxgi0LOxk1sFWXaAoRtgVDNH6nPK93CRvO2H83voWFf+OcW9hQWsF5
tLam6aMkvjuM5Dv1IA69PBAWrE/jxcMvjEdJyrGMWRKWtH/iTN4ez4EoFiO38o4R
IgzXh9L5xbP3o+g3rntIu4h3/5yde9TRM18mER0lLTdNFZ8QzmLt4L0TptntRoBv
bTXffILACq0uQU6T10P+EwseT472HphMeswaWVDkRxkN2hTCipFqQX0ekb0qbw3O
yQeRyz1/t7DeA77iAGG96SfeziMdr44d6u6zDQPrTAJV+H1ZZ3XNIlJDi01CAg3b
PDT6nHb/9V0tPZQebntZnczVRP+5EBdn5RbJaAMldEwVgjdjlFNY5slsF01KeCSF
6Cx6UyAI9XKuZMoayC3QDHKKWi+BTOwbKcVAdtZ1c9AMLt9pyxsDfdoOAV4zBiQa
PsVs9t/l7TZoL1MQHlEJ
=YIab
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* Disable the new EFI 1:1 virtual mapping for SGI UV because using it
causes a crash during boot - Borislav Petkov
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Alex reported hitting the following BUG after the EFI 1:1 virtual
mapping work was merged,
kernel BUG at arch/x86/mm/init_64.c:351!
invalid opcode: 0000 [#1] SMP
Call Trace:
[<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
[<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
[<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
[<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
[<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
[<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
[<ffffffff8153d955>] ? printk+0x72/0x74
[<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
[<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
[<ffffffff81535530>] ? rest_init+0x74/0x74
[<ffffffff81535539>] kernel_init+0x9/0xff
[<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
[<ffffffff81535530>] ? rest_init+0x74/0x74
Getting this thing to work with the new mapping scheme would need more
work, so automatically switch to the old memmap layout for SGI UV.
Acked-by: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Reported-by: fengguang.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Commit 1aec16967 (x86: Hyperv: Cleanup the irq mess) removed the
ability to build the hyperv stuff as a module. Bring it back.
Reported-by: fengguang.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
The vmbus/hyperv interrupt handling is another complete trainwreck and
probably the worst of all currently in tree.
If CONFIG_HYPERV=y then the interrupt delivery to the vmbus happens
via the direct HYPERVISOR_CALLBACK_VECTOR. So far so good, but:
The driver requests first a normal device interrupt. The only reason
to do so is to increment the interrupt stats of that device
interrupt. For no reason it also installs a private flow handler.
We have proper accounting mechanisms for direct vectors, but of
course it's too much effort to add that 5 lines of code.
Aside of that the alloc_intr_gate() is not protected against
reallocation which makes module reload impossible.
Solution to the problem is simple to rip out the whole mess and
implement it correctly.
First of all move all that code to arch/x86/kernel/cpu/mshyperv.c and
merily install the HYPERVISOR_CALLBACK_VECTOR with proper reallocation
protection and use the proper direct vector accounting mechanism.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212739.028307673@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
HyperV abuses a device interrupt to account for the
HYPERVISOR_CALLBACK_VECTOR.
Provide proper accounting as we have for the other vectors as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212738.681855582@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As we grow support for more EFI architectures they're going to want the
ability to query which EFI features are available on the running system.
Instead of storing this information in an architecture-specific place,
stick it in the global 'struct efi', which is already the central
location for EFI state.
While we're at it, let's change the return value of efi_enabled() to be
bool and replace all references to 'facility' with 'feature', which is
the usual word used to describe the attributes of the running system.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
If a failure occurs while modifying ftrace function, it bails out and will
remove the tracepoints to be back to what the code originally was.
There is missing the final sync run across the CPUs after the fix up is done
and before the ftrace int3 handler flag is reset.
Here's the description of the problem:
CPU0 CPU1
---- ----
remove_breakpoint();
modifying_ftrace_code = 0;
[still sees breakpoint]
<takes trap>
[sees modifying_ftrace_code as zero]
[no breakpoint handler]
[goto failed case]
[trap exception - kernel breakpoint, no
handler]
BUG()
Link: http://lkml.kernel.org/r/1393258342-29978-2-git-send-email-pmladek@suse.cz
Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If a failure occurs while enabling a trace, it bails out and will remove
the tracepoints to be back to what the code originally was. But the fix
up had some bugs in it. By injecting a failure in the code, the fix up
ran to completion, but shortly afterward the system rebooted.
There was two bugs here.
The first was that there was no final sync run across the CPUs after the
fix up was done, and before the ftrace int3 handler flag was reset. That
means that other CPUs could still see the breakpoint and trigger on it
long after the flag was cleared, and the int3 handler would think it was
a spurious interrupt. Worse yet, the int3 handler could hit other breakpoints
because the ftrace int3 handler flag would have prevented the int3 handler
from going further.
Here's a description of the issue:
CPU0 CPU1
---- ----
remove_breakpoint();
modifying_ftrace_code = 0;
[still sees breakpoint]
<takes trap>
[sees modifying_ftrace_code as zero]
[no breakpoint handler]
[goto failed case]
[trap exception - kernel breakpoint, no
handler]
BUG()
The second bug was that the removal of the breakpoints required the
"within()" logic updates instead of accessing the ip address directly.
As the kernel text is mapped read-only when CONFIG_DEBUG_RODATA is set, and
the removal of the breakpoint is a modification of the kernel text.
The ftrace_write() includes the "within()" logic, where as, the
probe_kernel_write() does not. This prevented the breakpoint from being
removed at all.
Link: http://lkml.kernel.org/r/1392650573-3390-1-git-send-email-pmladek@suse.cz
Reported-by: Petr Mladek <pmladek@suse.cz>
Tested-by: Petr Mladek <pmladek@suse.cz>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull perf fixes from Ingo Molnar:
"Misc fixes, most of them on the tooling side"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Fix strict alias issue for find_first_bit
perf tools: fix BFD detection on opensuse
perf: Fix hotplug splat
perf/x86: Fix event scheduling
perf symbols: Destroy unused symsrcs
perf annotate: Check availability of annotate when processing samples
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
turned on using mce_amd_inj module and the patch works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Tested-by: Arindam Nath <Arindam.Nath@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
We call this "clflush" in /proc/cpuinfo, and have
cpu_has_clflush()... let's be consistent and just call it that.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org
The NUMAQ support seems to be unmaintained, remove it.
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/n/530CFD6C.7040705@zytor.com
The SGI Visual Workstation seems to be dead; remove support so we
don't have to continue maintaining it.
Cc: Andrey Panin <pazke@donpac.ru>
Cc: Michael Reed <mdr@sgi.com>
Link: http://lkml.kernel.org/r/530CFD6C.7040705@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add a few comments on the ->add(), ->del() and ->*_txn()
implementation.
Requested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-he3819318c245j7t5e1e22tr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.
This is I think the relevant bit:
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
> pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
So we add the insn:p event (fd[23]).
At this point we should have:
n_events = 2, n_added = 1, n_txn = 1
> pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
> pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.
group_sched_in()
pmu->start_txn() /* nop - BP pmu */
event_sched_in()
event->pmu->add()
So here we should end up with:
0: n_events = 3, n_added = 2, n_txn = 2
4: n_events = 4, n_added = 3, n_txn = 3
But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.
Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.
But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.
However, since we try and schedule 4 it means the 0 event must have
succeeded! Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:
event_sched_out()
event->pmu->del()
on 0 and the BP event.
Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:
n_event = 2, n_added = 2, n_txn = 2
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Randomize the load address of modules in the kernel to make kASLR
effective for modules. Modules can only be loaded within a particular
range of virtual address space. This patch adds 10 bits of entropy to
the load address by adding 1-1024 * PAGE_SIZE to the beginning range
where modules are loaded.
The single base offset was chosen because randomizing each module
load ends up wasting/fragmenting memory too much. Prior approaches to
minimizing fragmentation while doing randomization tend to result in
worse entropy than just doing a single base address offset.
Example kASLR boot without this change, with a single module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0001000 4K ro GLB x pte
0xffffffffc0001000-0xffffffffc0002000 4K ro GLB NX pte
0xffffffffc0002000-0xffffffffc0004000 8K RW GLB NX pte
0xffffffffc0004000-0xffffffffc0200000 2032K pte
0xffffffffc0200000-0xffffffffff000000 1006M pmd
---[ End Modules ]---
Example kASLR boot after this change, same module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0200000 2M pmd
0xffffffffc0200000-0xffffffffc03bf000 1788K pte
0xffffffffc03bf000-0xffffffffc03c0000 4K ro GLB x pte
0xffffffffc03c0000-0xffffffffc03c1000 4K ro GLB NX pte
0xffffffffc03c1000-0xffffffffc03c3000 8K RW GLB NX pte
0xffffffffc03c3000-0xffffffffc0400000 244K pte
0xffffffffc0400000-0xffffffffff000000 1004M pmd
---[ End Modules ]---
Signed-off-by: Andy Honig <ahonig@google.com>
Link: http://lkml.kernel.org/r/20140226005916.GA27083@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Include kASLR offset in VMCOREINFO ELF notes to assist in debugging.
[ hpa: pushing this for v3.14 to avoid having a kernel version with
kASLR where we can't debug output. ]
Signed-off-by: Eugene Surovegin <surovegin@google.com>
Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull x86 fixes from Thomas Gleixner:
- a bugfix which prevents a divide by 0 panic when the newly introduced
try_msr_calibrate_tsc() fails
- enablement of the Baytrail platform to utilize the newfangled msr
based calibration
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: tsc: Add missing Baytrail frequency to the table
x86, tsc: Fallback to normal calibration if fast MSR calibration fails
These days hv_clock allocation is memblock based (i.e. the percpu
allocator is not involved), which means that the physical address
of each of the per-cpu hv_clock areas is guaranteed to remain
unchanged through all its lifetime and we do not need to update
its location after CPU bring-up.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch updates the CBOX PMU filters mapping tables for SNB-EP
and IVT (model 45 and 62 respectively).
The NID umask always comes in addition to another umask.
When set, the NID filter is applied.
The current mapping tables were missing some code/umask
combinations to account for the NID umask. This patch
fixes that.
Cc: mingo@elte.hu
Cc: ak@linux.intel.com
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140219131018.GA24475@quad
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current code simply assumes Intel Arch PerfMon v2+ to have
the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check
CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead.
This was found by KVM which implements v2+ but didn't provide the
capabilities MSR. Change the code to DTRT; KVM will also implement the
MSR and return 0.
Cc: pbonzini@redhat.com
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When using BTS on Core i7-4*, I get the below kernel warning.
$ perf record -c 1 -e branches:u ls
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2.
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317920] Do you have a strange power saving mode enabled?
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317945] Dazed and confused, but trying to continue
Make intel_pmu_handle_irq() take the full exit path when returning early.
Cc: eranian@google.com
Cc: peterz@infradead.org
Cc: mingo@kernel.org
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-andi@firstfloor.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch is needed because that PMU uses 32-bit free
running counters with no interrupt capabilities.
On SNB/IVB/HSW, we used 20GB/s theoretical peak to calculate
the hrtimer timeout necessary to avoid missing an overflow.
That delay is set to 5s to be on the cautious side.
The SNB IMC uses free running counters, which are handled
via pseudo fixed counters. The SNB IMC PMU implementation
supports an arbitrary number of events, because the counters
are read-only. Therefore it is not possible to track active
counters. Instead we put active events on a linked list which
is then used by the hrtimer handler to update the SW counts.
Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-8-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>