IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
* master: (848 commits)
SELinux: Fix RCU deref check warning in sel_netport_insert()
binary_sysctl(): fix memory leak
mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
ipmi_watchdog: restore settings when BMC reset
oom: fix integer overflow of points in oom_badness
memcg: keep root group unchanged if creation fails
nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
nilfs2: unbreak compat ioctl
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
evm: prevent racing during tfm allocation
evm: key must be set once during initialization
mmc: vub300: fix type of firmware_rom_wait_states module parameter
Revert "mmc: enable runtime PM by default"
mmc: sdhci: remove "state" argument from sdhci_suspend_host
x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
IB/qib: Correct sense on freectxts increment and decrement
RDMA/cma: Verify private data length
cgroups: fix a css_set not found bug in cgroup_attach_proc
oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
...
Conflicts:
kernel/cgroup_freezer.c
Mathieu Desnoyers pointed out a case that can cause issues with
NMIs running on the debug stack:
int3 -> interrupt -> NMI -> int3
Because the interrupt changes the stack, the NMI will not see that
it preempted the debug stack. Looking deeper at this case,
interrupts only happen when the int3 is from userspace or in
an a location in the exception table (fixup).
userspace -> int3 -> interurpt -> NMI -> int3
All other int3s that happen in the kernel should be processed
without ever enabling interrupts, as the do_trap() call will
panic the kernel if it is called to process any other location
within the kernel.
Adding a counter around the sections that enable interrupts while
using the debug stack allows the NMI to also check that case.
If the NMI sees that it either interrupted a task using the debug
stack or the debug counter is non-zero, then it will have to
change the IDT table to make the int3 not change stacks (which will
corrupt the stack if it does).
Note, I had to move the debug_usage functions out of processor.h
and into debugreg.h because of the static inlined functions to
inc and dec the debug_usage counter. __get_cpu_var() requires
smp.h which includes processor.h, and would fail to build.
Link: http://lkml.kernel.org/r/1323976535.23971.112.camel@gandalf.stny.rr.com
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We want to allow NMI handlers to have breakpoints to be able to
remove stop_machine from ftrace, kprobes and jump_labels. But if
an NMI interrupts a current breakpoint, and then it triggers a
breakpoint itself, it will switch to the breakpoint stack and
corrupt the data on it for the breakpoint processing that it
interrupted.
Instead, have the NMI check if it interrupted breakpoint processing
by checking if the stack that is currently used is a breakpoint
stack. If it is, then load a special IDT that changes the IST
for the debug exception to keep the same stack in kernel context.
When the NMI is done, it puts it back.
This way, if the NMI does trigger a breakpoint, it will keep
using the same stack and not stomp on the breakpoint data for
the breakpoint it interrupted.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Put the logic to compute the event index into a per pmu method. This
is required because the x86 rules are weird and wonderful and don't
match the capabilities of the current scheme.
AFAIK only powerpc actually has a usable userspace read of the PMCs
but I'm not at all sure anybody actually used that.
ARM is restored to the default since it currently does not support
userspace access at all. And all software events are provided with a
method that reports their index as 0 (disabled).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/n/tip-dfydxodki16lylkt3gl2j7cw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the encoding and definitions necessary for the
unhalted_reference_cycles event avaialble since Intel Core 2 processors.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323559734-3488-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space. However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files. The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:
text data bss dec hex filename
4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before
4737444 506459 972040 6215943 5ed907 vmlinux.o.after
for a difference of 276 bytes for an example UP config.
If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LAPIC related statistics are grouped inside the per-cpu
structure irq_stat, so there is no need for icr_read_retry_count
to be a standalone per-cpu variable.
This patch moves icr_read_retry_count to where it belongs.
Suggested-y: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: Jörn Engel <joern@logfs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
gcc noticed (when using -Wempty-body) that our use of
lock_cmos() and unlock_cmos() in
arch/x86/include/asm/mach_traps.h is potentially problematic :
arch/x86/include/asm/mach_traps.h:32:15: warning: suggest braces around empty body in an ¡else¢ statement [-Wempty-body]
arch/x86/include/asm/mach_traps.h:40:16: warning: suggest braces around empty body in an ¡else¢ statement [-Wempty-body]
Let's just use the standard 'do {} while (0)' solution. That
shuts up gcc and also prevents future problems if the macros
should end up being used in a similar situation elsewhere.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1112180103130.21784@swampdragon.chaosbits.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If one builds the kernel with -Wempty-body one gets this
warning:
mm/memory.c:3432:46: warning: suggest braces around empty body in an ¡if¢ statement [-Wempty-body]
due to the fact that 'flush_tlb_fix_spurious_fault' is a macro
that can sometimes be defined to nothing.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-mm@kvack.org
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1112180128070.21784@swampdragon.chaosbits.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
mce-inject provides a mechanism to simulate errors so that test
scripts can check for correct operation of the kernel without
requiring any specialized hardware to create rare events.
The existing code can simulate events in normal process context
and also in NMI context - but not in IRQ context. This patch
fills that gap.
Link: https://lkml.org/lkml/2011/12/7/537
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
fls(N), ffs(N) and fls64(N) can be optimised on x86_64. Currently they use a
CMOV instruction after the BSR/BSF to set the destination register to -1 if the
value to be scanned was 0 (in which case BSR/BSF set the Z flag).
Instead, according to the AMD64 specification, we can make use of the fact that
BSR/BSF doesn't modify its output register if its input is 0. By preloading
the output with -1 and incrementing the result, we achieve the desired result
without the need for a conditional check.
The Intel x86_64 specification, however, says that the result of BSR/BSF in
such a case is undefined. That said, when queried, one of the Intel CPU
architects said that the behaviour on all Intel CPUs is that:
(1) with BSRQ/BSFQ, the 64-bit destination register is written with its
original value if the source is 0, thus, in essence, giving the effect we
want. And,
(2) with BSRL/BSFL, the lower half of the 64-bit destination register is
written with its original value if the source is 0, and the upper half is
cleared, thus giving us the effect we want (we return a 4-byte int).
Further, it was indicated that they (Intel) are unlikely to get away with
changing the behaviour.
It might be possible to optimise the 32-bit versions of these functions, but
there's a lot more variation, and so the effective non-destructive property of
BSRL/BSRF cannot be relied on.
[ hpa: specifically, some 486 chips are known to NOT have this property. ]
I have benchmarked these functions on my Core2 Duo test machine using the
following program:
#include <stdlib.h>
#include <stdio.h>
#ifndef __x86_64__
#error
#endif
#define PAGE_SHIFT 12
typedef unsigned long long __u64, u64;
typedef unsigned int __u32, u32;
#define noinline __attribute__((noinline))
static __always_inline int fls64(__u64 x)
{
long bitpos = -1;
asm("bsrq %1,%0"
: "+r" (bitpos)
: "rm" (x));
return bitpos + 1;
}
static inline unsigned long __fls(unsigned long word)
{
asm("bsr %1,%0"
: "=r" (word)
: "rm" (word));
return word;
}
static __always_inline int old_fls64(__u64 x)
{
if (x == 0)
return 0;
return __fls(x) + 1;
}
static noinline // __attribute__((const))
int old_get_order(unsigned long size)
{
int order;
size = (size - 1) >> (PAGE_SHIFT - 1);
order = -1;
do {
size >>= 1;
order++;
} while (size);
return order;
}
static inline __attribute__((const))
int get_order_old_fls64(unsigned long size)
{
int order;
size--;
size >>= PAGE_SHIFT;
order = old_fls64(size);
return order;
}
static inline __attribute__((const))
int get_order(unsigned long size)
{
int order;
size--;
size >>= PAGE_SHIFT;
order = fls64(size);
return order;
}
unsigned long prevent_optimise_out;
static noinline unsigned long test_old_get_order(void)
{
unsigned long n, total = 0;
long rep, loop;
for (rep = 1000000; rep > 0; rep--) {
for (loop = 0; loop <= 16384; loop += 4) {
n = 1UL << loop;
total += old_get_order(n);
}
}
return total;
}
static noinline unsigned long test_get_order_old_fls64(void)
{
unsigned long n, total = 0;
long rep, loop;
for (rep = 1000000; rep > 0; rep--) {
for (loop = 0; loop <= 16384; loop += 4) {
n = 1UL << loop;
total += get_order_old_fls64(n);
}
}
return total;
}
static noinline unsigned long test_get_order(void)
{
unsigned long n, total = 0;
long rep, loop;
for (rep = 1000000; rep > 0; rep--) {
for (loop = 0; loop <= 16384; loop += 4) {
n = 1UL << loop;
total += get_order(n);
}
}
return total;
}
int main(int argc, char **argv)
{
unsigned long total;
switch (argc) {
case 1: total = test_old_get_order(); break;
case 2: total = test_get_order_old_fls64(); break;
default: total = test_get_order(); break;
}
prevent_optimise_out = total;
return 0;
}
This allows me to test the use of the old fls64() implementation and the new
fls64() implementation and also to contrast these to the out-of-line loop-based
implementation of get_order(). The results were:
warthog>time ./get_order
real 1m37.191s
user 1m36.313s
sys 0m0.861s
warthog>time ./get_order x
real 0m16.892s
user 0m16.586s
sys 0m0.287s
warthog>time ./get_order x x
real 0m7.731s
user 0m7.727s
sys 0m0.002s
Using the current upstream fls64() as a basis for an inlined get_order() [the
second result above] is much faster than using the current out-of-line
loop-based get_order() [the first result above].
Using my optimised inline fls64()-based get_order() [the third result above]
is even faster still.
[ hpa: changed the selection of 32 vs 64 bits to use CONFIG_X86_64
instead of comparing BITS_PER_LONG, updated comments, rebased manually
on top of 83d99df7c4bf x86, bitops: Move fls64.h inside __KERNEL__ ]
Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We would include <asm-generic/bitops/fls64.h> even without __KERNEL__,
but that doesn't make sense, as:
1. That file provides fls64(), but the corresponding function fls() is
not exported to user space.
2. The implementation of fls64.h uses kernel-only symbols.
3. fls64.h is not exported to user space.
This appears to have been a bug introduced in checkin:
d57594c203b1 bitops: use __fls for fls64 on 64-bit archs
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Alexander van Heukelum <heukelum@mailshack.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/4EEA77E1.6050009@zytor.com
They had several problems/shortcomings:
Only the first memory operand was mentioned in the 2x32bit asm()
operands, and 2x64-bit version had a memory clobber. The first
allowed the compiler to not recognize the need to re-load the
data in case it had it cached in some register, and the second
was overly destructive.
The memory operand in the 2x32-bit asm() was declared to only be
an output.
The types of the local copies of the old and new values were
incorrect (as in other per-CPU ops, the types of the per-CPU
variables accessed should be used here, to make sure the
respective types are compatible).
The __dummy variable was pointless (and needlessly initialized
in the 2x32-bit case), given that local copies of the inputs
already exist.
The 2x64-bit variant forced the address of the first object into
%rsi, even though this is needed only for the call to the
emulation function. The real cmpxchg16b can operate on an
memory.
At once also change the return value type to what it really is -
'bool'.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4EE86D6502000078000679FE@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No functionality change, this is done so that in a follow-on patch all
queued-up MCEs can be decoded after registering on the chain.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
In the IPI delivery slow path (NMI delivery) we retry the ICR
read to check for delivery completion a limited number of times.
[ The reason for the limited retries is that some of the places
where it is used (cpu boot, kdump, etc) IPI delivery might not
succeed (due to a firmware bug or system crash, for example)
and in such a case it is better to give up and resume
execution of other code. ]
This patch adds a new entry to /proc/interrupts, RTR, which
tells user space the number of times we retried the ICR read in
the IPI delivery slow path.
This should give some insight into how well the APIC
message delivery hardware is working - if the counts are way
too large then we are hitting a (very-) slow path way too
often.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: Jörn Engel <joern@logfs.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/n/tip-vzsp20lo2xdzh5f70g0eis2s@git.kernel.org
[ extended the changelog ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This hangs my MacBook Air at boot time; I get no console
messages at all. I reverted this on top of -rc5 and my machine
boots again.
This reverts commit e8c7106280a305e1ff2a3a8a4dfce141469fb039.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a symbol, EFI_LOADER_SIGNATURE instead of using the magic
strings, which also helps to reduce the amount of ifdeffery.
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
commit 37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new
bzImage fields") introduced some new fields into the bzImage header
but struct setup_header was not updated accordingly. Add the missing
'pref_address' and 'init_size' fields.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.
On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:
BUG: unable to handle kernel paging request at f7f22280
IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
[...]
Call Trace:
[<c104f8ca>] ? page_is_ram+0x1a/0x40
[<c1025aff>] reserve_memtype+0xdf/0x2f0
[<c1024dc9>] set_memory_uc+0x49/0xa0
[<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
[<c19216d4>] start_kernel+0x291/0x2f2
[<c19211c7>] ? loglevel+0x1b/0x1b
[<c19210bf>] i386_start_kernel+0xbf/0xc8
A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().
Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report,
https://bugzilla.redhat.com/show_bug.cgi?id=748516
Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The node_distance function is not x86 64-bit specific. Having
the #ifdef around the extern function declaration and the
#define causes the default node_distance macro to be used in
asm-generic/topology.h. This also causes a sparse warning in
arch/x86/mm/numa.c when CONFIG_X86_64 is not set:
warning: symbol '__node_distance' was not declared. Should it be
static?
Remove the #ifdef to fix both issues.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1112061220310.28251@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
KVM needs to know perf capability to decide which PMU it can expose to a
guest.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320929850-10480-8-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel CPUs report non-available architectural events in cpuid leaf
0AH.EBX. Use it to disable events that are not available according
to CPU.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320929850-10480-7-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The x86_64 kernel pushes the fake kernel stack in
arch/x86/kernel/entry_64.S:FAKE_STACK_FRAME, and
rflags register in it does not conform to the specification.
Although Intel's manual[1] says bit 1 of it shall be set to 1,
this bit is cleared to 0 on pushing the fake stack.
[1] Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Vol.1 3-21 Figure 3-8. EFLAGS Register
If it is not on purpose, it is better to be fixed, because
it can lead some tools misunderstanding the stack frame. For example,
"crash" utility[2] actually detects it and warns you like
below:
RIP: ffffffff8005dfa2 RSP: ffff8104ce0c7f58 RFLAGS: 00000200
[...]
bt: WARNING: possibly bogus exception frame
Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
Tested-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes fields in cpustat from a structure, to an
u64 array. Math gets easier, and the code is more flexible.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Tuner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intr_remapping: Fix section mismatch in ir_dev_scope_init()
intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev()
x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions
x86, AMD: Correct align_va_addr documentation
x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms
x86/mrst: Battery fixes
x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode
x86: Fix "Acer Aspire 1" reboot hang
x86/mtrr: Resolve inconsistency with Intel processor manual
x86: Document rdmsr_safe restrictions
x86, microcode: Fix the failure path of microcode update driver init code
Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup
x86/mpparse: Account for bus types other than ISA and PCI
x86, mrst: Change the pmic_gpio device type to IPC
mrst: Added some platform data for the SFI translations
x86,mrst: Power control commands update
x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot
x86, UV: Fix UV2 hub part number
x86: Add user_mode_vm check in stack_overflow_check
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, x86: Avoid unnecessary overflow in sched_clock
sched: Fix buglet in return_cfs_rq_runtime()
sched: Avoid SMT siblings in select_idle_sibling() if possible
sched: Set the command name of the idle tasks in SMP kernels
sched, rt: Provide means of disabling cross-cpu bandwidth sharing
sched: Document wait_for_completion_*() return values
sched_fair: Fix a typo in the comment describing update_sd_lb_stats
sched: Add a comment to effective_load() since it's a pain
In the target code I have a do_div(x, PAGE_SIZE). The x86-64
version of it was doing a shift and a mask which is clever. The
32bit version of it had a div operation in it which made me
think. After digging I noticed that x86 has an optimized version
of it. This patch adds this shift and mask optimization if base
is constant so we don't have any runtime "checking" overhead
since most users use a power of ten.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1322649814-544-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
tsc=reliable boot parameter is supposed to skip all the TSC
stablility checks during boot time.
On a 8-socket system where we want to run an experiment with the
"tsc=reliable" boot option, TSC synchronization checks are not
getting skipped and marking the TSC as not stable.
Check for tsc_clocksource_reliable (which is set via
tsc=reliable or for platforms supporting synthetic TSC_RELIABLE
feature bit etc) and when set, skip the TSC synchronization
tests during boot.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1320446537.15071.14.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
GET_THREAD_INFO() involves a memory read immediately followed by
an "sub" on the value read, in turn (in several cases)
immediately followed by a use of the calculated value as the
base address of a memory access. This combination of
instructions has a non-negligible potential for stalls.
In the system call entry point code, however, the (fixed) offset
of the stack pointer from the end of the stack is generally
known, and hence we can instead avoid the memory load and
subtract, and instead do the memory reference using %rsp as the
base register. To do so in a legible fashion, introduce a
THREAD_INFO() macro which, provided a register (generally %rsp)
and the known offset from the end of the stack, produces a
suitable memory access operand.
The patch attempts to only touch the fast paths (no auditing and
alike), but manages to do so only in the 64-bit entry point
case; the compatibility mode entry points have so many
interdependencies between their various branch targets that it
was necessary to also adjust the slow paths to eliminate the
risk of having missed some register dependency during code
analysis.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Adds support for Numascale NumaChip large-SMP systems. It is
needed to enable the booting of more than ~168 cores.
v2:
- [Steffen] enumerate only accessible northbridges
- [Daniel] rediffed and validated against 3.1-rc10
v3:
- [Daniel] use x86_init core numbering override
- [Daniel] cleanups as per feedback
v4:
- [Daniel] use updated x86_cpuinit override
v5:
- drop disabling interrupts locally, as ISR write is atomic; drop delay
- added read-mostly annotations where appropriate
- require CONFIG_SMP, so drop conditional path
Workload tested on 96 cores/16 sockets.
Signed-off-by: Steffen Persvold <sp@numascale.com>
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Link: http://lkml.kernel.org/r/1323101246-2400-1-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add an x86_init vector for handling inconsistent core numbering.
This is useful for multi-fabric platforms, such as Numascale
NumaConnect.
v2:
- use struct x86_cpuinit_ops
- provide default fall-back function to warn
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow flat_init_apic_ldr() to be used outside the compilation
unit for similar APIC implementations.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Link: http://lkml.kernel.org/r/1323073238-32686-1-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel MID x86 platforms have a memory mapped virtual RTC
instead. No MID platform have the default ports (and
accessing them may do weird stuff).
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: feng.tang@intel.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Recently, I got bitten by using rdmsr_safe too early in the boot
process. Document its shortcomings for future reference.
Link: http://lkml.kernel.org/r/4ED5B70F.606@lwfinger.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
On the Intel MID devices SCU commands are issued to manage power
off and the like. We need to issue different ones for
non-Lincroft based devices.
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To make this work, we teach the page fault handler how to send
signals on failed uaccess. This only works for user addresses
(kernel addresses will never hit the page fault handler in the
first place), so we need to generate signals for those
separately.
This gets the tricky case right: if the user buffer spans
multiple pages and only the second page is invalid, we set
cr2 and si_addr correctly. UML relies on this behavior to
"fault in" pages as needed.
We steal a bit from thread_info.uaccess_err to enable this.
Before this change, uaccess_err was a 32-bit boolean value.
This fixes issues with UML when vsyscall=emulate.
Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The previous patch modified the stop cpus path to use NMI
instead of IRQ as the way to communicate to the other cpus to
shutdown. There were some concerns that various machines may
have problems with using an NMI IPI.
This patch creates a selftest to check if NMI is working at
boot. The idea is to help catch any issues before the machine
panics and we learn the hard way.
Loosely based on the locking-selftest.c file, this separate file
runs a couple of simple tests and reports the results. The
output looks like:
...
Brought up 4 CPUs
----------------
| NMI testsuite:
--------------------
remote IPI: ok |
local IPI: ok |
--------------------
Good, all 2 testcases passed! |
---------------------------------
Total of 4 processors activated (21330.61 BogoMIPS).
...
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: seiji.aguchi@hds.com
Cc: vgoyal@redhat.com
Cc: mjg@redhat.com
Cc: tony.luck@intel.com
Cc: gong.chen@intel.com
Cc: satoru.moriya@hds.com
Cc: avi@redhat.com
Cc: Andi Kleen <andi@firstfloor.org>
Link: http://lkml.kernel.org/r/1318533267-18880-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There was a mixup when the SGI UV2 hub chip was sent to be
fabricated, and it ended up with the wrong part number in the
HRP_NODE_ID mmr. Future versions of the chip will (may) have the
correct part number. Change the UV infrastructure to recognize
both part numbers as valid IDs of a UV2 hub chip.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/20111129210058.GA20452@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>