IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.
This patch finishes step 2.
In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.
And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.
After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.
And finally we will use this array to make cpuid <-> nodeid persistent.
cpuid <-> apicid mapping is established at local apic registeration time.
But non-present or disabled cpus are ignored.
In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch turns e820 and e820_saved into pointers to e820 tables,
of the same size as before.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160917213927.1787-2-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lately tsc page was implemented but filled with empty
values. This patch setup tsc page scale and offset based
on vcpu tsc, tsc_khz and HV_X64_MSR_TIME_REF_COUNT value.
The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr
reads count to zero which potentially improves performance.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
[Computation of TSC page parameters rewritten to use the Linux timekeeper
parameters. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a function that reads the exact nanoseconds value that is
provided to the guest in kvmclock. This crystallizes the notion of
kvmclock as a thin veneer over a stable TSC, that the guest will
(hopefully) convert with NTP. In other words, kvmclock is *not* a
paravirtualized host-to-guest NTP.
Drop the get_kernel_ns() function, that was used both to get the base
value of the master clock and to get the current value of kvmclock.
The former use is replaced by ktime_get_boot_ns(), the latter is
the purpose of get_kernel_ns().
This also allows KVM to provide a Hyper-V time reference counter that
is synchronized with the time that is computed from the TSC page.
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All previous users of dump_trace() have been converted to use the new
unwind interfaces, so we can remove it and the related
print_context_stack() and print_context_stack_bp() callback functions.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5b97da3572b40b5a4d8e185cf2429308d0987a13.1474045023.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Convert show_trace_log_lvl() to use the new unwinder. dump_trace() has
been deprecated.
show_trace_log_lvl() is special compared to other users of the unwinder.
It's the only place where both reliable *and* unreliable addresses are
needed. With frame pointers enabled, most callers of the unwinder don't
want to know about unreliable addresses. But in this case, when we're
dumping the stack to the console because something presumably went
wrong, the unreliable addresses are useful:
- They show stale data on the stack which can provide useful clues.
- If something goes wrong with the unwinder, or if frame pointers are
corrupt or missing, all the stack addresses still get shown.
So in order to show all addresses on the stack, and at the same time
figure out which addresses are reliable, we have to do the scanning and
the unwinding in parallel.
The scanning is done with the help of get_stack_info() to traverse the
stacks. The unwinding is done separately by the new unwinder.
In theory we could simplify show_trace_log_lvl() by instead pushing some
of this logic into the unwind code. But then we would need some kind of
"fake" frame logic in the unwinder which would add a lot of complexity
and wouldn't be worth it in order to support only one user.
Another benefit of this approach is that once we have a DWARF unwinder,
we should be able to just plug it in with minimal impact to this code.
Another change here is that callers of show_trace_log_lvl() don't need
to provide the 'bp' argument. The unwinder already finds the relevant
frame pointer by unwinding until it reaches the first frame after the
provided stack pointer.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/703b5998604c712a1f801874b43f35d6dac52ede.1474045023.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The x86 stack dump code is a bit of a mess. dump_trace() uses
callbacks, and each user of it seems to have slightly different
requirements, so there are several slightly different callbacks floating
around.
Also there are some upcoming features which will need more changes to
the stack dump code, including the printing of stack pt_regs, reliable
stack detection for live patching, and a DWARF unwinder. Each of those
features would at least need more callbacks and/or callback interfaces,
resulting in a much bigger mess than what we have today.
Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.
The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:
https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com
It's also similar to the libunwind API:
http://www.nongnu.org/libunwind/man/libunwind(3).html
Some if its advantages:
- Simplicity: no more callback sprawl and less code duplication.
- Flexibility: it allows the caller to stop and inspect the stack state
at each step in the unwinding process.
- Modularity: the unwinder code, console stack dump code, and stack
metadata analysis code are all better separated so that changing one
of them shouldn't have much of an impact on any of the others.
Two implementations are added which conform to the new unwind interface:
- The frame pointer unwinder which is used for CONFIG_FRAME_POINTER=y.
- The "guess" unwinder which is used for CONFIG_FRAME_POINTER=n. This
isn't an "unwinder" per se. All it does is scan the stack for kernel
text addresses. But with no frame pointers, guesses are better than
nothing in most cases.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6dc2f909c47533d213d0505f0a113e64585bec82.1474045023.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the addition of uses of GCC's condition code outputs in commit:
35ccfb7114 ("x86, asm: Use CC_SET()/CC_OUT() in <asm/rwsem.h>")
... there's now an overlap of outputs and clobbers in __down_write_trylock().
Such overlaps are generally getting tagged with an error (occasionally
even with an ICE). I can't really tell why plain GCC 6.2 doesn't detect
this (judging by the code it is meant to), while the slightly modified
one I use does. Since condition code clobbers are never necessary on x86
(other than perhaps for documentation purposes, which doesn't really
get done consistently), remove it altogether rather than inventing
something like CC_CLOBBER (to accompany CC_SET/CC_OUT).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/57E003CC0200007800110102@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it
can consume up to 128k.
The array has been there forever and was never used for anything useful
other than a version mismatch check which was introduced in 2009.
There is no reason to store the version in an array. The kernel is not
prepared to handle different APIC versions anyway, so the real important
part is to detect a version mismatch and warn about it, which can be done
with a single variable as well.
[ tglx: Massaged changelog ]
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Borislav Petkov <bp@alien8.de>
CC: Brian Gerst <brgerst@gmail.com>
CC: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc6+ #5 Not tainted
-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/2/0.
stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.8.0-rc6+ #5
Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
0000000000000000 ffff8d1bd6003f10 ffffffff94446949 ffff8d1bd4a68000
0000000000000001 ffff8d1bd6003f40 ffffffff940e9247 ffff8d1bbdfcf3d0
000000000000080b 0000000000000000 0000000000000000 ffff8d1bd6003f70
Call Trace:
<IRQ> [<ffffffff94446949>] dump_stack+0x99/0xd0
[<ffffffff940e9247>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff9448e0d5>] do_trace_write_msr+0x135/0x140
[<ffffffff9406e750>] native_write_msr+0x20/0x30
[<ffffffff9406503d>] native_apic_msr_eoi_write+0x1d/0x30
[<ffffffff9405b17e>] smp_trace_call_function_interrupt+0x1e/0x270
[<ffffffff948cb1d6>] trace_call_function_interrupt+0x96/0xa0
<EOI> [<ffffffff947200f4>] ? cpuidle_enter_state+0xe4/0x360
[<ffffffff947200df>] ? cpuidle_enter_state+0xcf/0x360
[<ffffffff947203a7>] cpuidle_enter+0x17/0x20
[<ffffffff940df008>] cpu_startup_entry+0x338/0x4d0
[<ffffffff9405bfc4>] start_secondary+0x154/0x180
This can be reproduced readily by running ftrace test case of kselftest.
Move the irq_enter() call before ack_APIC_irq(), because irq_enter() tells
the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly. The same applies to
exiting_ack_irq() which calls ack_APIC_irq() after irq_exit().
[ tglx: Massaged changelog ]
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1474198491-3738-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The TSC offset can now be read directly from struct kvm_arch_vcpu.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A future commit will want to easily read a vCPU's TSC offset,
so we store it in struct kvm_arch_vcpu_arch for easy access.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show_stack_log_lvl() and friends allow a NULL pointer for the
task_struct to indicate the current task. This creates confusion and
can cause sneaky bugs.
Instead require the caller to pass 'current' directly.
This only changes the internal workings of the dumpstack code. The
dump_trace() and show_stack() interfaces still allow a NULL task
pointer. Those interfaces should also probably be fixed as well.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
get_user_ex(x, ptr) should zero x on failure. It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that most of the thread_info users have been cleaned up,
this is straightforward.
Most of this code was written by Linus.
Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a50eab40abeaec9cb9a9e3cbdeafd32190206654.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because sched.h and thread_info.h are a tangled mess, I turned
in_compat_syscall() into a macro. If we had current_thread_struct()
or similar and we could use it from thread_info.h, then this would
be a bit cleaner.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ccc8a1b2f41f9c264a41f771bb4a6539a642ad72.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
valid_stack_ptr() is buggy: it assumes that all stacks are of size
THREAD_SIZE, which is not true for exception stacks. So the
walk_stack() callbacks will need to know the location of the beginning
of the stack as well as the end.
Another issue is that in general the various features of a stack (type,
size, next stack pointer, description string) are scattered around in
various places throughout the stack dump code.
Encapsulate all that information in a single place with a new stack_info
struct and a get_stack_info() interface.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8164dd0db96b7e6a279fa17ae5e6dc375eecb4a9.1473905218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce new flags that defines which ABI to use on creating sigframe.
Those flags kernel will set according to sigaction syscall ABI,
which set handler for the signal being delivered.
So that will drop the dependency on TIF_IA32/TIF_X32 flags on signal deliver.
Those flags will be used only under CONFIG_COMPAT.
Similar way ARM uses sa_flags to differ in which mode deliver signal
for 26-bit applications (look at SA_THIRYTWO).
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-7-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The MCA_IPID register uniquely identifies a bank's type and instance
on Scalable MCA systems. We should save the value of this register
in struct mce along with the other relevant error information. This
ensures that we can decode errors without relying on system software to
correlate the bank to the type.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472680624-34221-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Scalable MCA defines a number of IP types. An MCA bank on an SMCA
system is defined as one of these IP types. A bank's type is uniquely
identified by the combination of the HWID and MCATYPE values read from
its MCA_IPID register.
Add the required tables in order to be able to lookup error descriptions
based on a bank's type and the error's extended error code.
[ bp: Align comments, simplify a bit. ]
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Syndrome information is no longer contained in MCA_STATUS for SMCA
systems but in a new register - MCA_SYND.
Add a synd field to struct mce to hold MCA_SYND register value. Add it
to the end of struct mce to maintain compatibility with old versions of
mcelog. Also, add it to the respective tracepoint.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1467633035-32080-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Paul Mackerras writes:
The highlights are:
* Reduced latency for interrupts from PCI pass-through devices, from
Suresh Warrier and me.
* Halt-polling implementation from Suraj Jitindar Singh.
* 64-bit VCPU statistics, also from Suraj.
* Various other minor fixes and improvements.
We currently allow invocation of 8 boot services with efi_call_early().
Not included are LocateHandleBuffer and LocateProtocol in particular.
For graphics output or to retrieve PCI ROMs and Apple device properties,
we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
combo, which is cumbersome and needs more code.
The ARM folks allow invocation of the full set of boot services but are
restricted to our 8 boot services in functions shared across arches.
Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
struct efi_config, let's rework efi_call_early() to allow invocation of
arbitrary boot services by selecting the 64 bit vs 32 bit code path in
the macro itself.
When compiling for 32 bit or for 64 bit without mixed mode, the unused
code path is optimized away and the binary code is the same as before.
But on 64 bit with mixed mode enabled, this commit adds one compare
instruction to each invocation of a boot service and, depending on the
code path selected, two jump instructions. (Most of the time gcc
arranges the jumps in the 32 bit code path.) The result is a minuscule
performance penalty and the binary code becomes slightly larger and more
difficult to read when disassembled. This isn't a hot path, so these
drawbacks are arguably outweighed by the attainable simplification of
the C code. We have some overhead anyway for thunking or conversion
between calling conventions.
The 8 boot services can consequently be removed from struct efi_config.
No functional change intended (for now).
Example -- invocation of free_pool before (64 bit code path):
0x2d4 movq %ds:efi_early, %rdx ; efi_early
0x2db movq %ss:arg_0-0x20(%rsp), %rsi
0x2e0 xorl %eax, %eax
0x2e2 movq %ds:0x28(%rdx), %rdi ; efi_early->free_pool
0x2e6 callq *%ds:0x58(%rdx) ; efi_early->call()
Example -- invocation of free_pool after (64 / 32 bit mixed code path):
0x0dc movq %ds:efi_early, %rax ; efi_early
0x0e3 cmpb $0, %ds:0x28(%rax) ; !efi_early->is64 ?
0x0e7 movq %ds:0x20(%rax), %rdx ; efi_early->call()
0x0eb movq %ds:0x10(%rax), %rax ; efi_early->boot_services
0x0ef je $0x150
0x0f1 movq %ds:0x48(%rax), %rdi ; free_pool (64 bit)
0x0f5 xorl %eax, %eax
0x0f7 callq *%rdx
...
0x150 movl %ds:0x30(%rax), %edi ; free_pool (32 bit)
0x153 jmp $0x0f5
Size of eboot.o text section:
CONFIG_X86_32: 6464 before, 6318 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED: 7670 before, 7573 after
CONFIG_X86_64 && CONFIG_EFI_MIXED: 7670 before, 8319 after
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Commit 2c23b73c2d02 ("x86/efi: Prepare GOP handling code for reuse
as generic code") introduced an efi_is_64bit() macro to x86 which
previously only existed for arm arches. The macro is used to
choose between the 64 bit or 32 bit code path in gop.c at runtime.
However the code path that's going to be taken is known at compile
time when compiling for x86_32 or for x86_64 with mixed mode disabled.
Amend the macro to eliminate the unused code path in those cases.
Size of gop.o text section:
CONFIG_X86_32: 1758 before, 1299 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED: 2201 before, 1406 after
CONFIG_X86_64 && CONFIG_EFI_MIXED: 2201 before and after
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Every EFI architecture apart from ia64 needs to setup the EFI memory
map at efi.memmap, and the code for doing that is essentially the same
across all implementations. Therefore, it makes sense to factor this
out into the common code under drivers/firmware/efi/.
The only slight variation is the data structure out of which we pull
the initial memory map information, such as physical address, memory
descriptor size and version, etc. We can address this by passing a
generic data structure (struct efi_memory_map_data) as the argument to
efi_memmap_init_early() which contains the minimum info required for
initialising the memory map.
In the process, this patch also fixes a few undesirable implementation
differences:
- ARM and arm64 were failing to clear the EFI_MEMMAP bit when
unmapping the early EFI memory map. EFI_MEMMAP indicates whether
the EFI memory map is mapped (not the regions contained within) and
can be traversed. It's more correct to set the bit as soon as we
memremap() the passed in EFI memmap.
- Rename efi_unmmap_memmap() to efi_memmap_unmap() to adhere to the
regular naming scheme.
This patch also uses a read-write mapping for the memory map instead
of the read-only mapping currently used on ARM and arm64. x86 needs
the ability to update the memory map in-place when assigning virtual
addresses to regions (efi_map_region()) and tagging regions when
reserving boot services (efi_reserve_boot_services()).
There's no way for the generic fake_mem code to know which mapping to
use without introducing some arch-specific constant/hook, so just use
read-write since read-only is of dubious value for the EFI memory map.
Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
PKRU is the register that lets you disallow writes or all access to a given
protection key.
The XSAVE hardware defines an "init state" of 0 for PKRU: its most
permissive state, allowing access/writes to everything. Since we start off
all new processes with the init state, we start all processes off with the
most permissive possible PKRU.
This is unfortunate. If a thread is clone()'d [1] before a program has
time to set PKRU to a restrictive value, that thread will be able to write
to all data, no matter what pkey is set on it. This weakens any integrity
guarantees that we want pkeys to provide.
To fix this, we define a very restrictive PKRU to override the
XSAVE-provided value when we create a new FPU context. We choose a value
that only allows access to pkey 0, which is as restrictive as we can
practically make it.
This does not cause any practical problems with applications using
protection keys because we require them to specify initial permissions for
each key when it is allocated, which override the restrictive default.
In the end, this ensures that threads which do not know how to manage their
own pkey rights can not do damage to data which is pkey-protected.
I would have thought this was a pretty contrived scenario, except that I
heard a bug report from an MPX user who was creating threads in some very
early code before main(). It may be crazy, but folks evidently _do_ it.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: mgorman@techsingularity.net
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds two new system calls:
int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
int pkey_free(int pkey);
These implement an "allocator" for the protection keys
themselves, which can be thought of as analogous to the allocator
that the kernel has for file descriptors. The kernel tracks
which numbers are in use, and only allows operations on keys that
are valid. A key which was not obtained by pkey_alloc() may not,
for instance, be passed to pkey_mprotect().
These system calls are also very important given the kernel's use
of pkeys to implement execute-only support. These help ensure
that userspace can never assume that it has control of a key
unless it first asks the kernel. The kernel does not promise to
preserve PKRU (right register) contents except for allocated
pkeys.
The 'init_access_rights' argument to pkey_alloc() specifies the
rights that will be established for the returned pkey. For
instance:
pkey = pkey_alloc(flags, PKEY_DENY_WRITE);
will allocate 'pkey', but also sets the bits in PKRU[1] such that
writing to 'pkey' is already denied.
The kernel does not prevent pkey_free() from successfully freeing
in-use pkeys (those still assigned to a memory range by
pkey_mprotect()). It would be expensive to implement the checks
for this, so we instead say, "Just don't do it" since sane
software will never do it anyway.
Any piece of userspace calling pkey_alloc() needs to be prepared
for it to fail. Why? pkey_alloc() returns the same error code
(ENOSPC) when there are no pkeys and when pkeys are unsupported.
They can be unsupported for a whole host of reasons, so apps must
be prepared for this. Also, libraries or LD_PRELOADs might steal
keys before an application gets access to them.
This allocation mechanism could be implemented in userspace.
Even if we did it in userspace, we would still need additional
user/kernel interfaces to tell userspace which keys are being
used by the kernel internally (such as for execute-only
mappings). Having the kernel provide this facility completely
removes the need for these additional interfaces, or having an
implementation of this in userspace at all.
Note that we have to make changes to all of the architectures
that do not use mman-common.h because we use the new
PKEY_DENY_ACCESS/WRITE macros in arch-independent code.
1. PKRU is the Protection Key Rights User register. It is a
usermode-accessible register that controls whether writes
and/or access to each individual pkey is allowed or denied.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE.
Three of those bits: READ/WRITE/EXEC get translated directly in to
vma->vm_flags by calc_vm_prot_bits(). If a bit is unset in
mprotect()'s 'prot' argument then it must be cleared in vma->vm_flags
during the mprotect() call.
We do this clearing today by first calculating the VMA flags we
want set, then clearing the ones we do not want to inherit from
the original VMA:
vm_flags = calc_vm_prot_bits(prot, key);
...
newflags = vm_flags;
newflags |= (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));
However, we *also* want to mask off the original VMA's vm_flags in
which we store the protection key.
To do that, this patch adds a new macro:
ARCH_VM_PKEY_FLAGS
which allows the architecture to specify additional bits that it would
like cleared. We use that to ensure that the VM_PKEY_BIT* bits get
cleared.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163013.E48D6981@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
pkey_mprotect() is just like mprotect, except it also takes a
protection key as an argument. On systems that do not support
protection keys, it still works, but requires that key=0.
Otherwise it does exactly what mprotect does.
I expect it to get used like this, if you want to guarantee that
any mapping you create can *never* be accessed without the right
protection keys set up.
int real_prot = PROT_READ|PROT_WRITE;
pkey = pkey_alloc(0, PKEY_DENY_ACCESS);
ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
This way, there is *no* window where the mapping is accessible
since it was always either PROT_NONE or had a protection key set
that denied all access.
We settled on 'unsigned long' for the type of the key here. We
only need 4 bits on x86 today, but I figured that other
architectures might need some more space.
Semantically, we have a bit of a problem if we combine this
syscall with our previously-introduced execute-only support:
What do we do when we mix execute-only pkey use with
pkey_mprotect() use? For instance:
pkey_mprotect(ptr, PAGE_SIZE, PROT_WRITE, 6); // set pkey=6
mprotect(ptr, PAGE_SIZE, PROT_EXEC); // set pkey=X_ONLY_PKEY?
mprotect(ptr, PAGE_SIZE, PROT_WRITE); // is pkey=6 again?
To solve that, we make the plain-mprotect()-initiated execute-only
support only apply to VMAs that have the default protection key (0)
set on them.
Proposed semantics:
1. protection key 0 is special and represents the default,
"unassigned" protection key. It is always allocated.
2. mprotect() never affects a mapping's pkey_mprotect()-assigned
protection key. A protection key of 0 (even if set explicitly)
represents an unassigned protection key.
2a. mprotect(PROT_EXEC) on a mapping with an assigned protection
key may or may not result in a mapping with execute-only
properties. pkey_mprotect() plus pkey_set() on all threads
should be used to _guarantee_ execute-only semantics if this
is not a strong enough semantic.
3. mprotect(PROT_EXEC) may result in an "execute-only" mapping. The
kernel will internally attempt to allocate and dedicate a
protection key for the purpose of execute-only mappings. This
may not be possible in cases where there are no free protection
keys available. It can also happen, of course, in situations
where there is no hardware support for protection keys.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163012.3DDD36C4@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch introduces avic_ga_log_notifier, which will be called
by IOMMU driver whenever it handles the Guest vAPIC (GA) log entry.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduces per-VM AVIC ID and helper functions to manage the IDs.
Currently, the ID will be used to implement 32-bit AVIC IOMMU GA tag.
The ID is 24-bit one-based indexing value, and is managed via helper
functions to get the next ID, or to free an ID once a VM is destroyed.
There should be no ID conflict for any active VMs.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments. Create helper functions to
do that instead.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f448914885a35f333fe04da1b97a6c2cc1f80974.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we get a page fault indicating kernel stack overflow, invoke
handle_stack_overflow(). To prevent us from overflowing the stack
again while handling the overflow (because we are likely to have
very little stack space left), call handle_stack_overflow() on the
double-fault stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6d6cf96b3fb9b4c9aa303817e1dc4de0c7c36487.1472603235.git.luto@kernel.org
[ Minor edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merrifield2 is actually Moorefield.
Rename it accordingly and drop tail digit from Merrifield1.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906184254.94440-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tell SCU that we are about powering off the device.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160907123955.21228-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
vms and vcpus have statistics associated with them which can be viewed
within the debugfs. Currently it is assumed within the vcpu_stat_get() and
vm_stat_get() functions that all of these statistics are represented as
u32s, however the next patch adds some u64 vcpu statistics.
Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
Since vcpu statistics are per vcpu, they will only be updated by a single
vcpu at a time so this shouldn't present a problem on 32-bit machines
which can't atomically increment 64-bit numbers. However vm statistics
could potentially be updated by multiple vcpus from that vm at a time.
To avoid the overhead of atomics make all vm statistics ulong such that
they are 64-bit on 64-bit systems where they can be atomically incremented
and are 32-bit on 32-bit systems which may not be able to atomically
increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
As already done with __copy_*_user(), mark copy_*_user() as __always_inline.
Without this, the checks for things like __builtin_const_p() won't work
consistently in either hardened usercopy nor the recent adjustments for
detecting usercopy overflows at compile time.
The change in kernel text size is detectable, but very small:
text data bss dec hex filename
12118735 5768608 14229504 32116847 1ea106f vmlinux.before
12120207 5768608 14229504 32118319 1ea162f vmlinux.after
Signed-off-by: Kees Cook <keescook@chromium.org>
We now have a better way to determine if we are running on a cpu that
supports machine check recovery. Free up this feature bit.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d5db39e08d46cf1012d94d3902275d08ba931926.1472754712.git.tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the mcsafe_key defined in the previous patch to make decisions on which
copy function to use. We can't use the FEATURE bit any more because PCI
quirks run too late to affect the patching of code. So we use a static key.
Turn memcpy_mcsafe() into an inline function to make life easier for
callers. The assembly code that actually does the copy is now named
memcpy_mcsafe_unrolled()
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/bfde2fc774e94f53d91b70a4321c85a0d33e7118.1472754712.git.tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Each Xeon includes a number of capability registers in PCI space that
describe some features not enumerated by CPUID.
Use these to determine that we are running on a model that can recover from
machine checks. Hooks for Ivybridge ... Skylake provided.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/abf331dc4a3e2a2d17444129bc51127437bcf4ba.1472754711.git.tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:
1) "copy_from_user() buffer size is too small" compile warning/error
This is a static warning which happens when object size and copy size
are both const, and copy size > object size. I didn't see any false
positives for this one. So the function warning attribute seems to
be working fine here.
Note this scenario is always a bug and so I think it should be
changed to *always* be an error, regardless of
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
2) "copy_from_user() buffer size is not provably correct" compile warning
This is another static warning which happens when I enable
__compiletime_object_size() for new compilers (and
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size
is const, but copy size is *not*. In this case there's no way to
compare the two at build time, so it gives the warning. (Note the
warning is a byproduct of the fact that gcc has no way of knowing
whether the overflow function will be called, so the call isn't dead
code and the warning attribute is activated.)
So this warning seems to only indicate "this is an unusual pattern,
maybe you should check it out" rather than "this is a bug".
I get 102(!) of these warnings with allyesconfig and the
__compiletime_object_size() gcc check removed. I don't know if there
are any real bugs hiding in there, but from looking at a small
sample, I didn't see any. According to Kees, it does sometimes find
real bugs. But the false positive rate seems high.
3) "Buffer overflow detected" runtime warning
This is a runtime warning where object size is const, and copy size >
object size.
All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:
2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")
That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size(). But in fact,
__compiletime_object_size() seems to be working fine. The false
positives were instead triggered by #2 above. (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)
So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.
Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.
Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
thread_saved_pc() was using a completely bogus method to get the return
address. Since switch_to() was previously inlined, there was no sane way
to know where on the stack the return address was stored. Now with the
frame of a sleeping thread well defined, this can be implemented correctly.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-7-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of setting up a fake pt_regs context, put the kernel thread
function pointer and arg into the unused callee-restored registers
of 'struct fork_frame'.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-6-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>