IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When show_trace_log_lvl() is called from show_regs(), it completely
fails to dump the stack. This bug was introduced when
show_stack_log_lvl() was removed with the following commit:
0ee1dd9f5e7e ("x86/dumpstack: Remove raw stack dump")
Previous callers of that function now call show_trace_log_lvl()
directly. That resulted in a subtle change, in that the 'stack'
argument can now be NULL in certain cases.
A NULL 'stack' pointer means that the stack dump should start from the
topmost stack frame unless 'regs' is valid, in which case it should
start from 'regs->sp'.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0ee1dd9f5e7e ("x86/dumpstack: Remove raw stack dump")
Link: http://lkml.kernel.org/r/c551842302a9c222d96a14e42e4003f059509f69.1479362652.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We already have the same functionality in usercopy_32.c. Share it with
64-bit and get rid of some more asm glue which is not needed anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161031151015.22087-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The recent changes, which forced the registration of the boot cpu on UP
systems, which do not have ACPI tables, have been fixed for systems w/o
local APIC, but left a wreckage for systems which have neither ACPI nor
mptables, but the CPU has an APIC, e.g. virtualbox.
The boot process crashes in prefill_possible_map() as it wants to register
the boot cpu, which needs to access the local apic, but the local APIC is
not yet mapped.
There is no reason why init_apic_mapping() can't be invoked before
prefill_possible_map(). So instead of playing another silly early mapping
game, as the ACPI/mptables code does, we just move init_apic_mapping()
before the call to prefill_possible_map().
In hindsight, I should have noticed that combination earlier.
Sorry for the churn (also in stable)!
Fixes: ff8560512b8d ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com>
Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at>
Cc: prarit@redhat.com
Cc: ville.syrjala@linux.intel.com
Cc: michael.thayer@oracle.com
Cc: knut.osmundsen@oracle.com
Cc: frank.mehnert@oracle.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Specifics:
- Fix three ACPICA issues related to the interpreter locking and
introduced by recent changes in that area (Lv Zheng).
- Fix a PCI IRQ management regression introduced during the 4.7
cycle and related to the configuration of shared IRQs on systems
with an ISA bus (Sinan Kaya).
- Fix up a return value of one function in the APEI code (Punit
Agrawal).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJYE+wwAAoJEILEb/54YlRxe0wQAKIyO3ktsxxbz2iACxFPZGmn
ML1+OTBIGQKYDSGCINhMV5PGd98IMBaVCB9RllG/B9iALb8VCGiJ6AuJKoR7q2pZ
6mr7ioXNfTNlLISykt63cD/Lp/YobZMG6WhoNWzoKslVUQrSWAISV+wGpBxoj08i
8X7t/QtvRIVWfy4H4reDgpQMIKUeDhk6REeb8FESiXYboOvNhbXZpPS+bv8XXEfD
bu/ASQIZs3Z9YB2uTij16Tx95eJETHhr9zYIxbi848YDxjelpZNs1QuVoYxq3GO2
S7vbGHZITMLSEz4jD0w98YvDcb0jywfSXX53NBMaSJqOAleVKNH1rE8KvywG6H0s
2298yBc0o0ldBb+4nszoL7NyGQKDrmxMEGBRFlk67gZ1LA4cpk+Usv9Q0GmNx38H
KQfuA144n9ICaM9Kw9CKD8xrQ+PtpoTIBXzKGsdqIwHsS+2XSwzgp4IWEuMRfZNu
5ermtO2tRz47UDnQ5UxdweB0n2pEMrZXDDzBONdqp3ds4aOvOvVqQP3OB+iMaMrT
rPvVlYLr2Q+ekIzHPCpB7uBwI//bJ3L2cLqHxp9hlbNeFQWdv/NlMlmbWgYVVpn3
tCvqqpH+jtof8lnmWZtBdc4M0GhwM+CP6TYd65n2yDqVCfPraXokE0UdNBW+je4Z
BRuhjVEi0vBsrc3M4IC9
=wjnL
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix recent ACPICA regressions, an older PCI IRQ management
regression, and an incorrect return value of a function in the APEI
code.
Specifics:
- Fix three ACPICA issues related to the interpreter locking and
introduced by recent changes in that area (Lv Zheng).
- Fix a PCI IRQ management regression introduced during the 4.7 cycle
and related to the configuration of shared IRQs on systems with an
ISA bus (Sinan Kaya).
- Fix up a return value of one function in the APEI code (Punit
Agrawal)"
* tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
ACPICA: Dispatcher: Fix order issue of method termination
ACPI / APEI: Fix incorrect return value of ghes_proc()
ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
ACPI/PCI: pci_link: penalize SCI correctly
ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
Pull perf fixes from Ingo Molnar:
"Misc kernel fixes: a virtualization environment related fix, an uncore
PMU driver removal handling fix, a PowerPC fix and new events for
Knights Landing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
perf/powerpc: Don't call perf_event_disable() from atomic context
perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
perf/x86/intel/cstate: Add C-state residency events for Knights Landing
Pull x86 fixes from Ingo Molnar:
"Misc fixes: three build fixes, an unwinder fix and a microcode loader
fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y
x86: Fix export for mcount and __fentry__
x86/quirks: Hide maybe-uninitialized warning
x86/build: Fix build with older GCC versions
x86/unwind: Fix empty stack dereference in guess unwinder
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYEFHpAAoJEAx081l5xIa+tyYP/0xq+ZqHwS90k1mge/2uWYB3
sVQvFFIV55r6siOjIdDek+dsHq7IGFOChbsxegGyGvfwYVjzSmdoBwO1aMTV+Ii9
OoqLS/53kts9jHOVm1UNsbxW1lzJVWoFWpMY57KDodWsWxVbd0NuP9mfTRIH2Sfj
MmymKigXgwHSndn07+2xp9jI9Y5krtOLl+4YDsly7JF2IR7UBRRoW8n/WHR75lny
MNn2Vtn9NBwxDieFQc/KQGUQ1nC8wB0c3wtGDDQIux0gp6IVW+pQoCLo6CMtgHXB
IXGDojVA9KpcyEUz5RkBsVHYmvZR1PoS+nrnEE6b/C8p7UDuyCrk1Zfy0ZTGV/hq
LKmfRKB3NWbgKnBbqOdFYhsh/iyVjqoNdGYqfR4qJx5JGIltVWbjYwlwUpImlrIY
gKqtAdVFaFuoJs8MpFharxFlBf/o9DPDTPTWPQxGI16y7poH+86v7QmAJT9dJHRE
pf3oyYI3eHTeIQb42f7PHSp4hsVJMX5Awkm9a8b9PhNlu/3cHUOYkCT060ripMBc
ZksIUqKFzuk+TDRTnQrCQjaC4vJ6s8XUwntFhfHCZUmnnH8YhYpryDwdyzavcUvX
or8rkKsO/+Jxa1kRr8d2c1JYi2FIMHBP0srAu43WeYyAsSPFIL/9l5flIeHi2Ow3
tSHbCo4W5YRbQaVcBzxG
=prah
-----END PGP SIGNATURE-----
Merge tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux
Pull drm x86/pat regression fixes from Dave Airlie:
"This is a standalone pull request for the fix for a regression
introduced in -rc1 by a change to vm_insert_mixed to start using the
PAT range tracking to validate page protections. With this fix in
place, all the VRAM mappings for GPU drivers ended up at UC instead of
WC.
There are probably better ways to fix this long term, but nothing I'd
considered for -fixes that wouldn't need more settling in time. So
I've just created a new arch API that the drivers can reserve all
their VRAM aperture ranges as WC"
* tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux:
drm/drivers: add support for using the arch wc mapping API.
x86/io: add interface to reserve io memtype for a resource range. (v1.1)
perf doesn't seem to honour the number of fixed counters specified by CPUID
leaf 0xa. It always assumes that Intel CPUs have at least 3 fixed counters.
So if some of the fixed counters are masked out by the hypervisor, it still
tries to check/set them.
This patch makes perf behave nicer when the kernel is running under a
hypervisor that doesn't expose all the counters.
This patch contains some ideas from Matt Wilson.
Signed-off-by: Imre Palik <imrep@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Kozyrev <alexander.kozyrev@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Artyom Kuanbekov <artyom.kuanbekov@intel.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Wilson <msw@amazon.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1477037939-15605-1-git-send-email-imrep.amz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We needed the physical address of the container in order to compute the
offset within the relocated ramdisk. And we did this by doing __pa() on
the virtual address.
However, __pa() does checks whether the physical address is within
PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
which *doesn't* have the randomization offset into a function which uses
PAGE_OFFSET which *does* have that offset.
This makes this check fire:
VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
^^^^^^
due to the randomization offset.
The fix is as simple as using __pa_nodebug() because we do that
randomization offset accounting later in that function ourselves.
Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm <linux-mm@kvack.org>
Cc: stable@vger.kernel.org # 4.9
Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a sanity check to ensure the stack only grows down, and print a
warning if the check fails.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161027131058.tpdffwlqipv7pcd6@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Those pointers were initialized before call to _install_special_mapping()
after the commit:
f7b6eb3fa072 ("x86: Set context.vdso before installing the mapping")
This is not required anymore as special mappings have their vma name and
don't use arch_vma_name() after commit:
a62c34bd2a8a ("x86, mm: Improve _install_special_mapping and fix x86 vdso naming")
So, this way to init looks less entangled.
I even belive that we can remove NULL initializers:
- on failure load_elf_binary() will not start a new thread;
- arch_prctl will have the same pointers as before syscall.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20161027141516.28447-3-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As userspace knows nothing about kernel config, thus #ifdefs
around ABI prctl constants makes them invisible to userspace.
Let it be clean'n'simple: remove #ifdefs.
If kernel has CONFIG_CHECKPOINT_RESTORE disabled, sys_prctl()
will return -EINVAL for those prctls.
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: oleg@redhat.com
Fixes: 2eefd8789698 ("x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*")
Link: http://lkml.kernel.org/r/20161027141516.28447-2-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The use of config_enabled() is ambiguous. For config options,
IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.
I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.
Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:
- arch/x86/mm/kaslr.c
replace config_enabled() with IS_ENABLED() because
CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
- include/asm-generic/export.h
replace config_enabled() with __is_defined().
Then, config_enabled() can be removed now.
Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.
Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If __kernel_text_address() doesn't recognize a return address on the
stack, it probably means that it's some generated code which
__kernel_text_address() doesn't know about yet.
Otherwise there's probably some stack corruption.
Either way, warn about it.
Use printk_deferred_once() because the unwinder can be called with the
console lock by lockdep via save_stack_trace().
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2d897898f324e275943b590d160b55e482bba65f.1477496147.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Print a warning if stack recursion is detected.
Use printk_deferred_once() because the unwinder can be called with the
console lock by lockdep via save_stack_trace().
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/def18247aafaab480844484398e793f552b79bda.1477496147.git.jpoimboe@redhat.com
[ Unbroke the lines. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Detect situations in the unwinder where the frame pointer refers to a
bad address, and print an appropriate warning.
Use printk_deferred_once() because the unwinder can be called with the
console lock by lockdep via save_stack_trace().
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/03c888f6f7414d54fa56b393ea25482be6899b5f.1477496147.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 784d5699eddc5 ("x86: move exports to actual definitions") removed the
EXPORT_SYMBOL(__fentry__) and EXPORT_SYMBOL(mcount) from x8664_ksyms_64.c,
and added EXPORT_SYMBOL(function_hook) in mcount_64.S instead. The problem
is that function_hook isn't a function at all, but a macro that is defined
as either mcount or __fentry__ depending on the support from gcc.
Originally, I thought this was a macro issue, like what __stringify()
is used for. But the problem is a bit deeper. The Makefile.build has
some magic that does post processing of files to create the CRC
bindings. It does some searches for EXPORT_SYMBOL() and because it
finds a macro name and not the actual functions, this causes
function_hook not to be converted into mcount or __fentry__ and they
are missed.
Instead of adding more magic to Makefile.build, just add
EXPORT_SYMBOL() for mcount and __fentry__ where the ifdef is used.
Since this is assembly and not C, it doesn't require being set after
the function is defined.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Gabriel C <nix.or.die@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/r/20161024150148.4f9d90e4@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If the instruction sanity test fails, it prints a "Failure" message to
stdout. Make this program behave like the rest of the build and print
that message to stderr.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1477428965-20548-3-git-send-email-pebolle@tiscali.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the instruction decoder test ran successful it prints a message like
this to stderr:
Succeed: decoded and checked 1767380 instructions
But, as described in "console mode programming user interface guidelines
version 101" which doesn't exist, programs should use stderr for errors
or warnings. We're told about a successful run here, so the instruction
decoder test should use stdout.
Let's fix the typo too, while we're at it.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1477428965-20548-2-git-send-email-pebolle@tiscali.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A recent change to the mm code in:
87744ab3832b mm: fix cache mode tracking in vm_insert_mixed()
started enforcing checking the memory type against the registered list for
amixed pfn insertion mappings. It happens that the drm drivers for a number
of gpus relied on this being broken. Currently the driver only inserted
VRAM mappings into the tracking table when they came from the kernel,
and userspace mappings never landed in the table. This led to a regression
where all the mapping end up as UC instead of WC now.
I've considered a number of solutions but since this needs to be fixed
in fixes and not next, and some of the solutions were going to introduce
overhead that hadn't been there before I didn't consider them viable at
this stage. These mainly concerned hooking into the TTM io reserve APIs,
but these API have a bunch of fast paths I didn't want to unwind to add
this to.
The solution I've decided on is to add a new API like the arch_phys_wc
APIs (these would have worked but wc_del didn't take a range), and
use them from the drivers to add a WC compatible mapping to the table
for all VRAM on those GPUs. This means we can then create userspace
mapping that won't get degraded to UC.
v1.1: use CONFIG_X86_PAT + add some comments in io.h
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: x86@kernel.org
Cc: mcgrof@suse.com
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For mostly historical reasons, the x86 oops dump shows the raw stack
values:
...
[registers]
Stack:
ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
0000000000000002 0000000000000000 0000000000000000 0000000000000000
Call Trace:
...
This seems to be an artifact from long ago, and probably isn't needed
anymore. It generally just adds noise to the dump, and it can be
actively harmful because it leaks kernel addresses.
Linus says:
"The stack dump actually goes back to forever, and it used to be
useful back in 1992 or so. But it used to be useful mainly because
stacks were simpler and we didn't have very good call traces anyway. I
definitely remember having used them - I just do not remember having
used them in the last ten+ years.
Of course, it's still true that if you can trigger an oops, you've
likely already lost the security game, but since the stack dump is so
useless, let's aim to just remove it and make games like the above
harder."
This also removes the related 'kstack=' cmdline option and the
'kstack_depth_to_print' sysctl.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.
It can be a security issue because it leaks kernel addresses. It also
affects the usefulness of the stack dump. Linus says:
"I actually spend time cleaning up commit messages in logs, because
useless data that isn't actually information (random hex numbers) is
actively detrimental.
It makes commit logs less legible.
It also makes it harder to parse dumps.
It's not useful. That makes it actively bad.
I probably look at more oops reports than most people. I have not
found the hex numbers useful for the last five years, because they are
just randomized crap.
The stack content thing just makes code scroll off the screen etc, for
example."
The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names. However such cases are
rare, and the context of the stack dump should be enough to be able to
figure it out.
There's now a 'faddr2line' script which can be used to convert a
function address to a file name and line:
$ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
write_sysrq_trigger+0x51/0x60:
write_sysrq_trigger at drivers/tty/sysrq.c:1098
Or gdb can be used:
$ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
(gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).
(But note that when there are duplicate symbol names, gdb will only show
the first symbol it finds. faddr2line is recommended over gdb because
it handles duplicates and it also does function size checking.)
Here's an example of what a stack dump looks like after this change:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: sysrq_handle_crash+0x45/0x80
PGD 36bfa067 [ 29.650644] PUD 7aca3067
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 1 PID: 786 Comm: bash Tainted: G E 4.9.0-rc1+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
task: ffff880078582a40 task.stack: ffffc90000ba8000
RIP: 0010:sysrq_handle_crash+0x45/0x80
RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296
RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292
RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000
FS: 00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0
Stack:
ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002
0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20
ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40
Call Trace:
__handle_sysrq+0x138/0x220
? __handle_sysrq+0x5/0x220
write_sysrq_trigger+0x51/0x60
proc_reg_write+0x42/0x70
__vfs_write+0x37/0x140
? preempt_count_sub+0xa1/0x100
? __sb_start_write+0xf5/0x210
? vfs_write+0x183/0x1a0
vfs_write+0xb8/0x1a0
SyS_write+0x58/0xc0
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x7ffb42f55940
RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940
RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001
RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700
R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10
R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010
Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8
CR2: 0000000000000000
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
gcc -Wmaybe-uninitialized detects that quirk_intel_brickland_xeon_ras_cap
uses uninitialized data when CONFIG_PCI is not set:
arch/x86/kernel/quirks.c: In function ‘quirk_intel_brickland_xeon_ras_cap’:
arch/x86/kernel/quirks.c:641:13: error: ‘capid0’ is used uninitialized in this function [-Werror=uninitialized]
However, the function is also not called in this configuration, so we
can avoid the warning by moving the existing #ifdef to cover it as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/20161024153325.2752428-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
These macros were added in the following commit:
86a1c34a929f ("x86_64 syscall audit fast-path")
They were used in two-phase sycalls entry tracing, but this functionality
was then moved to the arch/x86/entry/common.c:syscall_trace_enter() function,
in the following commit:
1f484aa69046 ("x86/entry: Move C entry and exit code to arch/x86/entry/common.c")
syscall_trace_enter() now uses the defines from <linux/audit.h>,
so these defines entry_64.S are no longer used anywhere.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161023135646.4453-1-kuleshovmail@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vince Waver reported the following bug:
WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
Call Trace:
<NMI> ? dump_stack+0x46/0x59
? __warn+0xd5/0xee
? vmalloc_fault+0x58/0x1f0
? __do_page_fault+0x6d/0x48e
? perf_log_throttle+0xa4/0xf4
? trace_page_fault+0x22/0x30
? __unwind_start+0x28/0x42
? perf_callchain_kernel+0x75/0xac
? get_perf_callchain+0x13a/0x1f0
? perf_callchain+0x6a/0x6c
? perf_prepare_sample+0x71/0x2eb
? perf_event_output_forward+0x1a/0x54
? __default_send_IPI_shortcut+0x10/0x2d
? __perf_event_overflow+0xfb/0x167
? x86_pmu_handle_irq+0x113/0x150
? native_read_msr+0x6/0x34
? perf_event_nmi_handler+0x22/0x39
? perf_ibs_nmi_handler+0x4a/0x51
? perf_event_nmi_handler+0x22/0x39
? nmi_handle+0x4d/0xf0
? perf_ibs_handle_irq+0x3d1/0x3d1
? default_do_nmi+0x3c/0xd5
? do_nmi+0x92/0x102
? end_repeat_nmi+0x1a/0x1e
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
<EOE> ^A4---[ end trace 632723104d47d31a ]---
BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
kernel stack overflow (page fault): 0000 [#1] SMP
...
The NMI hit in the entry code right after setting up the stack pointer
from 'cpu_current_top_of_stack', so the kernel stack was empty. The
'guess' version of __unwind_start() attempted to dereference the "top of
stack" pointer, which is not actually *on* the stack.
Add a check in the guess unwinder to deal with an empty stack. (The
frame pointer unwinder already has such a check.)
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Advertise control feature flags in xenstore.
- Fix x86 build when XEN_PVHVM is disabled.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJYDjVtAAoJEFxbo/MsZsTRv2UH/0YR95ajlgJnN/ldeG4KhBdV
Oe6piyw1cbHDPvFrFFl7HgYgAiiuaMxOFk+j/XKVJ7naAOD06kWHoVzZNkpNFF4i
2m81jGfvW3msbXd77aR+IHulWxRxQ9TE4HV2s94DiQiSJa2f02PqVCdqyJws736m
mjDdDRzd90xb2rDI3XrcRNnjgNaFtfMLGhtwtgXI5U+Ic+uVW1VBwLefZXCI2SKw
yUSVBwsYENgfGUJ+NmYrl53WmlSnAatrs1wClLVqm/0fD7+J2XLHRAonISTwoKtp
z+XJthe7uWq0Fb/DMiWhvTrTn852chy9BEC6QsRBmGM6RRZG9n7x8k97NgTiqiw=
=lM7p
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from David Vrabel:
- advertise control feature flags in xenstore
- fix x86 build when XEN_PVHVM is disabled
* tag 'for-linus-4.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: check return value of xenbus_scanf()
xenbus: prefer list_for_each()
x86: xen: move cpu_up functions out of ifdef
xenbus: advertise control feature flags
Three newly introduced functions are not defined when CONFIG_XEN_PVHVM is
disabled, but are still being used:
arch/x86/xen/enlighten.c:141:12: warning: ‘xen_cpu_up_prepare’ used but never defined
arch/x86/xen/enlighten.c:142:12: warning: ‘xen_cpu_up_online’ used but never defined
arch/x86/xen/enlighten.c:143:12: warning: ‘xen_cpu_dead’ used but never defined
Fixes: 4d737042d6c4 ("xen/x86: Convert to hotplug state machine")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Ondrej reported that IRQs stopped working in v4.7 on several
platforms. A typical scenario, from Ondrej's VT82C694X/694X, is:
ACPI: Using PIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: No IRQ available for PCI Interrupt Link [LNKA]
8139too 0000:00:0f.0: PCI INT A: no GSI
We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already
active at IRQ 11. In that case, acpi_pci_link_allocate() only tries
to use the active IRQ (IRQ 11) which also happens to be the SCI.
We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but
irq_get_trigger_type(11) returns something other than
IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS
instead, which makes acpi_pci_link_allocate() assume the IRQ isn't
available and give up.
Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ,
trigger, and polarity directly and we don't have to depend on
irq_get_trigger_type().
Fixes: 103544d86976 (ACPI,PCI,IRQ: reduce resource requirements)
Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.org
Reported-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull x86 fixes from Ingo Molnar:
"Three fixes, a hw-enablement and a cross-arch fix/enablement change:
- SGI/UV fix for older platforms
- x32 signal handling fix
- older x86 platform bootup APIC fix
- AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS
(Multiply Accumulation Single precision instructions) enablement.
- move thread_info back into x86 specific code, to make life easier
for other architectures trying to make use of
CONFIG_THREAD_INFO_IN_TASK_STRUCT=y"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/smp: Don't try to poke disabled/non-existent APIC
sched/core, x86: Make struct thread_info arch specific again
x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi()
x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features
x86/vmware: Skip timer_irq_works() check on VMware
Apparently trying to poke a disabled or non-existent APIC
leads to a box that doesn't even boot. Let's not do that.
No real clue if this is the right fix, but at least my
P3 machine boots again.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dyoung@redhat.com
Cc: kexec@lists.infradead.org
Cc: stable@vger.kernel.org
Fixes: 2a51fe083eba ("arch/x86: Handle non enumerated CPU after physical hotplug")
Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The value of regs->orig_ax contains potentially useful debugging data:
For syscalls it contains the syscall number. For interrupts it contains
the (negated) vector number. To reduce noise, print it only if it has a
useful value (i.e., something other than -1).
Here's what it looks like for a write syscall:
RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940
RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940
RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001
...
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/93f0fe0307a4af884d3fca00edabcc8cff236002.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
newline so that any stack address printed after it will appear on the
same line. That causes the first stack address to be vertically
misaligned with the rest, making it visually cluttered and slightly
confusing:
Call Trace:
<IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
[<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
[<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
...
<EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
[<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
[<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
It will look worse once we start printing pt_regs registers found in the
middle of the stack:
<IRQ> RIP: 0010:[<ffffffff8189c6c3>] [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
RSP: 0018:ffff88007876f720 EFLAGS: 00000206
RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
...
Improve readability by adding a newline to the stack name:
Call Trace:
<IRQ>
[<ffffffff814431c3>] dump_stack+0x86/0xc3
[<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
[<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
...
<EOI>
[<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
[<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
[<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
Now that "continued" lines are no longer needed, we can also remove the
hack of using the empty string (aka KERN_CONT) and replace it with
KERN_DEFAULT.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The entry code doesn't encode the pt_regs pointer for syscalls. But the
pt_regs are always at the same location, so we can add a manual check
for them.
A later patch prints them as part of the oops stack dump. They could be
useful, for example, to determine the arguments to a system call.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e176aa9272930cd3f51fda0b94e2eae356677da4.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With frame pointers, when a task is interrupted, its stack is no longer
completely reliable because the function could have been interrupted
before it had a chance to save the previous frame pointer on the stack.
So the caller of the interrupted function could get skipped by a stack
trace.
This is problematic for live patching, which needs to know whether a
stack trace of a sleeping task can be relied upon. There's currently no
way to detect if a sleeping task was interrupted by a page fault
exception or preemption before it went to sleep.
Another issue is that when dumping the stack of an interrupted task, the
unwinder has no way of knowing where the saved pt_regs registers are, so
it can't print them.
This solves those issues by encoding the pt_regs pointer in the frame
pointer on entry from an interrupt or an exception.
This patch also updates the unwinder to be able to decode it, because
otherwise the unwinder would be broken by this change.
Note that this causes a change in the behavior of the unwinder: each
instance of a pt_regs on the stack is now considered a "frame". So
callers of unwind_get_return_address() will now get an occasional
'regs->ip' address that would have previously been skipped over.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
gcc 7 warns:
arch/x86/kvm/ioapic.c: In function 'kvm_ioapic_reset':
arch/x86/kvm/ioapic.c:597:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
And it is right. Memset whole array using sizeof operator.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
[Added x86 subject tag]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
When CONFIG_CPU_FREQ is not set, int cpu is unused and gcc rightfully
warns about it:
arch/x86/kvm/x86.c: In function ‘kvm_timer_init’:
arch/x86/kvm/x86.c:5697:6: warning: unused variable ‘cpu’ [-Wunused-variable]
int cpu;
^~~
But since it is used only in the CONFIG_CPU_FREQ block, simply move it
there, thus squashing the warning too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The following commit:
c65eacbe290b ("sched/core: Allow putting thread_info into task_struct")
... made 'struct thread_info' a generic struct with only a
single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is
selected.
This change however seems to be quite x86 centric, since at least the
generic preemption code (asm-generic/preempt.h) assumes that struct
thread_info also has a preempt_count member, which apparently was not
true for x86.
We could add a bit more #ifdefs to solve this problem too, but it seems
to be much simpler to make struct thread_info arch specific
again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a
bit easier for architectures that have a couple of arch specific stuff
in their thread_info definition.
The arch specific stuff _could_ be moved to thread_struct. However
keeping them in thread_info makes it easier: accessing thread_info
members is simple, since it is at the beginning of the task_struct,
while the thread_struct is at the end. At least on s390 the offsets
needed to access members of the thread_struct (with task_struct as
base) are too large for various asm instructions. This is not a
problem when keeping these members within thread_info.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The recent introduction of SA_X32/IA32 sa_flags added a check for
user_64bit_mode() into sigaction_compat_abi(). user_64bit_mode() is true
for native 64-bit processes and x32 processes.
Due to that the function returns w/o setting the SA_X32_ABI flag for X32
processes. In consequence the kernel attempts to deliver the signal to the
X32 process in native 64-bit mode causing the process to segfault.
Remove the check, so the actual check for X32 mode which sets the ABI flag
can be reached. There is no side effect for native 64-bit mode.
[ tglx: Rewrote changelog ]
Fixes: 6846351052e6 ("x86/signal: Add SA_{X32,IA32}_ABI sa_flags")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Link: http://lkml.kernel.org/r/CAJwJo6Z8ZWPqNfT6t-i8GW1MKxQrKDUagQqnZ%2B0%2B697%3DMyVeGg@mail.gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When core_kernel_text() is used to determine whether an address on a
task's stack trace is a kernel text address, it incorrectly returns
false for early text addresses for the head code between the _text and
_stext markers. Among other things, this can cause the unwinder to
behave incorrectly when unwinding to x86 head code.
Head code is text code too, so mark it as such. This seems to match the
intent of other users of the _stext symbol, and it also seems consistent
with what other architectures are already doing.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/789cf978866420e72fa89df44aa2849426ac378d.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right below their saved pt_regs,
regardless of which syscall was used to enter the kernel. That creates
a nice convention which makes it straightforward to identify the end of
the stack, which can be useful for the unwinder to verify the stack is
sane.
However, the boot CPU's idle "swapper" task doesn't follow that
convention. Fix that by starting its stack at a sizeof(pt_regs) offset
from the end of the stack page.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/81aee3beb6ed88e44f1bea6986bb7b65c368f77a.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The frame at the end of each idle task stack has a zeroed return
address. This is inconsistent with real task stacks, which have a real
return address at that spot. This inconsistency can be confusing for
stack unwinders. It also hides useful information about what asm code
was involved in calling into C.
Make it a real address by using the side effect of a call instruction to
push the instruction pointer on the stack.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f59593ae7b15d5126f872b0a23143173d28aa32d.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are two different pieces of code for starting a CPU: start_cpu0()
and the end of secondary_startup_64(). They're identical except for the
stack setup. Combine the common parts into a shared start_cpu()
function.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1d692ffa62fcb3cc835a5b254e953f2d9bab3549.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On 32-bit kernels, the initial idle stack calculation doesn't take into
account the TOP_OF_KERNEL_STACK_PADDING, making the stack end address
inconsistent with other tasks on 32-bit.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6cf569410bfa84cf923902fc4d628444cace94be.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The frame at the end of each idle task stack is inconsistent with real
task stacks, which have a stack frame header and a real return address
before the pt_regs area. This inconsistency can be confusing for stack
unwinders. It also hides useful information about what asm code was
involved in calling into C.
Fix that by changing the initial code jumps to calls. Also add infinite
loops after the calls to make it clear that the calls don't return, and
to hang if they do.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2588f34b6fbac4ae6f6f9ead2a78d7f8d58a6341.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>