IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
"System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or
compat_sys_xyzzy() should only be called from userspace via the
syscall table, but not from elsewhere in the kernel.
At least on 64-bit x86, it will likely be a hard requirement from
v4.17 onwards to not call system call functions in the kernel: It is
better to use use a different calling convention for system calls
there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
which then hands processing over to the actual syscall function. This
means that only those parameters which are actually needed for a
specific syscall are passed on during syscall entry, instead of
filling in six CPU registers with random user space content all the
time (which may cause serious trouble down the call chain). Those
x86-specific patches will be pushed through the x86 tree in the near
future.
Moreover, rules on how data may be accessed may differ between kernel
data and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific
code.
This patchset removes all in-kernel calls to syscall functions in the
kernel with the exception of arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"
* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
bpf: whitelist all syscalls for error injection
kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
kernel/sys_ni: sort cond_syscall() entries
syscalls/x86: auto-create compat_sys_*() prototypes
syscalls: sort syscall prototypes in include/linux/compat.h
net: remove compat_sys_*() prototypes from net/compat.h
syscalls: sort syscall prototypes in include/linux/syscalls.h
kexec: move sys_kexec_load() prototype to syscalls.h
x86/sigreturn: use SYSCALL_DEFINE0
x86: fix sys_sigreturn() return type to be long, not unsigned long
x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
...
Pull x86 dma mapping updates from Ingo Molnar:
"This tree, by Christoph Hellwig, switches over the x86 architecture to
the generic dma-direct and swiotlb code, and also unifies more of the
dma-direct code between architectures. The now unused x86-only
primitives are removed"
* 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs
swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS
dma/swiotlb: Remove swiotlb_{alloc,free}_coherent()
dma/direct: Handle force decryption for DMA coherent buffers in common code
dma/direct: Handle the memory encryption bit in common code
dma/swiotlb: Remove swiotlb_set_mem_attributes()
set_memory.h: Provide set_memory_{en,de}crypted() stubs
x86/dma: Remove dma_alloc_coherent_gfp_flags()
iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()
iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
x86/dma/amd_gart: Use dma_direct_{alloc,free}()
x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA
x86/dma: Use generic swiotlb_ops
x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y)
x86/dma: Remove dma_alloc_coherent_mask()
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using this helper allows us to avoid the in-kernel calls to the
sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mmap_pgoff().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel
calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that
this function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as ksys_fadvise64_64().
Some compat stubs called sys_fadvise64(), which then just passed through
the arguments to sys_fadvise64_64(). Get rid of this indirection, and call
ksys_fadvise64_64() directly.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_fallocate() wrapper allows us to get rid of in-kernel
calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_fallocate().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_p{read,write}64() wrappers allows us to get rid of
in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_p{read,write}64().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_truncate() wrapper allows us to get rid of in-kernel
calls to the sys_truncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_truncate().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using this helper allows us to avoid the in-kernel calls to the
sys_sync_file_range() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_sync_file_range().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel
calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_ftruncate().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Pull perf updates from Ingo Molnar:
"The main kernel side changes were:
- Modernize the kprobe and uprobe creation/destruction tooling ABIs:
The existing text based APIs (kprobe_events and uprobe_events in
tracefs), are naive, limited ABIs in that they require user-space
to clean up after themselves, which is both difficult and fragile
if the tool is buggy or exits unexpectedly. In other words they are
not really suited for modern, robust tooling.
So introduce a modern, file descriptor based ABI that does not have
these limitations: introduce the 'perf_kprobe' and 'perf_uprobe'
PMUs and extend the perf_event_open() syscall to create events with
a kprobe/uprobe attached to them. These [k,u]probe are associated
with this file descriptor, so they are not available in tracefs.
(Song Liu)
- Intel Cannon Lake CPU support (Harry Pan)
- Intel PT cleanups (Alexander Shishkin)
- Improve the performance of pinned/flexible event groups by using RB
trees (Alexey Budankov)
- Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification
of hardware breakpoints, which new ABI variant massively speeds up
existing tooling that uses hardware breakpoints to instrument (and
debug) memory usage.
(Milind Chabbi, Jiri Olsa)
- Various Intel PEBS handling fixes and improvements, and other Intel
PMU improvements (Kan Liang)
- Various perf core improvements and optimizations (Peter Zijlstra)
- ... misc cleanups, fixes and updates.
There's over 200 tooling commits, here's an (imperfect) list of
highlights:
- 'perf annotate' improvements:
* Recognize and handle jumps to other functions as calls, which
improves the navigation along jumps and back. (Arnaldo Carvalho
de Melo)
* Add the 'P' hotkey in TUI annotation to dump annotation output
into a file, to ease e-mail reporting of annotation details.
(Arnaldo Carvalho de Melo)
* Add an IPC/cycles column to the TUI (Jin Yao)
* Improve s390 assembly annotation (Thomas Richter)
* Refactor the output formatting logic to better separate it into
interactive and non-interactive features and add the --stdio2
output variant to demonstrate this. (Arnaldo Carvalho de Melo)
- 'perf script' improvements:
* Add Python 3 support (Jaroslav Škarvada)
* Add --show-round-event (Jiri Olsa)
- 'perf c2c' improvements:
* Add NUMA analysis support (Jiri Olsa)
- 'perf trace' improvements:
* Improve PowerPC support (Ravi Bangoria)
- 'perf inject' improvements:
* Integrate ARM CoreSight traces (Robert Walker)
- 'perf stat' improvements:
* Add the --interval-count option (yuzhoujian)
* Add the --timeout option (yuzhoujian)
- 'perf sched' improvements (Changbin Du)
- Vendor events improvements :
* Add IBM s390 vendor events (Thomas Richter)
* Add and improve arm64 vendor events (John Garry, Ganapatrao
Kulkarni)
* Update POWER9 vendor events (Sukadev Bhattiprolu)
- Intel PT tooling improvements (Adrian Hunter)
- PMU handling improvements (Agustin Vega-Frias)
- Record machine topology in perf.data (Jiri Olsa)
- Various overwrite related cleanups (Kan Liang)
- Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet)
- ... and lots of other changes, cleanups and fixes, see the shortlog
and Git history for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
perf/x86/intel: Enable C-state residency events for Cannon Lake
perf/x86/intel: Add Cannon Lake support for RAPL profiling
perf/x86/pt, coresight: Clean up address filter structure
perf vendor events s390: Add JSON files for IBM z14
perf vendor events s390: Add JSON files for IBM z13
perf vendor events s390: Add JSON files for IBM zEC12 zBC12
perf vendor events s390: Add JSON files for IBM z196
perf vendor events s390: Add JSON files for IBM z10EC z10BC
perf mmap: Be consistent when checking for an unmaped ring buffer
perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done()
perf build: Fix check-headers.sh opts assignment
perf/x86: Update rdpmc_always_available static key to the modern API
perf annotate: Use absolute addresses to calculate jump target offsets
perf annotate: Defer searching for comma in raw line till it is needed
perf annotate: Support jumping from one function to another
perf annotate: Add "_local" to jump/offset validation routines
perf python: Reference Py_None before returning it
perf annotate: Mark jumps to outher functions with the call arrow
perf annotate: Pass function descriptor to its instruction parsing routines
perf annotate: No need to calculate notes->start twice
...
In commit 9690c1574268 ("powerpc/mm/radix: Fix always false comparison
against MMU_NO_CONTEXT") an issue was discovered where `mm->context.id` was
being truncated to an `unsigned int`, while the PID is actually an
`unsigned long`. Update the earlier patch by fixing one remaining
occurrence. Discovered during a compilation with W=1:
arch/powerpc/mm/tlb-radix.c:702:19: error: comparison is always false due to limited range of data type [-Werror=type-limits]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the
user context when single_step_exception() prepares the SIGTRAP
delivery. The resulting branch-trap-within-the-SIGTRAP-handler
isn't healthy.
Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call
to clear_single_step() which only clears MSR_SE.
This patch adds a new helper, clear_br_trace(), which clears the
debug trap before invoking the signal handler. This helper is a
NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reduces vmlinux text size by 1kB and data by 1.5kB with a small
build!
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add the recently added CPU_FTRS_POWER9_DD2_2 to the little
endian possible mask as noticed by Nick.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add GENERIC_CPU support for little-endian rather than using POWER8
specific selection for POWER9 and above.
Restrict GENERIC_CPU to POWER8 and above on little endian.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Duplicate GENERIC_CPU to avoid a kbuild warning about the prompt
being redefined. Spell out that GENERIC means >= POWER4 for BE.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER4 has been broken since at least the change 49d09bf2a6
("powerpc/64s: Optimise MSR handling in exception handling"), which
requires mtmsrd L=1 support. This was introduced in ISA v2.01, and
POWER4 supports ISA v2.00.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The last usage was removed in c17b98cf6028 ("KVM: PPC: Book3S HV:
Remove code for PPC970 processors") (Dec 2014).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and
above (which is what the cputable setup does). Fix DT CPU features
quirk setup to match.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Merge with upstream changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rather than override the machine type in .S code (which can hide wrong
or ambiguous code generation for the target), set the type to power4
for all assembly.
This also means we need to be careful not to build power4-only code
when we're not building for Book3S, such as the "power7" versions of
copyuser/page/memcpy.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
ALTIVEC and VSX features are not added by to default to the POWERx CPU
feature sets because they are intended to be enabled by firmware.
Currently they end up in CPU_FTRS_POSSIBLE due to their inclusion in
other the set for other CPUs, eg. PPC970.
But they should be added individually to the CPU_FTRS_POSSIBLE set,
because if we reduce the set of CPUs that are built-for they may
disappear from the possible mask.
It already contains CPU_FTR_VSX, so add ALTIVEC. The _COMP features
should be used because they won't be present if compiled out.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add detail to change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's not a bug to have features missing in CPU_FTR_ALWAYS, but it is a
missed opportunity for optimisation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When building a uImage or zImage using ppc6xx_defconfig and some other
defconfigs, the following error occurs with GCC 4.5.1:
/arch/powerpc/boot/libfdt_env.h:10:13: error: redefinition of typedef 'uint32_t'
/arch/powerpc/boot/types.h:21:13: note: previous declaration of 'uint32_t' was here
/arch/powerpc/boot/libfdt_env.h:11:13: error: redefinition of typedef 'uint64_t'
/arch/powerpc/boot/types.h:22:13: note: previous declaration of 'uint64_t' was here
The problem is that commit 656ad58ef19e (powerpc/boot: Add OPAL
console to epapr wrappers) adds typedefs for uint32_t and uint64_t to
type.h but doesn't remove the pre-existing (and now duplicate)
typedefs from libfdt_env.h.
Fix the error by removing the duplicate typedefs from libfdt_env.h
Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When waking from a CPU idle instruction (e.g., nap or stop), the sync
for ordering the KVM secondary thread state can be avoided if there
wakeup is coming from a kernel context rather than KVM context.
This improves performance for ping-pong benchmark with the stop0 idle
state by 0.46% for 2 threads in the same core, and 1.02% for different
cores.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Implement a new function to invoke stop, power9_offline_stop, which is
like power9_idle_stop but used by the cpu hotplug code.
Move KVM secondary state manipulation code to the offline case.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
system_reset_exception does most of its own crash handling now,
invoking the debugger or crash dumps if they are registered. If not,
then it goes through to die() to print stack traces, and then is
supposed to panic (according to comments).
However after die() prints oopses, it does its own handling which
doesn't allow system_reset_exception to panic (e.g., it may just
kill the current process). This patch causes sreset exceptions to
return from die after it prints messages but before acting.
This also stops die from invoking the debugger on 0x100 crashes.
system_reset_exception similarly calls the debugger. It had been
thought this was harmless (because if the debugger was disabled,
neither call would fire, and if it was enabled the first call
would return). However in some cases like xmon 'X' command, the
debugger returns 0, which currently causes it to be entered
again (first in system_reset_exception, then in die), which is
confusing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
System Reset, being an NMI, must return more carefully than other
interrupts. It has traditionally returned via the nromal return
from exception path, but that has a number of problems.
- r13 does not get restored if returning to kernel. This is for
interrupts which may cause a context switch, which sreset will
never do. Interrupting OPAL (which uses a different r13) is one
place where this causes breakage.
- It may cause several other problems returning to kernel with
preempt or TIF_EMULATE_STACK_STORE if it hits at the wrong time.
It's safer just to have a simple restore and return, like machine
check which is the other NMI.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The current EEH callbacks can race with a driver unbind. This can
result in a backtraces like this:
EEH: Frozen PHB#0-PE#1fc detected
EEH: PE location: S000009, PHB location: N/A
CPU: 2 PID: 2312 Comm: kworker/u258:3 Not tainted 4.15.6-openpower1 #2
Workqueue: nvme-wq nvme_reset_work [nvme]
Call Trace:
dump_stack+0x9c/0xd0 (unreliable)
eeh_dev_check_failure+0x420/0x470
eeh_check_failure+0xa0/0xa4
nvme_reset_work+0x138/0x1414 [nvme]
process_one_work+0x1ec/0x328
worker_thread+0x2e4/0x3a8
kthread+0x14c/0x154
ret_from_kernel_thread+0x5c/0xc8
nvme nvme1: Removing after probe failure status: -19
<snip>
cpu 0x23: Vector: 300 (Data Access) at [c000000ff50f3800]
pc: c0080000089a0eb0: nvme_error_detected+0x4c/0x90 [nvme]
lr: c000000000026564: eeh_report_error+0xe0/0x110
sp: c000000ff50f3a80
msr: 9000000000009033
dar: 400
dsisr: 40000000
current = 0xc000000ff507c000
paca = 0xc00000000fdc9d80 softe: 0 irq_happened: 0x01
pid = 782, comm = eehd
Linux version 4.15.6-openpower1 (smc@smc-desktop) (gcc version 6.4.0 (Buildroot 2017.11.2-00008-g4b6188e)) #2 SM P Tue Feb 27 12:33:27 PST 2018
enter ? for help
eeh_report_error+0xe0/0x110
eeh_pe_dev_traverse+0xc0/0xdc
eeh_handle_normal_event+0x184/0x4c4
eeh_handle_event+0x30/0x288
eeh_event_handler+0x124/0x170
kthread+0x14c/0x154
ret_from_kernel_thread+0x5c/0xc8
The first part is an EEH (on boot), the second half is the resulting
crash. nvme probe starts the nvme_reset_work() worker thread. This
worker thread starts touching the device which see a device error
(EEH) and hence queues up an event in the powerpc EEH worker
thread. nvme_reset_work() then continues and runs
nvme_remove_dead_ctrl_work() which results in unbinding the driver
from the device and hence releases all resources. At the same time,
the EEH worker thread starts doing the EEH .error_detected() driver
callback, which no longer works since the resources have been freed.
This fixes the problem in the same way the generic PCIe AER code (in
drivers/pci/pcie/aer/aerdrv_core.c) does. It makes the EEH code hold
the device_lock() while performing the driver EEH callbacks and
associated code. This ensures either the callbacks are no longer
register, or if they are registered the driver will not be removed
from underneath us.
This has been broken forever. The EEH call backs were first introduced
in 2005 (in 77bd7415610) but it's not clear if a lock was needed back
then.
Fixes: 77bd74156101 ("[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines")
Cc: stable@vger.kernel.org # v2.6.16+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
kexec_file_load() on powerpc doesn't support kdump kernels yet, so it
returns -ENOTSUPP in that case.
I've recently learned that this errno is internal to the kernel and
isn't supposed to be exposed to userspace. Therefore, change to
-EOPNOTSUPP which is defined in an uapi header.
This does indeed make kexec-tools happier. Before the patch, on
ppc64le:
# ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz
kexec_file_load failed: Unknown error 524
After the patch:
# ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz
kexec_file_load failed: Operation not supported
Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()")
Cc: stable@vger.kernel.org # v4.10+
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Reviewed-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This hack, introduced in commit c5df7f775148 ("powerpc: allow ioremap
within reserved memory regions") is now unnecessary.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Because the two memory blocks (usually called MEM1 and MEM2) are not
merged anymore, __request_region in kernel/resource.c will correctly
allow reserving regions in the physical address space between MEM1 and
MEM2, where many important peripherals are (GPIO, MMC, USB, ...).
A previous change to __ioremap_caller in arch/powerpc/mm/pgtable_32.c
ensures that multiple memblocks are properly considered in ioremap; this
makes it unnecessary to set __allow_ioremap_reserved.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On systems where there is MMIO space between different blocks of RAM in
the physical address space, __ioremap_caller did not allow mapping these
MMIO areas, because they were below the end RAM and thus considered RAM
as well. Use the memblock-based page_is_ram function, which returns
false for such MMIO holes.
v2:
Keep the check for p < virt_to_phys(high_memory). On 32-bit systems
with high memory (memory above physical address 4GiB), the high memory
is expected to be available though ioremap. The high_memory variable
marks the end of low memory; comparing against it means that only
ioremap requests for low RAM will be denied.
Reported by Michael Ellerman.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To support accurate checking for different blocks of memory on PPC32,
use the same memblock-based approach that's already used on PPC64 also
on PPC32.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Instead of open-coding the search in page_is_ram, call memblock_is_memory.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Wii has a blue LED in the disk drive slot, which is controlled via a
GPIO line. Add this LED to wii.dts, and mark it as a panic-indicator.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These are the GPIO line names on a Nintendo Wii, as documented in:
https://wiibrew.org/wiki/Hardware/Hollywood_GPIOs
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Hollywood GPIO controller supports 32 GPIOs, but on the Wii, only 24
are used.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Hollywood chipset's GPIO controller has two sets of registers: One
for access by the PowerPC CPU, and one for access by the ARM coprocessor
(but both are accessible from the PPC because the memory firewall
(AHBPROT) is usually disabled when booting Linux, today).
The wii_power_off function currently assumes that the poweroff GPIO pin
is configured for use via the ARM side, but the upcoming GPIO driver
configures all pins for use via the PPC side, breaking poweroff.
Configure the owner register explicitly in wii_power_off to make
wii_power_off work with and without the new GPIO driver.
I think the Wii can be switched to the generic gpio-poweroff driver,
after the GPIO driver is merged.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Previously, wii_device_probe would only initialize devices under the
/hollywood node. After this patch, platform devices placed outside of
/hollywood will also be initialized.
The intended usecase for this are devices located outside of the
Hollywood chip, such as GPIO LEDs and GPIO buttons.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On 64-bit Book3E systems, in setup_tlb_core_data() we reference other
CPUs pacas. But in commit 59f577743d71 ("powerpc/64: Defer paca
allocation until memory topology is discovered") the allocation of
non-boot-CPU pacas was deferred until later in boot.
This leads to an oops:
CPU maps initialized for 1 thread per core
Unable to handle kernel paging request for data at address 0x8888888888888918
Faulting instruction address: 0xc000000000e2f0d0
Oops: Kernel access of bad area, sig: 11 [#1]
NIP .setup_tlb_core_data+0xdc/0x160
Call Trace:
.setup_tlb_core_data+0x5c/0x160 (unreliable)
.setup_arch+0x80/0x348
.start_kernel+0x7c/0x598
start_here_common+0x1c/0x40
Luckily setup_tlb_core_data() is called immediately prior to
smp_setup_pacas(). So simply switching their order is sufficient to
fix the oops and seems unlikely to have any other unwanted side
effects.
Fixes: 59f577743d71 ("powerpc/64: Defer paca allocation until memory topology is discovered")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with
H_SET_DABR. Since the recent commit e8ebedbf3131 ("KVM: PPC: Book3S
HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to
return H_UNSUPPORTED on Power9, we see guest boot failures, the
symptom is the boot seems to just stop in SLOF, eg:
SLOF ***************************************************************
QEMU Starting
Build Date = Sep 24 2017 12:23:07
FW Version = buildd@ release 20170724
<no further output>
SLOF can cope if H_SET_DABR returns H_HARDWARE. So wwitch the return
value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break
the guest boot.
That does mean we return a different error to PowerVM in this case,
but that's probably not a big concern.
Fixes: e8ebedbf3131 ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.
This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
PPC:
- Fix a bug causing occasional machine check exceptions on POWER8 hosts
(introduced in 4.16-rc1)
x86:
- Fix a guest crashing regression with nested VMX and restricted guest
(introduced in 4.16-rc1)
- Fix dependency check for pv tlb flush (The wrong dependency that
effectively disabled the feature was added in 4.16-rc4, the original
feature in 4.16-rc1, so it got decent testing.)
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJavUt5AAoJEED/6hsPKofo8uQH/RuijrsAIUnymkYY+6BYFXlh
Ri8qhG8VB+C3SpWEtsqcqNVkjJTepCD2Ej5BJTL4Gc9BSTWy7Ht6kqskEgwcnzu2
xRfkg0q0vTj1+GDd+UiTZfxiinoHtB9x3fiXali5UNTCd1fweLxdidETfO+GqMMq
KDhTR+S8dXE5VG7r+iJ80LZPtHQJ94f0fh9XpQk3X2ExTG5RBxag1U2nCfiKRAZk
xRv1CNAxNaBxS38CgYfHzg31NJx38fnq/qREsIdOx0Ju9WQkglBFkhLAGUb4vL0I
nn8YX/oV9cW2G8tyPWjC245AouABOLbzu0xyj5KgCY/z1leA9tdLFX/ET6Zye+E=
=++uZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"PPC:
- Fix a bug causing occasional machine check exceptions on POWER8
hosts (introduced in 4.16-rc1)
x86:
- Fix a guest crashing regression with nested VMX and restricted
guest (introduced in 4.16-rc1)
- Fix dependency check for pv tlb flush (the wrong dependency that
effectively disabled the feature was added in 4.16-rc4, the
original feature in 4.16-rc1, so it got decent testing)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix pv tlb flush dependencies
KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0
KVM: PPC: Book3S HV: Fix duplication of host SLB entries
We need to zero-out pgd table only if we share the slab cache with
pud/pmd level caches. With the support of 4PB, we don't share the slab
cache anymore. Instead of removing the code completely hide it within
an #ifdef. We don't need to do this with any other page table level,
because they all allocate table of double the size and we take of
initializing the first half corrrectly during page table zap.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Consolidate multiple #if / #ifdef into one]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch increases the max virtual (effective) address value to 4PB.
With 4K page size config we continue to limit ourself to 64TB.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Keep the H_PGTABLE_RANGE test, update it to work]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For addresses above 512TB we allocate additional mmu contexts. To make
it all easy, addresses above 512TB are handled with IR/DR=1 and with
stack frame setup.
The mmu_context_t is also updated to track the new extended_ids. To
support upto 4PB we need a total 8 contexts.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Minor formatting tweaks and comment wording, switch BUG to WARN
in get_ea_context().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In a following patch, on finding a free area we will need to do
allocatinon of extra contexts as needed. Consolidating the return path
for slice_get_unmapped_area() will make that easier.
Split into a separate patch to make review easy.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Memory keys are supported only with hash translation mode. Instead of
using #ifdef in generic code move the key related pte bits to
respective headers
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
asm/barrier.h is not always included after asm/synch.h, which meant
it was missing __SUBARCH_HAS_LWSYNC, so in some files smp_wmb() would
be eieio when it should be lwsync. kernel/time/hrtimer.c is one case.
__SUBARCH_HAS_LWSYNC is only used in one place, so just fold it in
to where it's used. Previously with my small simulator config, 377
instances of eieio in the tree. After this patch there are 55.
Fixes: 46d075be585e ("powerpc: Optimise smp_wmb")
Cc: stable@vger.kernel.org # v2.6.29+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>