18248 Commits

Author SHA1 Message Date
Linus Torvalds
642e7fd233 Merge branch 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
 "System calls are interaction points between userspace and the kernel.
  Therefore, system call functions such as sys_xyzzy() or
  compat_sys_xyzzy() should only be called from userspace via the
  syscall table, but not from elsewhere in the kernel.

  At least on 64-bit x86, it will likely be a hard requirement from
  v4.17 onwards to not call system call functions in the kernel: It is
  better to use a different calling convention for system calls
  there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
  which then hands processing over to the actual syscall function. This
  means that only those parameters which are actually needed for a
  specific syscall are passed on during syscall entry, instead of
  filling in six CPU registers with random user space content all the
  time (which may cause serious trouble down the call chain). Those
  x86-specific patches will be pushed through the x86 tree in the near
  future.

  Moreover, rules on how data may be accessed may differ between kernel
  data and user data. This is another reason why calling sys_xyzzy() is
  generally a bad idea, and -- at most -- acceptable in arch-specific
  code.

  This patchset removes all in-kernel calls to syscall functions, with
  the exception of arch/. On top of this, it cleans up the
  three places where many syscalls are referenced or prototyped, namely
  kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"
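
A minimal sketch of the wrapper scheme described above (names such as
do_xyzzy and __se_sys_xyzzy are placeholders; the eventual x86 wrappers
are generated by macro machinery and differ in detail):

  #include <linux/linkage.h>
  #include <asm/ptrace.h>

  /* Hypothetical syscall body, taking only the arguments it needs. */
  static long do_xyzzy(unsigned int fd, unsigned long flags)
  {
          return 0;
  }

  /*
   * Stub referenced from the syscall table: it alone looks at pt_regs and
   * forwards only the registers this syscall actually uses, instead of
   * blindly passing six registers of user-controlled content.
   */
  asmlinkage long __se_sys_xyzzy(const struct pt_regs *regs)
  {
          return do_xyzzy((unsigned int)regs->di, regs->si);
  }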

* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
  bpf: whitelist all syscalls for error injection
  kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
  kernel/sys_ni: sort cond_syscall() entries
  syscalls/x86: auto-create compat_sys_*() prototypes
  syscalls: sort syscall prototypes in include/linux/compat.h
  net: remove compat_sys_*() prototypes from net/compat.h
  syscalls: sort syscall prototypes in include/linux/syscalls.h
  kexec: move sys_kexec_load() prototype to syscalls.h
  x86/sigreturn: use SYSCALL_DEFINE0
  x86: fix sys_sigreturn() return type to be long, not unsigned long
  x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
  mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
  mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
  fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
  fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
  fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
  fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
  kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
  kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
  ...
2018-04-02 21:22:12 -07:00
Linus Torvalds
2fcd2b306a Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 dma mapping updates from Ingo Molnar:
 "This tree, by Christoph Hellwig, switches over the x86 architecture to
  the generic dma-direct and swiotlb code, and also unifies more of the
  dma-direct code between architectures. The now unused x86-only
  primitives are removed"

* 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs
  swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS
  dma/swiotlb: Remove swiotlb_{alloc,free}_coherent()
  dma/direct: Handle force decryption for DMA coherent buffers in common code
  dma/direct: Handle the memory encryption bit in common code
  dma/swiotlb: Remove swiotlb_set_mem_attributes()
  set_memory.h: Provide set_memory_{en,de}crypted() stubs
  x86/dma: Remove dma_alloc_coherent_gfp_flags()
  iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()
  iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
  x86/dma/amd_gart: Use dma_direct_{alloc,free}()
  x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA
  x86/dma: Use generic swiotlb_ops
  x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y)
  x86/dma: Remove dma_alloc_coherent_mask()
2018-04-02 17:18:45 -07:00
Dominik Brodowski
c7b95d5156 mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().
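
A minimal sketch of the resulting pattern (simplified; the real helper
lives in mm/readahead.c and its prototype in include/linux/syscalls.h):

  /* New helper carrying what used to be the body of sys_readahead(). */
  ssize_t ksys_readahead(int fd, loff_t offset, size_t count);

  /* The syscall itself becomes a thin wrapper around the helper. */
  SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
  {
          return ksys_readahead(fd, offset, count);
  }

In-kernel users (e.g. compat code) then call ksys_readahead() directly
instead of going through the syscall entry point.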

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:12 +02:00
Dominik Brodowski
a90f590a1b mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
Using this helper allows us to avoid the in-kernel calls to the
sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mmap_pgoff().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:11 +02:00
Dominik Brodowski
9d5b7c956b mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel
calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that
this function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_fadvise64_64().

Some compat stubs called sys_fadvise64(), which then just passed through
the arguments to sys_fadvise64_64(). Get rid of this indirection, and call
ksys_fadvise64_64() directly.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:10 +02:00
Dominik Brodowski
edf292c76b fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
Using the ksys_fallocate() wrapper allows us to get rid of in-kernel
calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_fallocate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:09 +02:00
Dominik Brodowski
36028d5dd7 fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
Using the ksys_p{read,write}64() wrappers allows us to get rid of
in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_p{read,write}64().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:09 +02:00
Dominik Brodowski
df260e21e6 fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
Using the ksys_truncate() wrapper allows us to get rid of in-kernel
calls to the sys_truncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_truncate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:08 +02:00
Dominik Brodowski
806cbae122 fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
Using this helper allows us to avoid the in-kernel calls to the
sys_sync_file_range() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_sync_file_range().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:07 +02:00
Dominik Brodowski
411d9475cf fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()
Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel
calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_ftruncate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:00 +02:00
Linus Torvalds
486adcea4a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main kernel side changes were:

   - Modernize the kprobe and uprobe creation/destruction tooling ABIs:

     The existing text-based APIs (kprobe_events and uprobe_events in
     tracefs) are naive, limited ABIs in that they require user-space
     to clean up after themselves, which is both difficult and fragile
     if the tool is buggy or exits unexpectedly. In other words they are
     not really suited for modern, robust tooling.

     So introduce a modern, file descriptor based ABI that does not have
     these limitations: introduce the 'perf_kprobe' and 'perf_uprobe'
     PMUs and extend the perf_event_open() syscall to create events with
     a kprobe/uprobe attached to them. These [k,u]probes are associated
     with the file descriptor, so they are not available in tracefs (a
     minimal usage sketch follows the quoted summary below).

     (Song Liu)

   - Intel Cannon Lake CPU support (Harry Pan)

   - Intel PT cleanups (Alexander Shishkin)

   - Improve the performance of pinned/flexible event groups by using RB
     trees (Alexey Budankov)

   - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES, which allows the modification
     of hardware breakpoints; this new ABI variant massively speeds up
     existing tooling that uses hardware breakpoints to instrument (and
     debug) memory usage.

     (Milind Chabbi, Jiri Olsa)

   - Various Intel PEBS handling fixes and improvements, and other Intel
     PMU improvements (Kan Liang)

   - Various perf core improvements and optimizations (Peter Zijlstra)

   - ... misc cleanups, fixes and updates.

  There's over 200 tooling commits, here's an (imperfect) list of
  highlights:

   - 'perf annotate' improvements:

      * Recognize and handle jumps to other functions as calls, which
        improves the navigation along jumps and back. (Arnaldo Carvalho
        de Melo)

      * Add the 'P' hotkey in TUI annotation to dump annotation output
        into a file, to ease e-mail reporting of annotation details.
        (Arnaldo Carvalho de Melo)

      * Add an IPC/cycles column to the TUI (Jin Yao)

      * Improve s390 assembly annotation (Thomas Richter)

      * Refactor the output formatting logic to better separate it into
        interactive and non-interactive features and add the --stdio2
        output variant to demonstrate this. (Arnaldo Carvalho de Melo)

   - 'perf script' improvements:

      * Add Python 3 support (Jaroslav Škarvada)

      * Add --show-round-event (Jiri Olsa)

   - 'perf c2c' improvements:

      * Add NUMA analysis support (Jiri Olsa)

   - 'perf trace' improvements:

      * Improve PowerPC support (Ravi Bangoria)

   - 'perf inject' improvements:

      * Integrate ARM CoreSight traces (Robert Walker)

   - 'perf stat' improvements:

      * Add the --interval-count option (yuzhoujian)

      * Add the --timeout option (yuzhoujian)

   - 'perf sched' improvements (Changbin Du)

   - Vendor events improvements:

      * Add IBM s390 vendor events (Thomas Richter)

      * Add and improve arm64 vendor events (John Garry, Ganapatrao
        Kulkarni)

      * Update POWER9 vendor events (Sukadev Bhattiprolu)

   - Intel PT tooling improvements (Adrian Hunter)

   - PMU handling improvements (Agustin Vega-Frias)

   - Record machine topology in perf.data (Jiri Olsa)

   - Various overwrite related cleanups (Kan Liang)

   - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet)

   - ... and lots of other changes, cleanups and fixes, see the shortlog
     and Git history for details"
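
A minimal userspace sketch of the file-descriptor-based probe ABI
mentioned above (hedged: the probed symbol and the use of config1/config2
for the function name and offset follow the description rather than a
verified ABI reference, and error handling is trimmed):

  #include <linux/perf_event.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
          struct perf_event_attr attr;
          unsigned int type;
          const char *symbol = "do_sys_open";         /* assumed target */
          FILE *f = fopen("/sys/bus/event_source/devices/kprobe/type", "r");

          if (!f || fscanf(f, "%u", &type) != 1)
                  return 1;
          fclose(f);

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = type;                           /* the perf_kprobe PMU */
          attr.config1 = (uint64_t)(uintptr_t)symbol; /* function to probe */
          attr.config2 = 0;                           /* offset into function */
          attr.sample_period = 1;

          int fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
          if (fd < 0) {
                  perror("perf_event_open");
                  return 1;
          }
          /* The probe lives only as long as this fd; closing it cleans up. */
          close(fd);
          return 0;
  }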

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
  perf/x86/intel: Enable C-state residency events for Cannon Lake
  perf/x86/intel: Add Cannon Lake support for RAPL profiling
  perf/x86/pt, coresight: Clean up address filter structure
  perf vendor events s390: Add JSON files for IBM z14
  perf vendor events s390: Add JSON files for IBM z13
  perf vendor events s390: Add JSON files for IBM zEC12 zBC12
  perf vendor events s390: Add JSON files for IBM z196
  perf vendor events s390: Add JSON files for IBM z10EC z10BC
  perf mmap: Be consistent when checking for an unmaped ring buffer
  perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done()
  perf build: Fix check-headers.sh opts assignment
  perf/x86: Update rdpmc_always_available static key to the modern API
  perf annotate: Use absolute addresses to calculate jump target offsets
  perf annotate: Defer searching for comma in raw line till it is needed
  perf annotate: Support jumping from one function to another
  perf annotate: Add "_local" to jump/offset validation routines
  perf python: Reference Py_None before returning it
  perf annotate: Mark jumps to outher functions with the call arrow
  perf annotate: Pass function descriptor to its instruction parsing routines
  perf annotate: No need to calculate notes->start twice
  ...
2018-04-02 11:06:34 -07:00
Mathieu Malaterre
19e68b2aec powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT
In commit 9690c1574268 ("powerpc/mm/radix: Fix always false comparison
against MMU_NO_CONTEXT") an issue was discovered where `mm->context.id` was
being truncated to an `unsigned int`, while the PID is actually an
`unsigned long`. Update the earlier patch by fixing one remaining
occurrence. Discovered during a compilation with W=1:

  arch/powerpc/mm/tlb-radix.c:702:19: error: comparison is always false due to limited range of data type [-Werror=type-limits]
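
A hedged sketch of the shape of the fix (not the exact hunk in
arch/powerpc/mm/tlb-radix.c):

  /* The PID is an unsigned long, so storing mm->context.id in an
   * unsigned int truncates it, which is what the W=1 warning above
   * complains about. */
  unsigned long pid = mm->context.id;     /* was: unsigned int pid */

  if (unlikely(pid == MMU_NO_CONTEXT))
          return;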

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 22:15:34 +10:00
Matt Evans
0e524e761f powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP
When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the
user context when single_step_exception() prepares the SIGTRAP
delivery.  The resulting branch-trap-within-the-SIGTRAP-handler
isn't healthy.

Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call
to clear_single_step() which only clears MSR_SE.

This patch adds a new helper, clear_br_trace(), which clears the
debug trap before invoking the signal handler.  This helper is a
NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.
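
A minimal sketch of such a helper (hedged; the guard shown is
illustrative, and the real code may distinguish BookE differently):

  static void clear_br_trace(struct pt_regs *regs)
  {
  #ifndef CONFIG_BOOKE
          /* Stop branch tracing before the SIGTRAP handler runs. */
          regs->msr &= ~MSR_BE;
  #endif
  }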

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 22:15:33 +10:00
Nicholas Piggin
4b7e5532d2 powerpc/64s: Add POWER9 CPU type selection
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 22:15:32 +10:00
Nicholas Piggin
db5ae1c155 powerpc/64s: Refine feature sets for little endian builds
This reduces vmlinux text size by 1kB and data by 1.5kB with a small
build!

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add the recently added CPU_FTRS_POWER9_DD2_2 to the little
      endian possible mask as noticed by Nick.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 22:14:40 +10:00
Nicholas Piggin
a73657ea19 powerpc/64: Add GENERIC_CPU support for little endian
Add GENERIC_CPU support for little-endian rather than using POWER8
specific selection for POWER9 and above.

Restrict GENERIC_CPU to POWER8 and above on little endian.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Duplicate GENERIC_CPU to avoid a kbuild warning about the prompt
      being redefined. Spell out that GENERIC means >= POWER4 for BE.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 21:52:52 +10:00
Nicholas Piggin
471d7ff8b5 powerpc/64s: Remove POWER4 support
POWER4 has been broken since at least the change 49d09bf2a6
("powerpc/64s: Optimise MSR handling in exception handling"), which
requires mtmsrd L=1 support. This was introduced in ISA v2.01, and
POWER4 supports ISA v2.00.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:50 +11:00
Nicholas Piggin
3735eb850e powerpc: Remove unused CPU_FTR_ARCH_201
The last usage was removed in c17b98cf6028 ("KVM: PPC: Book3S HV:
Remove code for PPC970 processors") (Dec 2014).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:50 +11:00
Nicholas Piggin
9e9626ed3a powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features
The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and
above (which is what the cputable setup does). Fix DT CPU features
quirk setup to match.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Merge with upstream changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:49 +11:00
Nicholas Piggin
15a3204d24 powerpc/64s: Set assembler machine type to POWER4
Rather than override the machine type in .S code (which can hide wrong
or ambiguous code generation for the target), set the type to power4
for all assembly.

This also means we need to be careful not to build power4-only code
when we're not building for Book3S, such as the "power7" versions of
copyuser/page/memcpy.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:49 +11:00
Nicholas Piggin
d50614fa45 powerpc/64s: Explicitly add vector features to CPU_FTRS_POSSIBLE
ALTIVEC and VSX features are not added by default to the POWERx CPU
feature sets, because they are intended to be enabled by firmware.
Currently they end up in CPU_FTRS_POSSIBLE due to their inclusion in
the sets for other CPUs, e.g. PPC970.

But they should be added individually to the CPU_FTRS_POSSIBLE set,
because if we reduce the set of CPUs that are built for, they may
disappear from the possible mask.

It already contains CPU_FTR_VSX, so add ALTIVEC. The _COMP features
should be used because they won't be present if compiled out.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add detail to change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:48 +11:00
Nicholas Piggin
b842bd0f7a powerpc/64s: Add all POWER9 features to CPU_FTRS_ALWAYS
It's not a bug to have features missing in CPU_FTRS_ALWAYS, but it is a
missed opportunity for optimisation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:48 +11:00
Mark Greer
147704534e powerpc/boot: Remove duplicate typedefs from libfdt_env.h
When building a uImage or zImage using ppc6xx_defconfig and some other
defconfigs, the following error occurs with GCC 4.5.1:

  /arch/powerpc/boot/libfdt_env.h:10:13: error: redefinition of typedef 'uint32_t'
  /arch/powerpc/boot/types.h:21:13: note: previous declaration of 'uint32_t' was here
  /arch/powerpc/boot/libfdt_env.h:11:13: error: redefinition of typedef 'uint64_t'
  /arch/powerpc/boot/types.h:22:13: note: previous declaration of 'uint64_t' was here

The problem is that commit 656ad58ef19e (powerpc/boot: Add OPAL
console to epapr wrappers) adds typedefs for uint32_t and uint64_t to
types.h but doesn't remove the pre-existing (and now duplicate)
typedefs from libfdt_env.h.

Fix the error by removing the duplicate typedefs from libfdt_env.h.

Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:47 +11:00
Nicholas Piggin
8c1c7fb0b5 powerpc/64s/idle: avoid sync for KVM state when waking from idle
When waking from a CPU idle instruction (e.g., nap or stop), the sync
for ordering the KVM secondary thread state can be avoided if the
wakeup is coming from a kernel context rather than a KVM context.

This improves performance for the ping-pong benchmark with the stop0 idle
state by 0.46% for 2 threads in the same core, and 1.02% for different
cores.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:47 +11:00
Nicholas Piggin
3d4fbffdd7 powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug
Implement a new function to invoke stop, power9_offline_stop, which is
like power9_idle_stop but used by the cpu hotplug code.

Move KVM secondary state manipulation code to the offline case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:46 +11:00
Nicholas Piggin
d40b6768e4 powerpc/64s: sreset panic if there is no debugger or crash dump handlers
system_reset_exception does most of its own crash handling now,
invoking the debugger or crash dumps if they are registered. If not,
then it goes through to die() to print stack traces, and then is
supposed to panic (according to comments).

However after die() prints oopses, it does its own handling which
doesn't allow system_reset_exception to panic (e.g., it may just
kill the current process). This patch causes sreset exceptions to
return from die after it prints messages but before acting.

This also stops die from invoking the debugger on 0x100 crashes.
system_reset_exception similarly calls the debugger. It had been
thought this was harmless (because if the debugger was disabled,
neither call would fire, and if it was enabled the first call
would return). However in some cases like xmon 'X' command, the
debugger returns 0, which currently causes it to be entered
again (first in system_reset_exception, then in die), which is
confusing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:46 +11:00
Nicholas Piggin
15b4dd7981 powerpc/64s: return more carefully from sreset NMI
System Reset, being an NMI, must return more carefully than other
interrupts. It has traditionally returned via the normal return
from exception path, but that has a number of problems.

- r13 does not get restored if returning to kernel. This is for
  interrupts which may cause a context switch, which sreset will
  never do. Interrupting OPAL (which uses a different r13) is one
  place where this causes breakage.

- It may cause several other problems returning to kernel with
  preempt or TIF_EMULATE_STACK_STORE if it hits at the wrong time.

It's safer just to have a simple restore and return, like machine
check which is the other NMI.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:45 +11:00
Michael Neuling
f0295e047f powerpc/eeh: Fix race with driver un/bind
The current EEH callbacks can race with a driver unbind. This can
result in backtraces like this:

  EEH: Frozen PHB#0-PE#1fc detected
  EEH: PE location: S000009, PHB location: N/A
  CPU: 2 PID: 2312 Comm: kworker/u258:3 Not tainted 4.15.6-openpower1 #2
  Workqueue: nvme-wq nvme_reset_work [nvme]
  Call Trace:
    dump_stack+0x9c/0xd0 (unreliable)
    eeh_dev_check_failure+0x420/0x470
    eeh_check_failure+0xa0/0xa4
    nvme_reset_work+0x138/0x1414 [nvme]
    process_one_work+0x1ec/0x328
    worker_thread+0x2e4/0x3a8
    kthread+0x14c/0x154
    ret_from_kernel_thread+0x5c/0xc8
  nvme nvme1: Removing after probe failure status: -19
  <snip>
  cpu 0x23: Vector: 300 (Data Access) at [c000000ff50f3800]
      pc: c0080000089a0eb0: nvme_error_detected+0x4c/0x90 [nvme]
      lr: c000000000026564: eeh_report_error+0xe0/0x110
      sp: c000000ff50f3a80
     msr: 9000000000009033
     dar: 400
   dsisr: 40000000
    current = 0xc000000ff507c000
    paca    = 0xc00000000fdc9d80   softe: 0        irq_happened: 0x01
      pid   = 782, comm = eehd
  Linux version 4.15.6-openpower1 (smc@smc-desktop) (gcc version 6.4.0 (Buildroot 2017.11.2-00008-g4b6188e)) #2 SMP Tue Feb 27 12:33:27 PST 2018
  enter ? for help
    eeh_report_error+0xe0/0x110
    eeh_pe_dev_traverse+0xc0/0xdc
    eeh_handle_normal_event+0x184/0x4c4
    eeh_handle_event+0x30/0x288
    eeh_event_handler+0x124/0x170
    kthread+0x14c/0x154
    ret_from_kernel_thread+0x5c/0xc8

The first part is an EEH (on boot), the second half is the resulting
crash. nvme probe starts the nvme_reset_work() worker thread. This
worker thread starts touching the device, which sees a device error
(EEH) and hence queues up an event in the powerpc EEH worker
thread. nvme_reset_work() then continues and runs
nvme_remove_dead_ctrl_work() which results in unbinding the driver
from the device and hence releases all resources. At the same time,
the EEH worker thread starts doing the EEH .error_detected() driver
callback, which no longer works since the resources have been freed.

This fixes the problem in the same way the generic PCIe AER code (in
drivers/pci/pcie/aer/aerdrv_core.c) does. It makes the EEH code hold
the device_lock() while performing the driver EEH callbacks and
associated code. This ensures that either the callbacks are no longer
registered, or, if they are registered, the driver will not be removed
from underneath us.
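
A hedged sketch of the locking pattern (heavily simplified; the real
change threads device_lock() through the existing eeh_report_*()
helpers):

  static void eeh_report_error(struct eeh_dev *edev, void *userdata)
  {
          struct pci_dev *dev = eeh_dev_to_pci_dev(edev);

          if (!dev)
                  return;

          device_lock(&dev->dev);
          /* The driver cannot be unbound while we hold the lock, so
           * err_handler and its callbacks stay valid for the duration. */
          if (dev->driver && dev->driver->err_handler &&
              dev->driver->err_handler->error_detected)
                  dev->driver->err_handler->error_detected(dev,
                                  pci_channel_io_frozen);
          device_unlock(&dev->dev);
  }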

This has been broken forever. The EEH callbacks were first introduced
in 2005 (in 77bd7415610) but it's not clear if a lock was needed back
then.

Fixes: 77bd74156101 ("[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines")
Cc: stable@vger.kernel.org # v2.6.16+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:45 +11:00
Thiago Jung Bauermann
bf8a1abc3d powerpc/kexec_file: Fix error code when trying to load kdump kernel
kexec_file_load() on powerpc doesn't support kdump kernels yet, so it
returns -ENOTSUPP in that case.

I've recently learned that this errno is internal to the kernel and
isn't supposed to be exposed to userspace. Therefore, change it to
-EOPNOTSUPP, which is defined in a uapi header.

This does indeed make kexec-tools happier. Before the patch, on
ppc64le:

  # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz
  kexec_file_load failed: Unknown error 524

After the patch:

  # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz
  kexec_file_load failed: Operation not supported

Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()")
Cc: stable@vger.kernel.org # v4.10+
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Reviewed-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:44 +11:00
Jonathan Neuschäfer
7e1405917c powerpc/mm/32: Remove the reserved memory hack
This hack, introduced in commit c5df7f775148 ("powerpc: allow ioremap
within reserved memory regions") is now unnecessary.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:44 +11:00
Jonathan Neuschäfer
57deb8fea0 powerpc/wii: Don't rely on the reserved memory hack
Because the two memory blocks (usually called MEM1 and MEM2) are not
merged anymore, __request_region in kernel/resource.c will correctly
allow reserving regions in the physical address space between MEM1 and
MEM2, where many important peripherals are (GPIO, MMC, USB, ...).

A previous change to __ioremap_caller in arch/powerpc/mm/pgtable_32.c
ensures that multiple memblocks are properly considered in ioremap; this
makes it unnecessary to set __allow_ioremap_reserved.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:43 +11:00
Jonathan Neuschäfer
2bbf63264a powerpc/mm/32: Use page_is_ram to check for RAM
On systems where there is MMIO space between different blocks of RAM in
the physical address space, __ioremap_caller did not allow mapping these
MMIO areas, because they were below the end of RAM and thus considered RAM
as well.  Use the memblock-based page_is_ram function, which returns
false for such MMIO holes.

v2:
  Keep the check for p < virt_to_phys(high_memory). On 32-bit systems
  with high memory (memory above physical address 4GiB), the high memory
  is expected to be available through ioremap. The high_memory variable
  marks the end of low memory; comparing against it means that only
  ioremap requests for low RAM will be denied.
  Reported by Michael Ellerman.
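
A hedged sketch of the resulting check in __ioremap_caller (simplified):

  /*
   * Refuse to ioremap only pages that really are low RAM; MMIO holes
   * between memory blocks now pass this test because page_is_ram() is
   * backed by memblock.
   */
  if (p < virt_to_phys(high_memory) && page_is_ram(__phys_to_pfn(p))) {
          printk("__ioremap(): phys addr 0x%llx is RAM\n",
                 (unsigned long long)p);
          return NULL;
  }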

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:43 +11:00
Jonathan Neuschäfer
f65e67c7e3 powerpc/mm: Use memblock API for PPC32 page_is_ram
To support accurate checking for different blocks of memory on PPC32,
use the same memblock-based approach that's already used on PPC64.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:42 +11:00
Jonathan Neuschäfer
2615c93e5f powerpc/mm: Simplify page_is_ram by using memblock_is_memory
Instead of open-coding the search in page_is_ram, call memblock_is_memory.
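
In essence (a hedged sketch of the simplified function):

  int page_is_ram(unsigned long pfn)
  {
          return memblock_is_memory(__pfn_to_phys(pfn));
  }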

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:42 +11:00
Jonathan Neuschäfer
041413b88d powerpc/wii.dts: Add drive slot LED
The Wii has a blue LED in the disk drive slot, which is controlled via a
GPIO line. Add this LED to wii.dts, and mark it as a panic-indicator.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:41 +11:00
Jonathan Neuschäfer
80873a0b3a powerpc/wii.dts: Add GPIO line names
These are the GPIO line names on a Nintendo Wii, as documented in:
https://wiibrew.org/wiki/Hardware/Hollywood_GPIOs

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:40 +11:00
Jonathan Neuschäfer
9693d5709f powerpc/wii.dts: Add ngpios property
The Hollywood GPIO controller supports 32 GPIOs, but on the Wii, only 24
are used.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:40 +11:00
Jonathan Neuschäfer
9cbaaec1cf powerpc/wii: Explicitly configure GPIO owner for poweroff pin
The Hollywood chipset's GPIO controller has two sets of registers: one
for access by the PowerPC CPU, and one for access by the ARM coprocessor
(but both are accessible from the PPC because the memory firewall
(AHBPROT) is usually disabled when booting Linux, today).

The wii_power_off function currently assumes that the poweroff GPIO pin
is configured for use via the ARM side, but the upcoming GPIO driver
configures all pins for use via the PPC side, breaking poweroff.

Configure the owner register explicitly in wii_power_off to make
wii_power_off work with and without the new GPIO driver.

I think the Wii can be switched to the generic gpio-poweroff driver,
after the GPIO driver is merged.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:39 +11:00
Jonathan Neuschäfer
7ab96c0a08 powerpc/wii: Probe the whole devicetree
Previously, wii_device_probe would only initialize devices under the
/hollywood node. After this patch, platform devices placed outside of
/hollywood will also be initialized.

The intended use case for this is devices located outside of the
Hollywood chip, such as GPIO LEDs and GPIO buttons.
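
A hedged sketch of the change (details such as the bus match table may
differ from the actual patch):

  static int __init wii_device_probe(void)
  {
          if (!machine_is(wii))
                  return 0;

          /* Probe the whole device tree, not just the /hollywood node. */
          of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
          return 0;
  }
  device_initcall(wii_device_probe);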

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:39 +11:00
Michael Ellerman
1d0afc0d5a powerpc/64e: Fix oops due to deferral of paca allocation
On 64-bit Book3E systems, in setup_tlb_core_data() we reference other
CPUs' pacas. But in commit 59f577743d71 ("powerpc/64: Defer paca
allocation until memory topology is discovered") the allocation of
non-boot-CPU pacas was deferred until later in boot.

This leads to an oops:

  CPU maps initialized for 1 thread per core
  Unable to handle kernel paging request for data at address 0x8888888888888918
  Faulting instruction address: 0xc000000000e2f0d0
  Oops: Kernel access of bad area, sig: 11 [#1]
  NIP .setup_tlb_core_data+0xdc/0x160
  Call Trace:
    .setup_tlb_core_data+0x5c/0x160 (unreliable)
    .setup_arch+0x80/0x348
    .start_kernel+0x7c/0x598
    start_here_common+0x1c/0x40

Luckily setup_tlb_core_data() is called immediately prior to
smp_setup_pacas(). So simply switching their order is sufficient to
fix the oops and seems unlikely to have any other unwanted side
effects.
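
The fix is essentially a reordering of the two calls (hedged sketch of
the call site):

  /* Allocate the non-boot pacas before anything dereferences them... */
  smp_setup_pacas();
  /* ...so that this can safely look at other CPUs' pacas on 64e. */
  setup_tlb_core_data();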

Fixes: 59f577743d71 ("powerpc/64: Defer paca allocation until memory topology is discovered")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:38 +11:00
Aneesh Kumar K.V
ca9a16c3bc powerpc/kvm: Fix guest boot failure on Power9 since DAWR changes
SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with
H_SET_DABR. Since the recent commit e8ebedbf3131 ("KVM: PPC: Book3S
HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to
return H_UNSUPPORTED on Power9, we see guest boot failures; the
symptom is that the boot seems to just stop in SLOF, eg:

  SLOF ***************************************************************
  QEMU Starting
   Build Date = Sep 24 2017 12:23:07
   FW Version = buildd@ release 20170724
  <no further output>

SLOF can cope if H_SET_DABR returns H_HARDWARE. So switch the return
value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break
the guest boot.

That does mean we return a different error to PowerVM in this case,
but that's probably not a big concern.

Fixes: e8ebedbf3131 ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:13 +11:00
Michael Ellerman
f437c51748 Merge branch 'topic/paca' into next
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.

This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becoming an array of pointers.
2018-03-31 09:09:36 +11:00
Linus Torvalds
72573481eb KVM fixes for v4.16-rc8
PPC:
  - Fix a bug causing occasional machine check exceptions on POWER8 hosts
    (introduced in 4.16-rc1)
 
 x86:
  - Fix a guest crashing regression with nested VMX and restricted guest
    (introduced in 4.16-rc1)
 
  - Fix dependency check for pv tlb flush (The wrong dependency that
    effectively disabled the feature was added in 4.16-rc4, the original
    feature in 4.16-rc1, so it got decent testing.)

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "PPC:
   - Fix a bug causing occasional machine check exceptions on POWER8
     hosts (introduced in 4.16-rc1)

  x86:
   - Fix a guest crashing regression with nested VMX and restricted
     guest (introduced in 4.16-rc1)

   - Fix dependency check for pv tlb flush (the wrong dependency that
     effectively disabled the feature was added in 4.16-rc4, the
     original feature in 4.16-rc1, so it got decent testing)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix pv tlb flush dependencies
  KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0
  KVM: PPC: Book3S HV: Fix duplication of host SLB entries
2018-03-30 07:24:14 -10:00
Aneesh Kumar K.V
872a100a49 powerpc/mm/hash: Don't memset pgd table if not needed
We need to zero out the pgd table only if we share the slab cache with
the pud/pmd level caches. With the support for 4PB we no longer share
the slab cache. Instead of removing the code completely, hide it within
an #ifdef. We don't need to do this for any other page table level,
because they all allocate a table of double the size and we take care
of initializing the first half correctly during page table zap.
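
A hedged sketch of the shape of the change in pgd_alloc() (the guard
macro below is purely illustrative, not the real condition):

  pgd_t *pgd_alloc(struct mm_struct *mm)
  {
          pgd_t *pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
                                        pgtable_gfp_flags(mm, GFP_KERNEL));

          if (unlikely(!pgd))
                  return pgd;
  #ifdef PGD_SHARES_PUD_PMD_CACHE           /* illustrative guard only */
          /* Only needed when the pgd shares a slab cache with pud/pmd. */
          memset(pgd, 0, PGD_TABLE_SIZE);
  #endif
          return pgd;
  }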

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Consolidate multiple #if / #ifdef into one]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:39 +11:00
Aneesh Kumar K.V
c2b4d8b741 powerpc/mm/hash64: Increase the VA range
This patch increases the max virtual (effective) address value to 4PB.
With 4K page size config we continue to limit ourself to 64TB.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Keep the H_PGTABLE_RANGE test, update it to work]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V
f384796c40 powerpc/mm: Add support for handling > 512TB address in SLB miss
For addresses above 512TB we allocate additional mmu contexts. To make
it all easy, addresses above 512TB are handled with IR/DR=1 and with
stack frame setup.

The mmu_context_t is also updated to track the new extended_ids. To
support up to 4PB we need a total of 8 contexts.
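
A hedged sketch of the idea (the field name and shift below are
assumptions, not the exact implementation):

  /*
   * Each 512TB slice of the address space gets its own context id: the
   * first slice keeps the existing ctx->id, and slices above 512TB use
   * ids allocated on demand when the SLB miss handler first touches them.
   */
  static inline int get_ea_context(mm_context_t *ctx, unsigned long ea)
  {
          int index = ea >> 49;               /* 512TB == 1UL << 49 */

          if (index == 0)
                  return ctx->id;
          return ctx->extended_id[index];     /* assumed field name */
  }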

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Minor formatting tweaks and comment wording, switch BUG to WARN
      in get_ea_context().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V
0dea04b288 powerpc/mm/slice: Consolidate return path in slice_get_unmapped_area()
In a following patch, on finding a free area we will need to do
allocation of extra contexts as needed. Consolidating the return path
for slice_get_unmapped_area() will make that easier.

Split into a separate patch to make review easy.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:37 +11:00
Aneesh Kumar K.V
1a2f778970 powerpc/mm/keys: Move pte bits to correct headers
Memory keys are supported only with hash translation mode. Instead of
using #ifdef in generic code, move the key-related pte bits to the
respective headers.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:36 +11:00
Frederic Barrat
16b19f1a03 powerpc/xive: Fix wrong xmon output caused by typo
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:36 +11:00
Nicholas Piggin
0bfdf59890 powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
asm/barrier.h is not always included after asm/synch.h, which meant
it was missing __SUBARCH_HAS_LWSYNC, so in some files smp_wmb() would
be eieio when it should be lwsync. kernel/time/hrtimer.c is one case.

__SUBARCH_HAS_LWSYNC is only used in one place, so just fold it into
where it's used. Previously, with my small simulator config, there were
377 instances of eieio in the tree; after this patch there are 55.
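
A hedged sketch of the intended end state in asm/barrier.h (the exact
condition that replaces __SUBARCH_HAS_LWSYNC is an assumption here):

  /*
   * lwsync provides the store ordering smp_wmb() needs; eieio is only
   * the fallback for parts that lack lwsync.
   */
  #if defined(__powerpc64__) || defined(CONFIG_PPC_E500MC)
  #    define SMPWMB      LWSYNC
  #else
  #    define SMPWMB      eieio
  #endif

  #define smp_wmb()   __asm__ __volatile__ (stringify_in_c(SMPWMB) : : : "memory")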

Fixes: 46d075be585e ("powerpc: Optimise smp_wmb")
Cc: stable@vger.kernel.org # v2.6.29+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:34 +11:00