linux

iv/linux

Author	SHA1	Message	Date
Naveen N. Rao	1cabd2f8f7	powerpc/kprobes: Factor out code to emulate instruction into a helper Factor out code to emulate instruction into a try_to_emulate() helper function. This makes no functional changes. Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:56 +10:00
Naveen N. Rao	a64e3f35a4	powerpc/kretprobes: Override default function entry offset With ABIv2, we offset 8 bytes into a function to get at the local entry point. mpe: NB this function is currently not called, the change to generic code to call it is being merged via the tip tree. Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:55 +10:00
Naveen N. Rao	290e307076	powerpc/kprobes: Fix handling of function offsets on ABIv2 commit 239aeba76409 ("perf powerpc: Fix kprobe and kretprobe handling with kallsyms on ppc64le") changed how we use the offset field in struct kprobe on ABIv2. perf now offsets from the global entry point if an offset is specified and otherwise chooses the local entry point. Fix the same in kernel for kprobe API users. We do this by extending kprobe_lookup_name() to accept an additional parameter to indicate the offset specified with the kprobe registration. If offset is 0, we return the local function entry and return the global entry point otherwise. With: # cd /sys/kernel/debug/tracing/ # echo "p _do_fork" >> kprobe_events # echo "p _do_fork+0x10" >> kprobe_events before this patch: # cat ../kprobes/list c0000000000d0748 k _do_fork+0x8 [DISABLED] c0000000000d0758 k _do_fork+0x18 [DISABLED] c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED] and after: # cat ../kprobes/list c0000000000d04c8 k _do_fork+0x8 [DISABLED] c0000000000d04d0 k _do_fork+0x10 [DISABLED] c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED] Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:55 +10:00
Naveen N. Rao	49e0b4658f	kprobes: Convert kprobe_lookup_name() to a function The macro is now pretty long and ugly on powerpc. In the light of further changes needed here, convert it to a __weak variant to be over-ridden with a nicer looking function. Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:54 +10:00
Nicholas Piggin	a050d20d02	powerpc/64s: Use relon prolog for EXC_VIRT_OOL_MASKABLE_HV handlers Hypervisor Virtualization and Directed Hypervisor Doorbell interrupt handlers use the macro EXC_VIRT_OOL_MASKABLE_HV for their relocation-on handlers, which calls MASKABLE_RELON_EXCEPTION_HV_OOL, which uses the real mode interrupt prolog. This means we needlessly rfid from virtual mode to virtual mode. For POWER8 it only affects doorbell IPIs. Context switch microbenchmark between threads with snooze disabled (which causes IPI) gets about 3% faster, about 370 cycles. Should be more important on POWER9 with global doorbells and HVI for host interrupts. Use the RELON variant instead to reduce overhead. Fixes: 1707dd1613 ("powerpc: Save CFAR before branching in interrupt entry paths") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fold some more detail into the change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 17:05:11 +10:00
Michael Ellerman	686978b15c	powerpc/xive: Fix missing check of rc != OPAL_BUSY Dan Carpenter noticed that the code in __xive_native_disable_queue() has a for loop with an unconditional break in the middle, which doesn't make a lot of sense. What the code's supposed to do is loop as long as OPAL says it's busy, if we get any other return code, either success or failure, then we should break the loop. So add the missing check. Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 14:43:19 +10:00
Thomas Huth	feafd13c96	KVM: PPC: Book3S PR: Do not fail emulation with mtspr/mfspr for unknown SPRs According to the PowerISA 2.07, mtspr and mfspr should not always generate an illegal instruction exception when being used with an undefined SPR, but rather treat the instruction as a NOP or inject a privilege exception in some cases, too - depending on the SPR number. Also turn the printk here into a ratelimited print statement, so that the guest can not flood the dmesg log of the host by issueing lots of illegal mtspr/mfspr instruction here. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:32 +10:00
Alexey Kardashevskiy	121f80ba68	KVM: PPC: VFIO: Add in-kernel acceleration for VFIO This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO without passing them to user space which saves time on switching to user space and back. This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM. KVM tries to handle a TCE request in the real mode, if failed it passes the request to the virtual mode to complete the operation. If it a virtual mode handler fails, the request is passed to the user space; this is not expected to happen though. To avoid dealing with page use counters (which is tricky in real mode), this only accelerates SPAPR TCE IOMMU v2 clients which are required to pre-register the userspace memory. The very first TCE request will be handled in the VFIO SPAPR TCE driver anyway as the userspace view of the TCE table (iommu_table::it_userspace) is not allocated till the very first mapping happens and we cannot call vmalloc in real mode. If we fail to update a hardware IOMMU table unexpected reason, we just clear it and move on as there is nothing really we can do about it - for example, if we hot plug a VFIO device to a guest, existing TCE tables will be mirrored automatically to the hardware and there is no interface to report to the guest about possible failures. This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd and associates a physical IOMMU table with the SPAPR TCE table (which is a guest view of the hardware IOMMU table). The iommu_table object is cached and referenced so we do not have to look up for it in real mode. This does not implement the UNSET counterpart as there is no use for it - once the acceleration is enabled, the existing userspace won't disable it unless a VFIO container is destroyed; this adds necessary cleanup to the KVM_DEV_VFIO_GROUP_DEL handler. This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. This adds real mode version of WARN_ON_ONCE() as the generic version causes problems with rcu_sched. Since we testing what vmalloc_to_phys() returns in the code, this also adds a check for already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect(). This finally makes use of vfio_external_user_iommu_id() which was introduced quite some time ago and was considered for removal. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:26 +10:00
Alexey Kardashevskiy	b1af23d836	KVM: PPC: iommu: Unify TCE checking This reworks helpers for checking TCE update parameters in way they can be used in KVM. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:21 +10:00
Alexey Kardashevskiy	da6f59e192	KVM: PPC: Use preregistered memory API to access TCE list VFIO on sPAPR already implements guest memory pre-registration when the entire guest RAM gets pinned. This can be used to translate the physical address of a guest page containing the TCE list from H_PUT_TCE_INDIRECT. This makes use of the pre-registrered memory API to access TCE list pages in order to avoid unnecessary locking on the KVM memory reverse map as we know that all of guest memory is pinned and we have a flat array mapping GPA to HPA which makes it simpler and quicker to index into that array (even with looking up the kernel page tables in vmalloc_to_phys) than it is to find the memslot, lock the rmap entry, look up the user page tables, and unlock the rmap entry. Note that the rmap pointer is initialized to NULL where declared (not in this patch). If a requested chunk of memory has not been preregistered, this will fall back to non-preregistered case and lock rmap. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:16 +10:00
Alexey Kardashevskiy	503bfcbe18	KVM: PPC: Pass kvm* to kvmppc_find_table() The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm* there. This will be used in the following patches where we will be attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than to VCPU). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:12 +10:00
Alexey Kardashevskiy	e91aa8e6ec	KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently It does not make much sense to have KVM in book3s-64 and not to have IOMMU bits for PCI pass through support as it costs little and allows VFIO to function on book3s KVM. Having IOMMU_API always enabled makes it unnecessary to have a lot of "#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those ifdef's we could have only user space emulated devices accelerated (but not VFIO) which do not seem to be very useful. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:07 +10:00
Paul Mackerras	644d2d6fef	Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next This merges in the commits in the topic/ppc-kvm branch of the powerpc tree to get the changes to arch/powerpc which subsequent patches will rely on. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:33 +10:00
Alexey Kardashevskiy	3762d45aa7	KVM: PPC: Align the table size to system page size At the moment the userspace can request a table smaller than a page size and this value will be stored as kvmppc_spapr_tce_table::size. However the actual allocated size will still be aligned to the system page size as alloc_page() is used there. This aligns the table size up to the system page size. It should not change the existing behaviour but when in-kernel TCE acceleration patchset reaches the upstream kernel, this will allow small TCE tables be accelerated as well: PCI IODA iommu_table allocator already aligns the size and, without this patch, an IOMMU group won't attach to LIOBN due to the mismatching table size. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:20 +10:00
Alexey Kardashevskiy	96df226769	KVM: PPC: Book3S PR: Preserve storage control bits PR KVM page fault handler performs eaddr to pte translation for a guest, however kvmppc_mmu_book3s_64_xlate() does not preserve WIMG bits (storage control) in the kvmppc_pte struct. If PR KVM is running as a second level guest under HV KVM, and PR KVM tries inserting HPT entry, this fails in HV KVM if it already has this mapping. This preserves WIMG bits between kvmppc_mmu_book3s_64_xlate() and kvmppc_mmu_map_page(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:14 +10:00
Alexey Kardashevskiy	bd9166ffe6	KVM: PPC: Book3S PR: Exit KVM on failed mapping At the moment kvmppc_mmu_map_page() returns -1 if mmu_hash_ops.hpte_insert() fails for any reason so the page fault handler resumes the guest and it faults on the same address again. This adds distinction to kvmppc_mmu_map_page() to return -EIO if mmu_hash_ops.hpte_insert() failed for a reason other than full pteg. At the moment only pSeries_lpar_hpte_insert() returns -2 if plpar_pte_enter() failed with a code other than H_PTEG_FULL. Other mmu_hash_ops.hpte_insert() instances can only fail with -1 "full pteg". With this change, if PR KVM fails to update HPT, it can signal the userspace about this instead of returning to guest and having the very same page fault over and over again. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:04 +10:00
Alexey Kardashevskiy	9eecec126e	KVM: PPC: Book3S PR: Get rid of unused local variable @is_mmio has never been used since introduction in commit 2f4cf5e42d13 ("Add book3s.c") from 2009. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:00 +10:00
Markus Elfring	37655490db	KVM: PPC: e500: Use kcalloc() in e500_mmu_host_init() * A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:55 +10:00
Markus Elfring	a1c52e1c7c	KVM: PPC: Book3S HV: Use common error handling code in kvmppc_clr_passthru_irq() Add a jump target so that a bit of exception handling can be better reused at the end of this function. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:50 +10:00
Paul Mackerras	9b5ab00513	KVM: PPC: Add MMIO emulation for remaining floating-point instructions For completeness, this adds emulation of the lfiwax and lfiwzx instructions. With this, all floating-point load and store instructions as of Power ISA V2.07 are emulated. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:44 +10:00
Paul Mackerras	ceba57df43	KVM: PPC: Emulation for more integer loads and stores This adds emulation for the following integer loads and stores, thus enabling them to be used in a guest for accessing emulated MMIO locations. - lhaux - lwaux - lwzux - ldu - lwa - stdux - stwux - stdu - ldbrx - stdbrx Previously, most of these would cause an emulation failure exit to userspace, though ldu and lwa got treated incorrectly as ld, and stdu got treated incorrectly as std. This also tidies up some of the formatting and updates the comment listing instructions that still need to be implemented. With this, all integer loads and stores that are defined in the Power ISA v2.07 are emulated, except for those that are permitted to trap when used on cache-inhibited or write-through mappings (and which do in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx, lq/stq, and l[bhwdq]arx/st[bhwdq]cx. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:38 +10:00
Alexey Kardashevskiy	91242fd1a3	KVM: PPC: Add MMIO emulation for stdx (store doubleword indexed) This adds missing stdx emulation for emulated MMIO accesses by KVM guests. This allows the Mellanox mlx5_core driver from recent kernels to work when MMIO emulation is enforced by userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:33 +10:00
Bin Lu	6f63e81bda	KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions This patch provides the MMIO load/store emulation for instructions of 'double & vector unsigned char & vector signed char & vector unsigned short & vector signed short & vector unsigned int & vector signed int & vector double '. The instructions that this adds emulation for are: - ldx, ldux, lwax, - lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux, - stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx, - lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx, - stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x [paulus@ozlabs.org - some cleanups, fixes and rework, make it compile for Book E, fix build when PR KVM is built in] Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:36:41 +10:00
Paul Mackerras	307d927967	KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interrupts This provides functions that can be used for generating interrupts indicating that a given functional unit (floating point, vector, or VSX) is unavailable. These functions will be used in instruction emulation code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 10:39:50 +10:00
Dan Williams	60fcd55cc2	axon_ram: add dax_operations support Setup a dax_device to have the same lifetime as the axon_ram block device and add a ->direct_access() method that is equivalent to axon_ram_direct_access(). Once fs/dax.c has been converted to use dax_operations the old axon_ram_direct_access() will be removed. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-19 15:14:36 -07:00
Yongji Xie	3827463769	powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned Override pcibios_default_alignment() to set default alignment to PAGE_SIZE for all PCI devices on PowerNV platform. Thus sub-page BARs would not share a page and could be mapped into guest when VFIO passthrough them. Signed-off-by: Yongji Xie <elohimes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-19 12:51:26 -05:00
Nicholas Piggin	ca80d5d0a8	powerpc/64s: Remove SAO feature from Power9 DD1 Power9 DD1 does not implement SAO. Although it's not widely used, its presence or absence is visible to user space via arch_validate_prot() so it's moderately important that we get the value right. Fixes: 7dccfbc325bb ("powerpc/book3s: Add a cpu table entry for different POWER9 revs") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:48:25 +10:00
Nicholas Piggin	2384d2d7ad	powerpc/64s: Remove ICSWX feature from Power9 Power9 does not implement the icswx instruction. This CPU feature is not visible to userspace and is only used in the CONFIG_PPC_ICSWX code, which is generally not enabled, and can only be triggered by other code using icswx, which should not happen on Power9 systems in the first place. So impact should be minimal. Fixes: c3ab300ea5 ("powerpc: Add POWER9 cputable entry") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:21:50 +10:00
Madhavan Srinivasan	f2080b9ac3	powerpc/perf: Add Power8 mem_access event to sysfs Patch add "mem_access" event to sysfs. This as-is not a raw event supported by Power8 pmu. Instead, it is formed based on raw event encoding specificed in isa207-common.h. Primary PMU event used here is PM_MRK_INST_CMPL. This event tracks only the completed marked instructions. Random sampling mode (MMCRA[SM]) with Random Instruction Sampling (RIS) is enabled to mark type of instructions. With Random sampling in RLS mode with PM_MRK_INST_CMPL event, the LDST /DATA_SRC fields in SIER identifies the memory hierarchy level (eg: L1, L2 etc) statisfied a data-cache miss for a marked instruction. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:23 +10:00
Madhavan Srinivasan	d148c94c27	powerpc/perf: Support to export SIERs bit in Power9 Patch to export SIER bits to userspace via perf_mem_data_src and perf_sample_data struct. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:23 +10:00
Madhavan Srinivasan	453ce7a943	powerpc/perf: Support to export SIERs bit in Power8 Patch to export SIER bits to userspace via perf_mem_data_src and perf_sample_data struct. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:22 +10:00
Madhavan Srinivasan	170a315f41	powerpc/perf: Support to export MMCRA[TEC*] field to userspace Threshold feature when used with MMCRA [Threshold Event Counter Event], MMCRA[Threshold Start event] and MMCRA[Threshold End event] will update MMCRA[Threashold Event Counter Exponent] and MMCRA[Threshold Event Counter Multiplier] with the corresponding threshold event count values. Patch to export MMCRA[TECX/TECM] to userspace in 'weight' field of struct perf_sample_data. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:22 +10:00
Madhavan Srinivasan	79e96f8f93	powerpc/perf: Export memory hierarchy info to user space The LDST field and DATA_SRC in SIER identifies the memory hierarchy level (eg: L1, L2 etc), from which a data-cache miss for a marked instruction was satisfied. Use the 'perf_mem_data_src' object to export this hierarchy level to user space. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:21 +10:00
Alexey Kardashevskiy	e889e96e98	powerpc/iommu: Do not call PageTransHuge() on tail pages The CMA pages migration code does not support compound pages at the moment so it performs few tests before proceeding to actual page migration. One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as it is designed to be called on head pages only. Since we also test for PageCompound(), and it contains PageTail() and PageHead(), we can simplify the check by leaving just PageCompound() and therefore avoid possible VM_BUG_ON_PAGE. Fixes: 2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:20 +10:00
Aneesh Kumar K.V	321f7d29e5	powerpc/mmap: Any hint > 128TB searches the full VA space As part of the new large address space support, processes start out life with a 128TB virtual address space. However when calling mmap() a process can pass a hint address, and if that hint is > 128TB the kernel will use the full 512TB address space to try and satisfy the mmap() request. Currently we have a check that the hint is > 128TB and < 512TB (TASK_SIZE), which was added as an optimisation to avoid updating addr_limit unnecessarily and also to avoid calling slice_flush_segments() on all CPUs more than necessary. However this has the user-visible side effect that an mmap() hint above 512TB does not search the full address space unless a preceding mmap() used a hint value > 128TB && < 512TB. So fix it to treat any hint above 128TB as a hint to search the full address space, instead of checking the hint against TASK_SIZE, we instead check if the addr_limit is already == TASK_SIZE. This also brings the ABI in-line with what is proposed on x86. ie, that a hint address above 128TB up to and including (2^64)-1 is an indication to search the full address space. Fixes: f4ea6dcb08ea2c (powerpc/mm: Enable mappings above 128TB) Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:19 +10:00
Nicholas Piggin	95dbdf4fa0	powerpc/64s: Minor fix for MCE TLB flush for radix The TLB flush for radix first flushes TLB for radix configuration, then flushes for hash configuration. The second flush is unnecessary but does not affect correctness. Fixes: 1a472c9dba6b9 ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:18 +10:00
Aneesh Kumar K.V	be77e999e3	powerpc/mm/radix: Use mm->task_size for boundary checking instead of addr_limit We don't init addr_limit correctly for 32 bit applications. So default to using mm->task_size for boundary condition checking. We use addr_limit to only control free space search. This makes sure that we do the right thing with 32 bit applications. We should consolidate the usage of TASK_SIZE/mm->task_size and mm->context.addr_limit later. This partially reverts commit fbfef9027c2a7ad (powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit). Fixes: fbfef9027c2a ("powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:18 +10:00
Nicholas Piggin	8d1b48ef58	powerpc/64s: Revert setting of LPCR[LPES] on POWER9 The XIVE enablement patches included a change to set the LPES (Logical Partitioning Environment Selector) bit (bit # 3) in LPCR (Logical Partitioning Control Register) on POWER9 hosts. This bit sets external interrupts to guest delivery mode, which uses SRR0/1. The host's EE interrupt handler is written to expect HSRR0/1 (for earlier CPUs). This should be fine because XIVE is configured not to deliver EEs to the host (Hypervisor Virtulization Interrupt is used instead) so the EE handler should never be executed. However a bug in interrupt controller code, hardware, or odd configuration of a simulator could result in the host getting an EE incorrectly. Keeping the EE delivery mode matching the host EE handler prevents strange crashes due to using the wrong exception registers. KVM will configure the LPCR to set LPES prior to running a guest so that EEs are delivered to the guest using SRR0/1. Fixes: 08a1e650cc ("powerpc: Fixup LPCR:PECE and HEIC setting on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Massage change log to avoid referring to LPES0 which is now renamed LPES] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:17 +10:00
Laura Abbott	f318dd083c	cma: Store a name in the cma structure Frameworks that may want to enumerate CMA heaps (e.g. Ion) will find it useful to have an explicit name attached to each region. Store the name in each CMA structure. Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-18 20:41:12 +02:00
David Woodhouse	e854d8b2a8	PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O space This is relatively esoteric, and knowing that we don't have it makes life easier in some cases rather than just an eventual -EINVAL from pci_mmap_page_range(). Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-18 13:02:26 -05:00
David Woodhouse	11df19546f	PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h> We can declare it <linux/pci.h> even on platforms where it isn't going to be defined. There's no need to have it littered through the various <asm/pci.h> files. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-18 13:02:11 -05:00
David Woodhouse	ae749c7ab4	PCI: Add arch_can_pci_mmap_wc() macro Most of the almost-identical versions of pci_mmap_page_range() silently ignore the 'write_combine' argument and give uncached mappings. Yet we allow the PCIIOC_WRITE_COMBINE ioctl in /proc/bus/pci, expose the 'resourceX_wc' file in sysfs, and allow an attempted mapping to apparently succeed. To fix this, introduce a macro arch_can_pci_mmap_wc() which indicates whether the platform can do a write-combining mapping. On x86 this ends up being pat_enabled(), while the few other platforms that support it can just set it to a literal '1'. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-18 13:01:42 -05:00
Michael Ellerman	be5c5e843c	powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y Prior to commit 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode() was just a bl hmi_exception_realmode, which the linker would turn into a bl to the local entry point of hmi_exception_realmode. This was broken when CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of the kernel text that is copied down to 0x0. But in fixing that, we added a new bug on little endian kernels. Because the branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry point of hmi_exception_realmode(). The global entry point must be called with r12 containing the address of hmi_exception_realmode(), because it uses that value to calculate the TOC value (r2). This may manifest as a checkstop, because we take a junk value from r12 which came from HSRR1, add a small constant to it and then use that as the TOC pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above RAM and somewhere in MMIO space. Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the label we're branching to. This means r12 will be setup correctly on LE, fixing this bug, and r12 is also volatile across function calls on BE so it's a good choice anyway. Fixes: 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts") Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-18 20:19:52 +10:00
Ravi Bangoria	9e1ba4f27f	powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel OOPS: Bad kernel stack pointer cd93c840 at c000000000009868 Oops: Bad kernel stack pointer, sig: 6 [#1] ... GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840 ... NIP [c000000000009868] resume_kernel+0x2c/0x58 LR [c000000000006208] program_check_common+0x108/0x180 On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does not emulate actual store in emulate_step() because it may corrupt the exception frame. So the kernel does the actual store operation in exception return code i.e. resume_kernel(). resume_kernel() loads the saved stack pointer from memory using lwz, which only loads the low 32-bits of the address, causing the kernel crash. Fix this by loading the 64-bit value instead. Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> [mpe: Change log massage, add stable tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-18 20:19:21 +10:00
David S. Miller	6b6cbc1471	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-15 21:16:30 -04:00
Thomas Gleixner	6d11b87d55	powerpc/smp: Replace open coded task affinity logic Init task invokes smp_ops->setup_cpu() from smp_cpus_done(). Init task can run on any online CPU at this point, but the setup_cpu() callback requires to be invoked on the boot CPU. This is achieved by temporarily setting the affinity of the calling user space thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. CPU hotplug and concurrent affinity settings for that thread resulting in code executing on the wrong CPU and overwriting the new affinity setting. That's actually not a problem in this context as neither CPU hotplug nor affinity settings can happen, but the access to task_struct::cpus_allowed is about to restricted. Replace it with a call to work_on_cpu_safe() which achieves the same result. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Tejun Heo <tj@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201042.518053336@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-15 12:20:54 +02:00
Nicolai Stange	115631c350	powerpc/time: Set ->min_delta_ticks and ->max_delta_ticks In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the powerpc arch's clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2017-04-14 13:11:10 -07:00
Michael Ellerman	270e2dc9b8	powerpc/pseries: Always enable SMP when building pseries The pseries platform supports Power4 and later CPUs, all of which are multithreaded and/or multicore. In practice no one ever builds a SMP=n kernel for these machines. So as we did for powernv, have the pseries platform imply SMP=y. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:37:23 +10:00
Michael Ellerman	40e275653e	powerpc/powernv: Always enable SMP when building powernv The powernv platform supports Power7 and later CPUs, all of which are multithreaded and multicore. As such we never build a SMP=n kernel for those machines, other than possibly for debugging or running in a simulator. In the debugging case we can get a similar effect by booting with nr_cpus=1, or there's always the option of building a custom kernel with SMP hacked out. For running in simulators the code size reduction from building without SMP is not particularly important, what matters is the number of instructions executed. A quick test shows that a SMP=y kernel takes ~6% more instructions to boot to a shell. Booting with nr_cpus=1 recovers about half that deficit. On the flip side, keeping the SMP=n kernel building can be a pain at times. And although we've mostly kept it building in recent years, no one is regularly testing that the SMP=n kernel actually boots and works well on these machines. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:37:17 +10:00
Michael Ellerman	ebbe9d7d3a	powerpc: Allow platforms to force-enable CONFIG_SMP Of the 64-bit Book3S platforms, only powermac supports booting on an actual non-SMP system. The other platforms can be built with SMP disabled, but it doesn't make a lot of sense given the CPUs they support are all multicore or multithreaded. So give platforms the option of forcing SMP=y. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:36:43 +10:00

1 2 3 4 5 ...

16437 Commits