16832 Commits

Author SHA1 Message Date
Paul Mackerras
3d3efb68c1 KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1
POWER9 DD1 has an erratum where writing to the TBU40 register, which
is used to apply an offset to the timebase, can cause the timebase to
lose counts.  This results in the timebase on some CPUs getting out of
sync with other CPUs, which then results in misbehaviour of the
timekeeping code.

To work around the problem, we make KVM ignore the timebase offset for
all guests on POWER9 DD1 machines.  This means that live migration
cannot be supported on POWER9 DD1 machines.

Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-16 16:04:57 +10:00
Paul Mackerras
7ceaa6dcd8 KVM: PPC: Book3S HV: Save/restore host values of debug registers
At present, HV KVM on POWER8 and POWER9 machines loses any instruction
or data breakpoint set in the host whenever a guest is run.
Instruction breakpoints are currently only used by xmon, but ptrace
and the perf_event subsystem can set data breakpoints as well as xmon.

To fix this, we save the host values of the debug registers (CIABR,
DAWR and DAWRX) before entering the guest and restore them on exit.
To provide space to save them in the stack frame, we expand the stack
frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.

Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-16 11:53:19 +10:00
David S. Miller
0ddead90b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 11:59:32 -04:00
Benjamin Herrenschmidt
25642705b2 powerpc/xive: Fix offset for store EOI MMIOs
Architecturally we should apply a 0x400 offset for these. Not doing
it will break future HW implementations.

The offset of 0 is supposed to remain for "triggers" though not all
sources support both trigger and store EOI, and in P9 specifically,
some sources will treat 0 as a store EOI. But future chips will not.
So this makes us use the properly architected offset which should work
always.

Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 23:29:39 +10:00
Nicholas Piggin
07d2a628bc powerpc/64s: Avoid cpabort in context switch when possible
The ISA v3.0B copy-paste facility only requires cpabort when switching
to a process that has foreign real addresses mapped (direct access to
accelerators), to clear a potential copy buffer filled by a previous
thread. There is no accelerator driver implemented yet, so cpabort can
be removed. It can be be re-added when a driver is implemented.

POWER9 DD1 requires the copy buffer to always be cleared on context
switch, but if accelerators are not in use, then an unpaired copy from
a dummy region is sufficient to clear data out of the copy buffer.

This increases context switch performance by about 5% on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin
9145effd62 powerpc/64: Drop explicit hwsync in context switch
The sync (aka. hwsync, aka. heavyweight sync) in the context switch
code to prevent MMIO access being reordered from the point of view of
a single process if it gets migrated to a different CPU is not
required because there is an hwsync performed earlier in the context
switch path.

Comment this so it's clear enough if anything changes on the scheduler
or the powerpc sides. Remove the hwsync from _switch.

This improves context switch performance by 2-3% on POWER8.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin
837e72f78a powerpc/64: Drop reservation-clearing ldarx in context switch
There is no need to explicitly break the reservation in _switch,
because we are guaranteed that the context switch path will include a
larx/stcx.

Comment the guarantee and remove the reservation clear from _switch.

This is worth 1-2% in context switch performance.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin
e4c0fc5f72 powerpc/64s: Leave interrupts hard enabled in context switch for radix
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
hard disabled interrupts over the low level context switch, because
the SLB management can't cope with a PMU interrupt accesing the stack
in that window.

Radix based kernel mapping does not use the SLB so it does not require
interrupts hard disabled here.

This is worth 1-2% in context switch performance on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin
bc4f65e4cf powerpc/64: Avoid restore_math call if possible in syscall exit
The syscall exit code that branches to restore_math is quite heavy on
Book3S, consisting of 2 mtmsr instructions. Threads that don't use both
FP and vector can get caught here if the kernel ever uses FP or vector.
Lazy-FP/vec context switching also trips this case.

So check for lazy FP and vector before switching RI for restore_math.
Move most of this case out of line.

For threads that do want to restore math registers, the MSR switches are
still suboptimal. Future direction may be to use a soft-RI bit to avoid
MSR switches in kernel (similar to soft-EE), but for now at least the
no-restore

POWER9 context switch rate increases by about 5% due to sched_yield(2)
return performance. I haven't constructed a test to measure the syscall
cost.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin
acd7d8cef0 powerpc/64s: Optimize hypercall/syscall entry
After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
guest to host"), a getppid() system call goes from 307 cycles to 358
cycles (+17%) on POWER8. This is due significantly to the scratch SPR
used by the hypercall check.

It turns out there are a some volatile registers common to both system
call and hypercall (in particular, r12, cr0, ctr), which can be used to
avoid the SPR and some other overheads. This brings getppid to 320 cycles
(+4%).

Testing hcall entry performance by running "sc 1" in guest userspace
before this patch is 854 cycles, afterwards is 826. Also a small win
there.

POWER9 syscall is improved by about the same amount, hcall not tested.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Michael Ellerman
9abcc981de powerpc/mm/radix: Only add X for pages overlapping kernel text
Currently we map the whole linear mapping with PAGE_KERNEL_X. Instead we
should check if the page overlaps the kernel text and only then add
PAGE_KERNEL_X.

Note that we still use 1G pages if they're available, so this will
typically still result in a 1G executable page at KERNELBASE. So this fix is
primarily useful for catching stray branches to high linear mapping addresses.

Without this patch, we can execute at 1G in xmon using:

  0:mon> m c000000040000000
  c000000040000000  00 l
  c000000040000000  00000000 01006038
  c000000040000004  00000000 2000804e
  c000000040000008  00000000 x
  0:mon> di c000000040000000
  c000000040000000  38600001      li      r3,1
  c000000040000004  4e800020      blr
  0:mon> p c000000040000000
  return value is 0x1

After we get a 400 as expected:

  0:mon> p c000000040000000
  *** 400 exception occurred

Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
2017-06-15 16:34:39 +10:00
Michael Ellerman
0edc2ca9cc Revert "powerpc: Handle simultaneous interrupts at once"
This reverts commit 45cb08f4791ce6a15c54598b4cb73db4b4b8294f.

For some reason this is causing IRQ problems on Freescale Book3E
machines, eg on my p5020ds:

  irq 25: nobody cared (try booting with the "irqpoll" option)
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc3-gcc-6.3.1-00037-g45cb08f4791c #624
  Call Trace:
  [c0000000fffdbb10] [c00000000049962c] .dump_stack+0xa8/0xe8 (unreliable)
  [c0000000fffdbba0] [c0000000000babf4] .__report_bad_irq+0x54/0x140
  [c0000000fffdbc40] [c0000000000bb11c] .note_interrupt+0x324/0x380
  [c0000000fffdbd00] [c0000000000b7110] .handle_irq_event_percpu+0x68/0x88
  [c0000000fffdbd90] [c0000000000b718c] .handle_irq_event+0x5c/0xa8
  [c0000000fffdbe10] [c0000000000bc01c] .handle_fasteoi_irq+0xe4/0x298
  [c0000000fffdbe90] [c0000000000b59c4] .generic_handle_irq+0x50/0x74
  [c0000000fffdbf10] [c0000000000075d8] .__do_irq+0x74/0x1f0
  [c0000000fffdbf90] [c0000000000189f8] .call_do_irq+0x14/0x24
  [c0000000f7173060] [c0000000000077e4] .do_IRQ+0x90/0x120
  [c0000000f7173100] [c00000000001d93c] exc_0x500_common+0xfc/0x100
  --- interrupt: 501 at .prepare_to_wait_event+0xc/0x14c
      LR = .fsl_elbc_run_command+0xc8/0x23c
  [c0000000f71734d0] [c00000000065f418] .nand_reset+0xb8/0x168
  [c0000000f7173560] [c00000000065fec4] .nand_scan_ident+0x2b0/0x1638
  [c0000000f7173650] [c000000000666cd8] .fsl_elbc_nand_probe+0x34c/0x5f0
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  [c0000000f7173750] [c0000000005a3c60] .platform_drv_probe+0x64/0xb0
  [c0000000f71737d0] [c0000000005a12e0] .really_probe+0x290/0x334
  [c0000000f7173870] [c0000000005a14a0] .__driver_attach+0x11c/0x120
  [c0000000f7173900] [c00000000059e6a0] .bus_for_each_dev+0x98/0xfc
  [c0000000f71739a0] [c0000000005a0b3c] .driver_attach+0x34/0x4c
  [c0000000f7173a20] [c0000000005a04b0] .bus_add_driver+0x1ac/0x2e0
  [c0000000f7173ac0] [c0000000005a2170] .driver_register+0x94/0x160
  [c0000000f7173b40] [c0000000005a3be0] .__platform_driver_register+0x60/0x7c
  [c0000000f7173bc0] [c000000000d6aab4] .fsl_elbc_nand_driver_init+0x24/0x38
  [c0000000f7173c30] [c000000000001934] .do_one_initcall+0x68/0x1b8
  [c0000000f7173d00] [c000000000d210f8] .kernel_init_freeable+0x260/0x338
  [c0000000f7173db0] [c0000000000021b0] .kernel_init+0x20/0xe70
  [c0000000f7173e30] [c0000000000009bc] .ret_from_kernel_thread+0x58/0x9c
  handlers:
  [<c000000000ed85c8>] .fsl_lbc_ctrl_irq
  Disabling IRQ #25

Ben also had concerns with the implementation being potentially slow on
some PICs, so revert it for now.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:20:46 +10:00
Paul Mackerras
46a704f840 KVM: PPC: Book3S HV: Preserve userspace HTM state properly
If userspace attempts to call the KVM_RUN ioctl when it has hardware
transactional memory (HTM) enabled, the values that it has put in the
HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
guest values.  To fix this, we detect this condition and save those
SPR values in the thread struct, and disable HTM for the task.  If
userspace goes to access those SPRs or the HTM facility in future,
a TM-unavailable interrupt will occur and the handler will reload
those SPRs and re-enable HTM.

If userspace has started a transaction and suspended it, we would
currently lose the transactional state in the guest entry path and
would almost certainly get a "TM Bad Thing" interrupt, which would
cause the host to crash.  To avoid this, we detect this case and
return from the KVM_RUN ioctl with an EINVAL error, with the KVM
exit reason set to KVM_EXIT_FAIL_ENTRY.

Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15 16:18:17 +10:00
Paul Mackerras
4c3bb4ccd0 KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
This restores several special-purpose registers (SPRs) to sane values
on guest exit that were missed before.

TAR and VRSAVE are readable and writable by userspace, and we need to
save and restore them to prevent the guest from potentially affecting
userspace execution (not that TAR or VRSAVE are used by any known
program that run uses the KVM_RUN ioctl).  We save/restore these
in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.

FSCR affects userspace execution in that it can prohibit access to
certain facilities by userspace.  We restore it to the normal value
for the task on exit from the KVM_RUN ioctl.

IAMR is normally 0, and is restored to 0 on guest exit.  However,
with a radix host on POWER9, it is set to a value that prevents the
kernel from executing user-accessible memory.  On POWER9, we save
IAMR on guest entry and restore it on guest exit to the saved value
rather than 0.  On POWER8 we continue to set it to 0 on guest exit.

PSPB is normally 0.  We restore it to 0 on guest exit to prevent
userspace taking advantage of the guest having set it non-zero
(which would allow userspace to set its SMT priority to high).

UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
the AMR from being used as a covert channel between userspace
processes, since the AMR is not context-switched at present.

Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15 16:17:09 +10:00
Alistair Popple
377aa6b0ef powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
Commit 4c3b89effc28 ("powerpc/powernv: Add sanity checks to
pnv_pci_get_{gpu|npu}_dev") introduced explicit warnings in
pnv_pci_get_npu_dev() when a PCIe device has no associated device-tree
node. However not all PCIe devices have an of_node and
pnv_pci_get_npu_dev() gets indirectly called at least once for every
PCIe device in the system. This results in spurious WARN_ON()'s so
remove it.

The same situation should not exist for pnv_pci_get_gpu_dev() as any
NPU based PCIe device requires a device-tree node.

Fixes: 4c3b89effc28 ("powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-14 15:23:19 +10:00
Thomas Falcon
40c9db8ad8 ibmvnic: Client-initiated failover
The IBM vNIC protocol provides support for the user to initiate
a failover from the client LPAR in case the current backing infrastructure
is deemed inadequate or in an error state.

Support for two H_VIOCTL sub-commands for vNIC devices are required
to implement this function. These commands are H_GET_SESSION_TOKEN
and H_SESSION_ERR_DETECTED.

"[H_GET_SESSION_TOKEN] is used to obtain a session token from a VNIC client
adapter.  This token is opaque to the caller and is intended to be used in
tandem with the SESSION_ERROR_DETECTED vioctl subfunction."

"[H_SESSION_ERR_DETECTED] is used to report that the currently active
backing device for a VNIC client adapter is behaving poorly, and that
the hypervisor should attempt to fail over to a different backing device,
if one is available."

To provide tools access to this functionality the vNIC driver creates a
sysfs file that, when written to, will send a request to pHyp to failover
to a different backing device.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13 12:53:35 -04:00
Kirill A. Shutemov
e585513b76 x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
This patch provides all required callbacks required by the generic
get_user_pages_fast() code and switches x86 over - and removes
the platform specific implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:50 +02:00
Paul Mackerras
ca8efa1df1 KVM: PPC: Book3S HV: Context-switch EBB registers properly
This adds code to save the values of three SPRs (special-purpose
registers) used by userspace to control event-based branches (EBBs),
which are essentially interrupts that get delivered directly to
userspace.  These registers are loaded up with guest values when
entering the guest, and their values are saved when exiting the
guest, but we were not saving the host values and restoring them
before going back to userspace.

On POWER8 this would only affect userspace programs which explicitly
request the use of EBBs and also use the KVM_RUN ioctl, since the
only source of EBBs on POWER8 is the PMU, and there is an explicit
enable bit in the PMU registers (and those PMU registers do get
properly context-switched between host and guest).  On POWER9 there
is provision for externally-generated EBBs, and these are not subject
to the control in the PMU registers.

Since these registers only affect userspace, we can save them when
we first come in from userspace and restore them before returning to
userspace, rather than saving/restoring the host values on every
guest entry/exit.  Similarly, we don't need to worry about their
values on offline secondary threads since they execute in the context
of the idle task, which never executes in userspace.

Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-13 14:12:02 +10:00
Greg Kroah-Hartman
205a1ee15d powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-12 16:18:37 +02:00
Greg Kroah-Hartman
451e3f1a74 powerpc: vio: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-12 16:18:37 +02:00
Linus Torvalds
32627645e9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key subsystem fixes from James Morris:
 "Here are a bunch of fixes for Linux keyrings, including:

   - Fix up the refcount handling now that key structs use the
     refcount_t type and the refcount_t ops don't allow a 0->1
     transition.

   - Fix a potential NULL deref after error in x509_cert_parse().

   - Don't put data for the crypto algorithms to use on the stack.

   - Fix the handling of a null payload being passed to add_key().

   - Fix incorrect cleanup an uninitialised key_preparsed_payload in
     key_update().

   - Explicit sanitisation of potentially secure data before freeing.

   - Fixes for the Diffie-Helman code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  KEYS: fix refcount_inc() on zero
  KEYS: Convert KEYCTL_DH_COMPUTE to use the crypto KPP API
  crypto : asymmetric_keys : verify_pefile:zero memory content before freeing
  KEYS: DH: add __user annotations to keyctl_kdf_params
  KEYS: DH: ensure the KDF counter is properly aligned
  KEYS: DH: don't feed uninitialized "otherinfo" into KDF
  KEYS: DH: forbid using digest_null as the KDF hash
  KEYS: sanitize key structs before freeing
  KEYS: trusted: sanitize all key material
  KEYS: encrypted: sanitize all key material
  KEYS: user_defined: sanitize key payloads
  KEYS: sanitize add_key() and keyctl() key payloads
  KEYS: fix freeing uninitialized memory in key_update()
  KEYS: fix dereferencing NULL payload with nonzero length
  KEYS: encrypted: use constant-time HMAC comparison
  KEYS: encrypted: fix race causing incorrect HMAC calculations
  KEYS: encrypted: fix buffer overread in valid_master_desc()
  KEYS: encrypted: avoid encrypting/decrypting stack buffers
  KEYS: put keyring if install_session_keyring_to_cred() fails
  KEYS: Delete an error message for a failed memory allocation in get_derived_key()
  ...
2017-06-11 16:17:29 -07:00
Linus Torvalds
a92f63cd13 powerpc fixes for 4.12 #5
Mostly fairly minor, of note are:
  - Fix percpu allocations to be NUMA aware
  - Limit 4k page size config to 64TB virtual address space
  - Avoid needlessly restoring FP and vector registers
 
 Thanks to:
   Aneesh Kumar K.V, Breno Leitao, Christophe Leroy, Frederic Barrat, Madhavan
   Srinivasan, Michael Bringmann, Nicholas Piggin, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZOobDAAoJEFHr6jzI4aWA/2AQALYfeTGo5Pl+c4E1wd5XpDae
 cXNxDaFFqdyNBW3OzWcDDW7w+pGvuu6SLPdki1CNosp4MprqpZurjTe9bhfYoAy1
 SQAcUgk7vra6WlbfguinwlCDeeKt53Np9ZYs4jhq/wy9bp6+T0evpdoY5NvHRr+x
 wR4y3KtP1t3PSlnJFfXq2jHg6xYcqJ14cVjWD6KPW8c8YB81WU7FkeVF5YtzinxE
 OWfuJlwzXG3JtG5q9CwwcxUiwpWSPZ7WPHovD8sCQc1pkdXx98LY5syhYy+8n2pa
 B2DbHoRwwkxQwT2z6KAGBUPtf94Fhe4bHcH24RXVeTnr1Wu1sIrkJkQ7c0OR6wN/
 nT8H+uJMpuWPNmbVk6n1LWqLem3YliVB6HwcP0944TF9P/f1GCF6YotkZJ/aXTSc
 FLA/teN0tzQyaX+DOrgTGTgmuscCOkDMRUHPmB9HoBYP3XLgeA+N9mphbxtj03d0
 ch62OqZBw6MiweEx3Ji1s7mfTr8/VaCtSx+rzz7GsQllXeo8k2TVoBewr2X1wesv
 0hB779hRYt9JvDVgmJlLyClLsE/kEy6ZPD6r7kf9DgBYwQRaHdMZDZIR6jDZjklJ
 viVEmQ91lCRx+g3q7hqgZO8VPr2EY12h7PqGgcDvkzHq0vawkxif4aBJ5AIZHpWs
 J23ZZB08M8llurG99+uA
 =KDBG
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Mostly fairly minor, of note are:

   - Fix percpu allocations to be NUMA aware

   - Limit 4k page size config to 64TB virtual address space

   - Avoid needlessly restoring FP and vector registers

  Thanks to Aneesh Kumar K.V, Breno Leitao, Christophe Leroy, Frederic
  Barrat, Madhavan Srinivasan, Michael Bringmann, Nicholas Piggin,
  Vaibhav Jain"

* tag 'powerpc-4.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default
  powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space
  cxl: Fix error path on bad ioctl
  powerpc/perf: Fix Power9 test_adder fields
  powerpc/numa: Fix percpu allocations to be NUMA aware
  cxl: Avoid double free_irq() for psl,slice interrupts
  powerpc/kernel: Initialize load_tm on task creation
  powerpc/kernel: Fix FP and vector register restoration
  powerpc/64: Reclaim CPU_FTR_SUBCORE
  powerpc/hotplug-mem: Fix missing endian conversion of aa_index
  powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
  powerpc/spufs: Fix coredump of SPU contexts
  powerpc/64s: Add dt_cpu_ftrs boot time setup option
2017-06-09 09:44:46 -07:00
Aleksa Sarai
54ebbfb160 tty: add TIOCGPTPEER ioctl
When opening the slave end of a PTY, it is not possible for userspace to
safely ensure that /dev/pts/$num is actually a slave (in cases where the
mount namespace in which devpts was mounted is controlled by an
untrusted process). In addition, there are several unresolvable
race conditions if userspace were to attempt to detect attacks through
stat(2) and other similar methods [in addition it is not clear how
userspace could detect attacks involving FUSE].

Resolve this by providing an interface for userpace to safely open the
"peer" end of a PTY file descriptor by using the dentry cached by
devpts. Since it is not possible to have an open master PTY without
having its slave exposed in /dev/pts this interface is safe. This
interface currently does not provide a way to get the master pty (since
it is not clear whether such an interface is safe or even useful).

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Valentin Rothberg <vrothberg@suse.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 12:27:54 +02:00
Greg Kroah-Hartman
823a44ac23 powerpc: ibmebus: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Johan Hovold <johan@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 11:00:46 +02:00
Greg Kroah-Hartman
892533e191 powerpc: ps3: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Seems-ok: Geoff Levand <geoff@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 11:00:46 +02:00
Bilal Amarni
47b2c3fff4 security/keys: add CONFIG_KEYS_COMPAT to Kconfig
CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for
several 64-bit architectures : mips, parisc, tile.

At the moment and for those architectures, calling in 32-bit userspace the
keyctl syscall would return an ENOSYS error.

This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to
make sure the compatibility wrapper is registered by default for any 64-bit
architecture as long as it is configured with CONFIG_COMPAT.

[DH: Modified to remove arm64 compat enablement also as requested by Eric
 Biggers]

Signed-off-by: Bilal Amarni <bilal.amarni@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
cc: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09 13:29:45 +10:00
Michael Ellerman
c6ee9619e2 powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default
The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with
other general kernel options. It's really more at home in the "Platform
Support" section, so move it there.

Also enable it by default, for Book3s 64. It does mostly nothing unless
the device tree properties are found, and we will want it enabled
eventually in distro kernels, so turn it on to start getting more
testing.

Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08 20:42:57 +10:00
Aneesh Kumar K.V
92d9dfda8b powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space
Supporting 512TB requires us to do a order 3 allocation for level 1 page
table (pgd). This results in page allocation failures with certain workloads.
For now limit 4k linux page size config to 64TB.

Fixes: f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08 20:42:56 +10:00
Greg Kroah-Hartman
6f428096a4 driver core: remove CLASS_ATTR usage
There was only 2 remaining users of CLASS_ATTR() so let's finally get
rid of them and force everyone to use the correct RW/RO/WO versions
instead.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 11:15:22 +02:00
David S. Miller
216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Martin KaFai Lau
783d28dd11 bpf: Add jited_len to struct bpf_prog
Add jited_len to struct bpf_prog.  It will be
useful for the struct bpf_prog_info which will
be added in the later patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 15:41:24 -04:00
Madhavan Srinivasan
8c218578fc powerpc/perf: Fix Power9 test_adder fields
Commit 8d911904f3ce4 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable
the use of PMC5 using raw event code. But instead of updating the
power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
power9_pmu structure. Fix it.

Fixes: 8d911904f3ce ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
Reported-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:21:19 +10:00
Michael Ellerman
ba4a648f12 powerpc/numa: Fix percpu allocations to be NUMA aware
In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Cc: stable@vger.kernel.org # v3.16+
Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:19:46 +10:00
Nicholas Piggin
90df4bfb4d powerpc/64s: Machine check handle ifetch from foreign real address for POWER9
The i-side 0111b machine check, which is "Instruction Fetch to foreign
address space", was missed by 7b9f71f974 ("powerpc/64s: POWER9 machine
check handler").

    The POWER9 processor core considers host real addresses with a
    nonzero value in RA(8:12) as foreign address space, accessible only
    by the copy and paste instructions. The copy and paste instruction
    pair can be used to invoke the Nest accelerators via the Virtual
    Accelerator Switchboard (VAS).

It is an error for any regular load/store or ifetch to go to a foreign
addresses. When relocation is on, this causes an MMU exception. When
relocation is off, a machine check exception. It is possible to trigger
this machine check by branching to a foreign address with MSR[IR]=0.

Fixes: 7b9f71f974a1 ("powerpc/64s: POWER9 machine check handler")
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:17:15 +10:00
Breno Leitao
7f22ced437 powerpc/kernel: Initialize load_tm on task creation
Currently tsk->thread.load_tm is not initialized in the task creation
and can contain garbage on a new task.

This is an undesired behaviour, since it affects the timing to enable
and disable the transactional memory laziness (disabling and enabling
the MSR TM bit, which affects TM reclaim and recheckpoint in the
scheduling process).

Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 19:09:22 +10:00
Christophe Leroy
4386c096c2 powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit
These two functions implement the same semantics, so unify their naming so we
can share code that calls them. The longer name is more descriptive so use it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:59:03 +10:00
Balbir Singh
d2485644c7 powerpc/mm/hugetlb: Add support for page accounting
Add __GFP_ACCOUNT to __hugepte_alloc()

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:12 +10:00
Balbir Singh
abd667be15 powerpc/mm/book(e)(3s)/32: Add page table accounting
Add support in pte_alloc_one() and pgd_alloc() by
passing __GFP_ACCOUNT in the flags

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:11 +10:00
Balbir Singh
de3b87611d powerpc/mm/book(e)(3s)/64: Add page table accounting
Introduce a helper pgtable_gfp_flags() which
just returns the current gfp flags and adds
__GFP_ACCOUNT to account for page table allocation.
The generic helper is added to include/asm/pgalloc.h
and has two variants - WARNING ugly bits ahead

1. If the header is included from a module, no check
for mm == &init_mm is done, since init_mm is not
exported
2. For kernel includes, the check is done and required
see (3e79ec7 arch: x86: charge page tables to kmemcg)

The fundamental assumption is that no module should be
doing pgd/pud/pmd and pte alloc's on behalf of init_mm
directly.

NOTE: This adds an overhead to pmd/pud/pgd allocations
similar to x86.  The other alternative was to implement
pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
with their offset variants.

For 4k page size, pte_alloc_one no longer calls
pte_alloc_one_kernel.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:10 +10:00
Balbir Singh
c5cee6421c powerpc/mm/hash: Do a local flush if possible when no batch is active
Currently in hpte_need_flush() if there is no batch pending we always do a
global TLB flush, which is inefficient if the mm has never run on another
thread.

Instead do the same check that __flush_tlb_pending() does and check if a local
flush is sufficient when batch->active is false. Instead of open-coding it we
use mm_is_thread_local().

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Don't use a local, just inline mm_is_thread_local()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:02:55 +10:00
Colin Ian King
b802ab46ba powerpc: Fix some spelling mistakes
Collation of some spelling fixes from Colin.

 Attemping   -> Attempting
 intialized  -> initialized
 missmanaged -> mismanaged

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 16:50:15 +10:00
Breno Leitao
1195892c09 powerpc/kernel: Fix FP and vector register restoration
Currently tsk->thread->load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables (non-zero
values).

These variables will be checked later in restore_math() to validate if the
FP and vector registers are being utilized. Since these values might be
non-zero, the restore_math() will continue to save the FP and vectors even if
they were never utilized by the userspace application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.

Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 15:55:30 +10:00
Radim Krčmář
2fa6e1e12a KVM: add kvm_request_pending
A first step in vcpu->requests encapsulation.  Additionally, we now
use READ_ONCE() when accessing vcpu->requests, which ensures we
always load vcpu->requests when it's accessed.  This is important as
other threads can change it any time.  Also, READ_ONCE() documents
that vcpu->requests is used with other threads, likely requiring
memory barriers, which it does.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[ Documented the new use of READ_ONCE() and converted another check
  in arch/mips/kvm/vz.c ]
Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-04 16:53:00 +02:00
Andrew Jones
2387149ead KVM: improve arch vcpu request defining
Marc Zyngier suggested that we define the arch specific VCPU request
base, rather than requiring each arch to remember to start from 8.
That suggestion, along with Radim Krcmar's recent VCPU request flag
addition, snowballed into defining something of an arch VCPU request
defining API.

No functional change.

(Looks like x86 is running out of arch VCPU request bits.  Maybe
 someday we'll need to extend to 64.)

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-04 16:53:00 +02:00
Matt Brown
f718d426d7 powerpc/lib/xor_vmx: Ensure no altivec code executes before enable_kernel_altivec()
The xor_vmx.c file is used for the RAID5 xor operations. In these functions
altivec is enabled to run the operation and then disabled.

The code uses enable_kernel_altivec() around the core of the algorithm, however
the whole file is built with -maltivec, so the compiler is within its rights to
generate altivec code anywhere. This has been seen at least once in the wild:

  0:mon> di $xor_altivec_2
  c0000000000b97d0  3c4c01d9	addis   r2,r12,473
  c0000000000b97d4  3842db30	addi    r2,r2,-9424
  c0000000000b97d8  7c0802a6	mflr    r0
  c0000000000b97dc  f8010010	std     r0,16(r1)
  c0000000000b97e0  60000000	nop
  c0000000000b97e4  7c0802a6	mflr    r0
  c0000000000b97e8  faa1ffa8	std     r21,-88(r1)
  ...
  c0000000000b981c  f821ff41	stdu    r1,-192(r1)
  c0000000000b9820  7f8101ce	stvx    v28,r1,r0		<-- POP
  c0000000000b9824  38000030	li      r0,48
  c0000000000b9828  7fa101ce	stvx    v29,r1,r0
  ...
  c0000000000b984c  4bf6a06d	bl      c0000000000238b8 # enable_kernel_altivec

This patch splits the non-altivec code into xor_vmx_glue.c which calls the
altivec functions in xor_vmx.c. By compiling xor_vmx_glue.c without
-maltivec we can guarantee that altivec instruction will not be executed
outside of the enable/disable block.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Rework change log and include disassembly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:17:52 +10:00
Hari Bathini
48a316e350 powerpc/fadump: Set an upper limit for boot memory size
By default, 5% of system RAM is reserved for preserving boot memory.
Alternatively, a user can specify the amount of memory to reserve.
See Documentation/powerpc/firmware-assisted-dump.txt for details. In
addition to the memory reserved for preserving boot memory, some more
memory is reserved, to save HPTE region, CPU state data and ELF core
headers.

Memory Reservation during first kernel looks like below:

  Low memory                                        Top of memory
  0      boot memory size                                       |
  |           |                       |<--Reserved dump area -->|
  V           V                       |   Permanent Reservation V
  +-----------+----------/ /----------+---+----+-----------+----+
  |           |                       |CPU|HPTE|  DUMP     |ELF |
  +-----------+----------/ /----------+---+----+-----------+----+
        |                                           ^
        |                                           |
        \                                           /
         -------------------------------------------
          Boot memory content gets transferred to
          reserved area by firmware at the time of
          crash

This implicitly means that the sum of the sizes of boot memory, CPU
state data, HPTE region, DUMP preserving area and ELF core headers
can't be greater than the total memory size. But currently, a user is
allowed to specify any value as boot memory size. So, the above rule
is violated when a boot memory size around 50% of the total available
memory is specified. As the kernel is not handling this currently, it
may lead to undefined behavior. Fix it by setting an upper limit for
boot memory size to 25% of the total available memory. Also, instead
of using memblock_end_of_DRAM(), which doesn't take the holes, if any,
in the memory layout into account, use memblock_phys_mem_size() to
calculate the percentage of total available memory.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:16:50 +10:00
Hari Bathini
e7467dc694 powerpc/fadump: Update comment about offset where fadump is reserved
With commit f6e6bedb7731 ("powerpc/fadump: Reserve memory at an offset
closer to bottom of RAM"), memory for fadump is no longer reserved at
the top of RAM. But there are still a few places which say so. Change
them appropriately.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:16:49 +10:00
Hari Bathini
81d9eca502 powerpc/fadump: Add a warning when 'fadump_reserve_mem=' is used
With commit 11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter
for fadump memory reservation"), 'fadump_reserve_mem=' parameter is
deprecated in favor of 'crashkernel=' parameter. Add a warning if
'fadump_reserve_mem=' is still used.

Fixes: 11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation")
Suggested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Unsplit long printk strings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:16:35 +10:00
Michal Suchanek
98b8cd7f75 powerpc/fadump: Return error when fadump registration fails
- log an error message when registration fails and no error code listed
   in the switch is returned
 - translate the hv error code to posix error code and return it from
   fw_register
 - return the posix error code from fw_register to the process writing
   to sysfs
 - return EEXIST on re-registration
 - return success on deregistration when fadump is not registered
 - return ENODEV when no memory is reserved for fadump

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Tested-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Use pr_err() to shrink the error print]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:57 +10:00
Christophe Leroy
f782ddf297 powerpc: Remove __ilog2()s and use generic ones
With the __ilog2() function as defined in
arch/powerpc/include/asm/bitops.h, GCC will not optimise the code
in case of constant parameter.

The generic ilog2() function in include/linux/log2.h is written
to handle the case of the constant parameter.

This patch discards the three __ilog2() functions and
defines __ilog2() as ilog2()

For non constant calls, the generated code is doing the same:
int test__ilog2(unsigned long x)
{
	return __ilog2(x);
}

int test__ilog2_u32(u32 n)
{
	return __ilog2_u32(n);
}

int test__ilog2_u64(u64 n)
{
	return __ilog2_u64(n);
}

On PPC32 before the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC32 after the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC64 before the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

On PPC64 after the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:56 +10:00