5599 Commits

Author SHA1 Message Date
Mathieu Malaterre
2fff0f07b8 powerpc/powermac: Add missing include of header pmac.h
The header `pmac.h` was not included, leading to the following warnings,
treated as error with W=1:

  arch/powerpc/platforms/powermac/time.c:69:13: error: no previous prototype for ‘pmac_time_init’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:207:15: error: no previous prototype for ‘pmac_get_boot_time’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:222:6: error: no previous prototype for ‘pmac_get_rtc_time’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:240:5: error: no previous prototype for ‘pmac_set_rtc_time’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:259:12: error: no previous prototype for ‘via_calibrate_decr’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:311:13: error: no previous prototype for ‘pmac_calibrate_decr’ [-Werror=missing-prototypes]

The function `via_calibrate_decr` was made static to silence a warning.

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:36 +10:00
Mahesh Salgaonkar
cd813e1cd7 powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.

Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.

  Severe Machine check interrupt [Recovered]
    NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: d00000000ca70000
  cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
      pc: c0000000009694c0: vsnprintf+0x80/0x480
      lr: c0000000009698e0: vscnprintf+0x20/0x60
      sp: c0000000fc777830
     msr: 8000000002009033
     dar: a803a30c000000d0
    current = 0xc00000000bc9ef00
    paca    = 0xc00000001eca5c00	 softe: 3	 irq_happened: 0x01
      pid   = 8860, comm = insmod
  vscnprintf+0x20/0x60
  vprintk_emit+0xb4/0x4b0
  vprintk_func+0x5c/0xd0
  printk+0x38/0x4c
  init_module+0x1c0/0x338 [bork_kernel]
  do_one_initcall+0x54/0x230
  do_init_module+0x8c/0x248
  load_module+0x12b8/0x15b0
  sys_finit_module+0xa8/0x110
  system_call+0x58/0x6c
  --- Exception: c00 (System Call) at 00007fff8bda0644
  SP (7fffdfbfe980) is in userspace

This patch fixes this issue.

Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org # v3.15+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:34 +10:00
Rashmica Gupta
d3da701d33 powerpc/powernv: Allow memory that has been hot-removed to be hot-added
This patch allows the memory removed by memtrace to be readded to the
kernel. So now you don't have to reboot your system to add the memory
back to the kernel or to have a different amount of memory removed.

Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:31 +10:00
Benjamin Herrenschmidt
77b5f703dc powerpc/powernv/opal: Use standard interrupts property when available
For (bad) historical reasons, OPAL used to create a non-standard pair
of properties "opal-interrupts" and "opal-interrupts-names" for
representing the list of interrupts it wants Linux to request on its
behalf.

Among other issues, the opal-interrupts doesn't have a way to carry
the type of interrupts, and they were assumed to be all level
sensitive.

This is wrong on some recent systems where some of them are edge
sensitive causing warnings in the XIVE code and possible misbehaviours
if they need to be retriggered (typically the NPU2 TCE error
interrupts).

This makes Linux switch to using the standard "interrupts" and
"interrupt-names" properties instead when they are available, using
standard of_irq helpers, which can carry all the desired type
information.

Newer versions of OPAL will generate those properties in addition to
the legacy ones.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup prefix logic to check strlen(r->name). Reinstate setting
 of start = 0 in opal_event_shutdown() to avoid double free warnings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:38 +10:00
Christophe Leroy
d6690b1a9b powerpc: Allow CPU selection of e300core variants
GCC supports -mcpu=e300c2 and -mcpu=e300c3

This patch gives the opportunity to tune kernel to one of
those two types.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:37 +10:00
Christophe Leroy
0e00a8c9fd powerpc: Allow CPU selection also on PPC32
This patch extends to PPC32 the capability to select the exact
CPU type.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:37 +10:00
Christophe Leroy
cc62d20ce4 powerpc: Make CPU selection logic generic in Makefile
At the time being, when adding a new CPU for selection, both
Kconfig.cputype and Makefile have to be modified.

This patch moves into Kconfig.cputype the name of the CPU to me
passed to the -mcpu= argument.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Rename the option to TARGET_CPU to echo the gcc documentation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:36 +10:00
Rodrigo R. Galvao
badf436f6f powerpc/Makefiles: Convert ifeq to ifdef where possible
In Makefiles if we're testing a CONFIG_FOO symbol for equality with 'y'
we can instead just use ifdef. The latter reads easily, so convert to
it where possible.

Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:36 +10:00
zhong jiang
81d7b08b3c powerpc/powermac: of_node_put() is not needed after iterator
for_each_node_by_name() iterators only exit normally when the loop
cursor is NULL, So there is no need to call of_node_put().

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:34 +10:00
Haren Myneni
656ecc16e8 crypto/nx: Initialize 842 high and normal RxFIFO control registers
NX increments readOffset by FIFO size in receive FIFO control register
when CRB is read. But the index in RxFIFO has to match with the
corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
may be processing incorrect CRBs and can cause CRB timeout.

VAS FIFO offset is 0 when the receive window is opened during
initialization. When the module is reloaded or in kexec boot, readOffset
in FIFO control register may not match with VAS entry. This patch adds
nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
control register for both high and normal FIFOs.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
[mpe: Fixup uninitialized variable warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:34 +10:00
Haren Myneni
6e708000ec powerpc/powernv: Export opal_check_token symbol
Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:33 +10:00
Randy Dunlap
f5daf77a55 powerpc/platforms/85xx: fix t1042rdb_diu.c build errors & warning
Fix build errors and warnings in t1042rdb_diu.c by adding header files
and MODULE_LICENSE().

../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: data definition has no type or storage class
 early_initcall(t1042rdb_diu_init);
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: error: type defaults to 'int' in declaration of 'early_initcall' [-Werror=implicit-int]
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: parameter names (without types) in function declaration

and
WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/85xx/t1042rdb_diu.o

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:33 +10:00
Darren Stevens
e13606d732 powerpc/pasemi: Use pr_err/pr_warn... for kernel messages
Pasemi code still uses printk(KERN_ERR/KERN_WARN ... change these to
pr_err(, pr_warn(... to match other powerpc arch code.

No functional changes.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
[mpe: Unsplit some strings while we're at it]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:31 +10:00
Michael Ellerman
99d54754d3 powerpc/powernv: Query firmware for count cache flush settings
Look for fw-features properties to determine the appropriate settings
for the count cache flush, and then call the generic powerpc code to
set it up based on the security feature flags.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:27 +10:00
Michael Ellerman
ba72dc1719 powerpc/pseries: Query hypervisor for count cache flush settings
Use the existing hypercall to determine the appropriate settings for
the count cache flush, and then call the generic powerpc code to set
it up based on the security feature flags.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:26 +10:00
Michael Ellerman
af375eefbf powerpc/64: Call setup_barrier_nospec() from setup_arch()
Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:23 +10:00
Darren Stevens
250a93501d powerpc/pasemi: Search for PCI root bus by compatible property
Pasemi arch code finds the root of the PCI-e bus by searching the
device-tree for a node called 'pxp'. But the root bus has a compatible
property of 'pasemi,rootbus' so search for that instead.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:31 +10:00
Mahesh Salgaonkar
94675cceac powerpc/pseries: Defer the logging of rtas error to irq work queue.
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.

On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.

Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:29 +10:00
Mahesh Salgaonkar
74e96bf44f powerpc/pseries: Avoid using the size greater than RTAS_ERROR_LOG_MAX.
The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.

Fixes: d368514c3097 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:28 +10:00
Benjamin Herrenschmidt
e27e0a9465 powerpc/xive: Remove xive_kexec_teardown_cpu()
It's identical to xive_teardown_cpu() so just use the latter

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:28 +10:00
Reza Arbab
9eab9901b0 powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage
We've encountered a performance issue when multiple processors stress
{get,put}_mmio_atsd_reg(). These functions contend for
mmio_atsd_usage, an unsigned long used as a bitmask.

The accesses to mmio_atsd_usage are done using test_and_set_bit_lock()
and clear_bit_unlock(). As implemented, both of these will require
a (successful) stwcx to that same cache line.

What we end up with is thread A, attempting to unlock, being slowed by
other threads repeatedly attempting to lock. A's stwcx instructions
fail and retry because the memory reservation is lost every time a
different thread beats it to the punch.

There may be a long-term way to fix this at a larger scale, but for
now resolve the immediate problem by gating our call to
test_and_set_bit_lock() with one to test_bit(), which is obviously
implemented without using a store.

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 20:09:01 +10:00
Guenter Roeck
6e0495c2e8 powerpc/4xx: Fix error return path in ppc4xx_msi_probe()
An arbitrary error in ppc4xx_msi_probe() quite likely results in a
crash similar to the following, seen after dma_alloc_coherent()
returned an error.

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc001bff0
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE Canyonlands
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper Tainted: G        W
  4.18.0-rc6-00010-gff33d1030a6c #1
  NIP:  c001bff0 LR: c001c418 CTR: c01faa7c
  REGS: cf82db40 TRAP: 0300   Tainted: G        W
  (4.18.0-rc6-00010-gff33d1030a6c)
  MSR:  00029000 <CE,EE,ME>  CR: 28002024  XER: 00000000
  DEAR: 00000000 ESR: 00000000
  GPR00: c001c418 cf82dbf0 cf828000 cf8de400 00000000 00000000 000000c4 000000c4
  GPR08: c0481ea4 00000000 00000000 000000c4 22002024 00000000 c00025e8 00000000
  GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a
  GPR24: 00029000 0000000c 00000000 cf8de410 c0494d60 c0494d60 cf8bebc0 00000001
  NIP [c001bff0] ppc4xx_of_msi_remove+0x48/0xa0
  LR [c001c418] ppc4xx_msi_probe+0x294/0x3b8
  Call Trace:
  [cf82dbf0] [00029000] 0x29000 (unreliable)
  [cf82dc10] [c001c418] ppc4xx_msi_probe+0x294/0x3b8
  [cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
  [cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
  [cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
  [cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
  [cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
  [cf82dd40] [c02050c8] device_add+0x404/0x5c4
  [cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
  [cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
  [cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
  [cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
  [cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
  [cf82dea0] [c0002404] do_one_initcall+0x40/0x188
  [cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
  [cf82df30] [c0002600] kernel_init+0x18/0x104
  [cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
  Instruction dump:
  90010024 813d0024 2f890000 83c30058 41bd0014 48000038 813d0024 7f89f800
  409d002c 813e000c 57ea103a 3bff0001 <7c69502e> 2f830000 419effe0 4803b26d
  ---[ end trace 8cf551077ecfc42a ]---

Fix it up. Specifically,

- Return valid error codes from ppc4xx_setup_pcieh_hw(), have it clean
  up after itself, and only access hardware after all possible error
  conditions have been handled.
- Use devm_kzalloc() instead of kzalloc() in ppc4xx_msi_probe()

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 20:08:20 +10:00
Nicholas Piggin
3127692deb powernv/cpuidle: Fix idle states all being marked invalid
Commit 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into
global structure") parses dt idle states into structs, but never marks
them valid. This results in all idle states being lost.

Fixes: 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global structure")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 18:30:33 +10:00
Sam Bobroff
b87b9cf493 powerpc/pseries: fix EEH recovery of some IOV devices
EEH recovery currently fails on pSeries for some IOV capable PCI
devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
certain device tree properties for the device. (Found on an IOV
capable device using the ipr driver.)

Recovery fails in pci_enable_resources() at the check on r->parent,
because r->flags is set and r->parent is not.  This state is due to
sriov_init() setting the start, end and flags members of the IOV BARs
but the parent not being set later in
pseries_pci_fixup_iov_resources(), because the
"ibm,open-sriov-vf-bar-info" property is missing.

Correct this by zeroing the resource flags for IOV BARs when they
can't be configured (this is the same method used by sriov_init() and
__pci_read_base()).

VFs cleared this way can't be enabled later, because that requires
another device tree property, "ibm,number-of-configurable-vfs" as well
as support for the RTAS function "ibm_map_pes". These are all part of
hypervisor support for IOV and it seems unlikely that a hypervisor
would ever partially, but not fully, support it. (None are currently
provided by QEMU/KVM.)

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:46 +10:00
Shilpasri G Bhat
04baaf28f4 powerpc/powernv: Add support to enable sensor groups
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:45 +10:00
Akshay Adiga
9c7b185ab2 powernv/cpuidle: Parse dt idle properties into global structure
Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Christophe Leroy
45ef5992e0 powerpc: remove unnecessary inclusion of asm/tlbflush.h
asm/tlbflush.h is only needed for:
- using functions xxx_flush_tlb_xxx()
- using MMU_NO_CONTEXT
- including asm-generic/pgtable.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:20 +10:00
Christophe Leroy
2c86cd188f powerpc: clean inclusions of asm/feature-fixups.h
files not using feature fixup don't need asm/feature-fixups.h
files using feature fixup need asm/feature-fixups.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy
5c35a02c54 powerpc: clean the inclusion of stringify.h
Only include linux/stringify.h is files using __stringify()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy
ec0c464cdb powerpc: move ASM_CONST and stringify_in_c() into asm-const.h
This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:16 +10:00
Nicholas Piggin
17cc1dd492 powerpc/powernv: implement opal_put_chars_atomic
The RAW console does not need writes to be atomic, so relax
opal_put_chars to be able to do partial writes, and implement an
_atomic variant which does not take a spinlock. This API is used
in xmon, so the less locking that is used, the better chance there
is that a crash can be debugged.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:57 +10:00
Nicholas Piggin
ac4ac788fd powerpc/powernv: move opal console flushing to udbg
OPAL console writes do not have to synchronously flush firmware /
hardware buffers unless they are going through the udbg path.

Remove the unconditional flushing from opal_put_chars. Flush if
there was no space in the buffer as an optimisation (callers loop
waiting for success in that case). udbg flushing is moved to
udbg_opal_putc.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:57 +10:00
Nicholas Piggin
b74d2807ae powerpc/powernv: Remove OPALv1 support from opal console driver
opal_put_chars deals with partial writes because in OPALv1,
opal_console_write_buffer_space did not work correctly. That firmware
is not supported.

This reworks the opal_put_chars code to no longer deal with partial
writes by turning them into full writes. Partial write handling is still
supported in terms of what gets returned to the caller, but it may not
go to the console atomically. A warning message is printed in this
case.

This allows console flushing to be moved out of the opal_write_lock
spinlock. That could cause the lock to be held for long periods if the
console is busy (especially if it was being spammed by firmware),
which is dangerous because the lock is taken by xmon to debug the
system. Flushing outside the lock improves the situation a bit.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin
d2a2262e68 powerpc/powernv: Implement and use opal_flush_console
A new console flushing firmware API was introduced to replace event
polling loops, and implemented in opal-kmsg with affddff69c55e
("powerpc/powernv: Add a kmsg_dumper that flushes console output on
panic"), to flush the console in the panic path.

The OPAL console driver has other situations where interrupts are off
and it needs to flush the console synchronously. These still use a
polling loop.

So move the opal-kmsg flush code to opal_flush_console, and use the
new function in opal-kmsg and opal_put_chars.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin
e00da0f2db powerpc/powernv: opal-kmsg use flush fallback from console code
Use the more refined and tested event polling loop from opal_put_chars
as the fallback console flush in the opal-kmsg path. This loop is used
by the console driver today, whereas the opal-kmsg fallback is not
likely to have been used for years.

Use WARN_ONCE rather than a printk when the fallback is invoked to
prepare for moving the console flush into a common function.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin
3a80bfc7ea powerpc/powernv: opal-kmsg standardise OPAL_BUSY handling
OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY,
so implement the standard OPAL_BUSY handling for it.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:55 +10:00
Nicholas Piggin
36d2dabc87 powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops
The OPAL console driver does not delay in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware.

It can't yet be made to sleep because it is called under spinlock,
but it can be changed to the standard OPAL_BUSY loop form, and a
delay added to keep it from hitting the firmware too frequently.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:55 +10:00
Nicholas Piggin
bd90284cc6 powerpc/powernv: opal_put_chars partial write fix
The intention here is to consume and discard the remaining buffer
upon error. This works if there has not been a previous partial write.
If there has been, then total_len is no longer total number of bytes
to copy. total_len is always "bytes left to copy", so it should be
added to written bytes.

This code may not be exercised any more if partial writes will not be
hit, but this is a small bugfix before a larger change.

Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:54 +10:00
Mukesh Ojha
b29336c0e1 powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler
Fixes: 8034f715f ("powernv/opal-dump: Convert to irq domain")

Converts all the return explicit number to a more proper IRQ_HANDLED,
which looks proper incase of interrupt handler returning case.

Here, It also removes error message like "nobody cared" which was
getting unveiled while returning -1 or 0 from handler.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:24 +10:00
Mukesh Ojha
a5bbe8fd29 powerpc/powernv/opal-dump : Handles opal_dump_info properly
Moves the return value check of 'opal_dump_info' to a proper place which
was previously unnecessarily filling all the dump info even on failure.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:23 +10:00
Aneesh Kumar K.V
ca42d8d2d6 powerpc/pseries/mm: Improve error reporting on HCALL failures
This patch adds error reporting to H_ENTER and H_READ hcalls. A
failure for both these hcalls are mostly fatal and it would be good to
log the failure reason.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:19 +10:00
Aneesh Kumar K.V
65471d763e powerpc/pseries: Use pr_xxx() in lpar.c
Switch from printk to pr_fmt() / pr_xxx().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:19 +10:00
Alistair Popple
99c3ce33a0 powerpc/powernv/npu: Add a debugfs setting to change ATSD threshold
The threshold at which it becomes more efficient to coalesce a range
of ATSDs into a single per-PID ATSD is currently not well understood
due to a lack of real-world work loads. This patch adds a debugfs
parameter allowing the threshold to be altered at runtime in order to
aid future development and refinement of the value.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-19 21:58:10 +10:00
Randy Dunlap
a8bf9e504a chrp/nvram.c: add MODULE_LICENSE()
Add MODULE_LICENSE() to the chrp nvram.c driver to fix the build
warning message:

WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/chrp/nvram.o

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-19 14:38:46 +10:00
Michael Ellerman
ce57c6610c Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the KVM tree.

I manually propagated the change from commit d3d4ffaae439
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into
pci-ioda-tce.c.

Conflicts:
        arch/powerpc/include/asm/cputable.h
        arch/powerpc/platforms/powernv/pci-ioda.c
        arch/powerpc/platforms/powernv/pci.h
2018-07-19 14:37:57 +10:00
Alexey Kardashevskiy
a68bd1267b powerpc/powernv/ioda: Allocate indirect TCE levels on demand
At the moment we allocate the entire TCE table, twice (hardware part and
userspace translation cache). This normally works as we normally have
contigous memory and the guest will map entire RAM for 64bit DMA.

However if we have sparse RAM (one example is a memory device), then
we will allocate TCEs which will never be used as the guest only maps
actual memory for DMA. If it is a single level TCE table, there is nothing
we can really do but if it a multilevel table, we can skip allocating
TCEs we know we won't need.

This adds ability to allocate only first level, saving memory.

This changes iommu_table::free() to avoid allocating of an extra level;
iommu_table::set() will do this when needed.

This adds @alloc parameter to iommu_table::exchange() to tell the callback
if it can allocate an extra level; the flag is set to "false" for
the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
H_TOO_HARD.

This still requires the entire table to be counted in mm::locked_vm.

To be conservative, this only does on-demand allocation when
the usespace cache table is requested which is the case of VFIO.

The example math for a system replicating a powernv setup with NVLink2
in a guest:
16GB RAM mapped at 0x0
128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000

the table to cover that all with 64K pages takes:
(((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB

If we allocate only necessary TCE levels, we will only need:
(((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
levels).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:11 +10:00
Alexey Kardashevskiy
9bc98c8a43 powerpc/powernv: Rework TCE level allocation
This moves actual pages allocation to a separate function which is going
to be reused later in on-demand TCE allocation.

While we are at it, remove unnecessary level size round up as the caller
does this already.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy
090bad39b2 powerpc/powernv: Add indirect levels to it_userspace
We want to support sparse memory and therefore huge chunks of DMA windows
do not need to be mapped. If a DMA window big enough to require 2 or more
indirect levels, and a DMA window is used to map all RAM (which is
a default case for 64bit window), we can actually save some memory by
not allocation TCE for regions which we are not going to map anyway.

The hardware tables alreary support indirect levels but we also keep
host-physical-to-userspace translation array which is allocated by
vmalloc() and is a flat array which might use quite some memory.

This converts it_userspace from vmalloc'ed array to a multi level table.

As the format becomes platform dependend, this replaces the direct access
to it_usespace with a iommu_table_ops::useraddrptr hook which returns
a pointer to the userspace copy of a TCE; future extension will return
NULL if the level was not allocated.

This should not change non-KVM handling of TCE tables and it_userspace
will not be allocated for non-KVM tables.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy
191c22879f powerpc/powernv: Move TCE manupulation code to its own file
Right now we have allocation code in pci-ioda.c and traversing code in
pci.c, let's keep them toghether. However both files are big enough
already so let's move this business to a new file.

While we at it, move the code which links IOMMU table groups to
IOMMU tables as it is not specific to any PNV PHB model.

These puts exported symbols from the new file together.

This fixes several warnings from checkpatch.pl like this:
"WARNING: Prefer 'unsigned int' to bare use of 'unsigned'".

As this is almost cut-n-paste, there should be no behavioral change.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:07 +10:00
Alexey Kardashevskiy
da2bb0da73 powerpc/powernv: Remove useless wrapper
This gets rid of a useless wrapper around
pnv_pci_ioda2_table_free_pages().

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:47:02 +10:00