3431 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
b1923caa6e powerpc: Merge 32-bit and 64-bit setup_arch()
There is little enough differences now.

mpe: Add a/p/k/setup.h to contain the prototypes and empty versions of
functions we need, rather than using weak functions. Add a few other
empty versions to avoid as many #ifdefs as possible in the code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:17:46 +10:00
Benjamin Herrenschmidt
f2d576948d powerpc: Get rid of ppc_md.init_early()
It is now called right after platform probe, so the probe function
can just do the job.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:07:26 +10:00
Benjamin Herrenschmidt
7025776ed1 powerpc/mm: Move hash table ops to a separate structure
Moving probe_machine() to after mmu init will cause the ppc_md
fields relative to the hash table management to be overwritten.

Since we have essentially disconnected the machine type from
the hash backend ops, finish the job by moving them to a different
structure.

The only callback that didn't quite fix is update_partition_table
since this is not specific to hash, so I moved it to a standalone
variable for now. We can revisit later if needed.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fix ppc64e build failure in kexec]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:59:09 +10:00
Benjamin Herrenschmidt
166dd7d3fb powerpc/64: Move MMU backend selection out of platform code
We move it into early_mmu_init() based on firmware features. For PS3,
we have to move the setting of these into early_init_devtree().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:38 +10:00
Benjamin Herrenschmidt
91b6fad5cf powerpc/pmac: Remove early allocation of the SMU command buffer
The SMU command buffer needs to be allocated below 2G using memblock.

In the past, this had to be done very early from the arch code as
memblock wasn't available past that point. That is no longer the
case though, smu_init() is called from setup_arch() when memblock
is still functional these days. So move the allocation to the
SMU driver itself.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:38 +10:00
Benjamin Herrenschmidt
d3cbff1b5a powerpc: Put exception configuration in a common place
The various calls to establish exception endianness and AIL are
now done from a single point using already established CPU and FW
feature bits to decide what to do.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:31 +10:00
Benjamin Herrenschmidt
3808a88985 powerpc: Move FW feature probing out of pseries probe()
We move the function itself to pseries/firmware.c and call it along
with almost all other flat device-tree parsers from early_init_devtree()

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Move #ifdefs into the header by providing pseries_probe_fw_features()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:13 +10:00
Benjamin Herrenschmidt
c40785ad30 powerpc/dart: Use a cachable DART
Instead of punching a hole in the linear mapping, just use normal
cachable memory, and apply the flush sequence documented in the
CPC625 (aka U3) user manual.

This allows us to remove quite a bit of code related to the early
allocation of the DART and the hole in the linear mapping. We can
also get rid of the copy of the DART for suspend/resume as the
original memory can just be saved/restored now, as long as we
properly sync the caches.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Integrate dart_init() fix to return ENODEV when DART disabled]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:55:54 +10:00
Benjamin Herrenschmidt
9402c68461 powerpc: Factor do_feature_fixup calls
32 and 64-bit do a similar set of calls early on, we move it all to
a single common function to make the boot code more readable.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:51:42 +10:00
Kevin Hao
27d1149667 powerpc/32: Remove RELOCATABLE_PPC32
It is seldom used in the kernel code and can be easily replaced by
either RELOCATABLE or PPC32. So there is no reason to keep a separate
kernel option for this.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:17:07 +10:00
Aneesh Kumar K.V
a4b349540a powerpc/mm: Cleanup LPCR defines
This makes it easy to verify we are not overloading the bits.
No functionality change by this patch.

mpe: Cleanup more. Completely fixup whitespace, convert all UL values to
ASM_CONST(), and replace all occurrences of 63-x with the actual shift.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:12:28 +10:00
Aneesh Kumar K.V
912cc87a65 powerpc/mm/radix: Add LPID based tlb flush helpers
We add a tlb flush variant, to flush LPID mappings.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:55 +10:00
Aneesh Kumar K.V
83209bc861 powerpc/mm/radix: Update machine call back to support new HCALL.
This update the machine dep callback such that we can use the same
callback to register process table. The interface is updated such that
we can easily call H_REGISTER_PROC_TBL hcall. The HCALL itself is
introduced in a later patch.

No functionality change introduced by this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:54 +10:00
Aneesh Kumar K.V
09cf5bcb0c powerpc/mm/radix: Update PID switch sequence
Update the PID switch as per ISA doc. slbia is needed in radix to
invalidate any implementation specific lookaside information.
We use the .long format due to build errors with the below compiler
version.

gcc (Ubuntu 5.3.1-14ubuntu2.1) 5.3.1 20160413
GNU assembler (GNU Binutils for Ubuntu) 2.26

CC      arch/powerpc/mm//mmu_context_book3s64.o
{standard input}: Assembler messages:
{standard input}:506: Error: junk at end of line: `0x7'
scripts/Makefile.build:291: recipe for target 'arch/powerpc/mm//mmu_context_book3s64.o' failed
make[1]: *** [arch/powerpc/mm//mmu_context_book3s64.o] Error 1
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:53 +10:00
Aneesh Kumar K.V
accfad7d0a powerpc/mm: Clear top 16 bits of va only on older cpus
As per ISA, we need to do this only for architecture version 2.02 and
earlier. This continued to work even for 2.07. But let's not do this for
anything after 2.02. ISA 3.0 requires these top bits to be not cleared.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:52 +10:00
Aneesh Kumar K.V
e21fc93b70 powerpc/mm: Compile out radix related functions if RADIX_MMU is disabled
Currently we depend on mmu_has_feature to evalute to zero based on
MMU_FTRS_POSSIBLE mask. In a later patch, we want to update
radix_enabled() to runtime update the conditional operation to a jump
instruction. This implies we cannot depend on MMU_FTRS_POSSIBLE mask.
Instead define radix_enabled to return 0 if RADIX_MMU is not enabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:51 +10:00
Aneesh Kumar K.V
66c570f545 powerpc/mm: use _raw variant of page table accessors
This switch few of the page table accessor to use the __raw variant
and does the cpu to big endian conversion of constants. This helps in
generating better code.

For ex: a pgd_none(pgd) check with and without fix is listed below

Without fix:
------------
   2240:	20 00 61 eb 	ld      r27,32(r1)
/* PGD level */
typedef struct { __be64 pgd; } pgd_t;
static inline unsigned long pgd_val(pgd_t x)
{
	return be64_to_cpu(x.pgd);

    2244:	22 00 66 78 	rldicl  r6,r3,32,32
    2248:	3e 40 7d 54 	rotlwi  r29,r3,8
    224c:	0e c0 7d 50 	rlwimi  r29,r3,24,0,7
    2250:	3e 40 c5 54 	rotlwi  r5,r6,8
    2254:	2e c4 7d 50 	rlwimi  r29,r3,24,16,23
    2258:	0e c0 c5 50 	rlwimi  r5,r6,24,0,7
    225c:	2e c4 c5 50 	rlwimi  r5,r6,24,16,23
    2260:	c6 07 bd 7b 	rldicr  r29,r29,32,31
    2264:	78 2b bd 7f 	or      r29,r29,r5
		if (pgd_none(pgd))
    2268:	00 00 bd 2f 	cmpdi   cr7,r29,0
    226c:	54 03 9e 41 	beq     cr7,25c0 <__get_user_pages_fast+0x500>

With fix:
---------
    2370:	20 00 61 eb 	ld      r27,32(r1)
		if (pgd_none(pgd))
    2374:	00 00 bd 2f 	cmpdi   cr7,r29,0
    2378:	a8 03 9e 41 	beq     cr7,2720 <__get_user_pages_fast+0x530>
			break;
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:51 +10:00
Aneesh Kumar K.V
bf16cdf48a powerpc/mm/radix: Update LPCR HR bit as per ISA
PowerISA 3.0 requires the MMU mode (radix vs. hash) of the hypervisor
to be mirrored in the LPCR register, in addition to the partition table.
This is done to avoid fetching from the table when deciding, among other
things, how to perform transitions to HV mode on some interrupts.
So let's set it up appropriately

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:50 +10:00
Balbir Singh
8cd6d3c23e powerpc/mm: Fix .long's in tlb-radix.c to more meaningful
The .longs with the shifts are harder to read, use more meaningful names
for the opcodes. PPC_TLBIE_5 is introduced for the 5 opcode variation of
the instruction due to an existing op-code for the 2 opcode variant.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:50 +10:00
Benjamin Herrenschmidt
08acce1cab powerpc/powernv/pci: Remove SWINV constants and obsolete TCE code
We have some obsolete code in pnv_pci_p7ioc_tce_invalidate()
to handle some internal lab tools that have stopped being
useful a long time ago. Remove that along with the definition
and test for the TCE_PCI_SWINV_* flags whose value is basically
always the same.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:47 +10:00
Benjamin Herrenschmidt
69c592ed40 powerpc/opal: Add real mode call wrappers
Replace the old generic opal_call_realmode() with proper per-call
wrappers similar to the normal ones and convert callers.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:46 +10:00
Benjamin Herrenschmidt
d74361881f powerpc/xics: Add ICP OPAL backend
This adds a new XICS backend that uses OPAL calls, which can be
used when we don't have native support for the platform interrupt
controller.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:45 +10:00
Benjamin Herrenschmidt
1d607bb3bd powerpc/irq: Add mechanism to force a replay of interrupts
Calling this function with interrupts soft-disabled will cause
a replay of the external interrupt vector when they are re-enabled.

This will be used by the OPAL XICS backend (and latter by the native
XIVE code) to handle EOI signaling that there are more interrupts to
fetch from the hardware since the hardware won't issue another HW
interrupt in that case.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:44 +10:00
Benjamin Herrenschmidt
9baaef0a22 powerpc/irq: Add support for HV virtualization interrupts
This will be delivering external interrupts from the XIVE to the
Hypervisor. We treat it as a normal external interrupt for the
lazy irq disable code (so it will be replayed as a 0x500) and
route it to do_IRQ.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:44 +10:00
Benjamin Herrenschmidt
9fedd3f880 powerpc/powernv: Add XICS emulation APIs
OPAL provides an emulated XICS interrupt controller to
use as a fallback on newer processors that don't have a
XICS. It's meant as a way to provide backward compatibility
with future processors. Add the corresponding interfaces.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:43 +10:00
Shreyas B. Prabhu
bcef83a00d powerpc/powernv: Add platform support for stop instruction
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added. This instruction replaces
	instructions like nap, sleep, rvwinkle.
 b) new per thread SPR named Processor Stop Status and Control Register
	(PSSCR) is added which controls the behavior of stop instruction.

PSSCR layout:
----------------------------------------------------------
| PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
----------------------------------------------------------
0      4     41   42    43   44     48    54   56    60

PSSCR key fields:
	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
	power-saving state the thread entered since stop instruction was last
	executed.

	Bit 42 - Enable State Loss
	0 - No state is lost irrespective of other fields
	1 - Allows state loss

	Bits 44:47 - Power-Saving Level Limit
	This limits the power-saving level that can be entered into.

	Bits 60:63 - Requested Level
	Used to specify which power-saving level must be entered on executing
	stop instruction

This patch adds support for stop instruction and PSSCR handling.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:41 +10:00
Michael Ellerman
b5f1bf48f2 powerpc fixes for 4.7 #5
- tm: Always reclaim in start_thread() for exec() class syscalls from Cyril Bur
  - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael Neuling
  - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
  - Initialise pci_io_base as early as possible from Darren Stevens
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXeEmsAAoJEFHr6jzI4aWAwMMQAKs/u9rwB3gpOkNJSHajN1Dd
 kdqDufzLxLDwbWnMfqM1+bcO2EOjPhKbtpbzhG6oeiET8undRRoLsjHS5rZeYK5h
 cviRPEJ/Yz8ZWaIgFGI8+02gXwU0MJhuTY8NPexXmsh4FRdKYwEuCIJShl30lg22
 P7UrJ2SCNM+H/uZyS07B7thiwBeAKSp6VkLTpuW/QDz2j1ra/F22dTh7c0Agdahd
 INAMAnh9nYeuMVYn4XjOOlQ07JnBTuf1/W5Wxlw4i/86rVq+Hy8zh5r1X52oysR5
 lZl63B9q3agKG9cc9lSN2ibTDVerlFMwB2QysX2a6Uy7+y2SB3hS7VS1RTXCh3hg
 /omApGGVW3Hh+E2CuKfFLQySU55NRpLAoTGravGr/KsH4wZP/n/fkrctldCrqm7P
 sTPT52+t+iJQk4fiskRY3yQ7DTTnt3rTC8MJRGqvLuCheolLll4NQaWOF75AJP+7
 WFWtC4QHOTPERMkhqLnZDG2vNuDg1H8chuZ2+PxtIs6G1vuOEun+MTZAYh4u6XWE
 bAIT9rV3xBdE17bzYOQz7lU1y7yNVtP7xkm0HIOAHlU4gUrjQp5u8F3TnPW3/M0m
 8GeaZdrPjhsaNg31YZODAeM8Ddf+N9d2a2VPIr/fzytURhMe0ss3Z/MdMoYRATab
 Lh1o+G3gDo9MVaphoJ3w
 =oEAY
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-5' into next

Pull in the fixes we sent during 4.7, we have code we want to merge into
next that depends on some of them.
2016-07-15 14:57:47 +10:00
Daniel Axtens
95ec77c06e powerpc: Make ppc_md.{halt, restart} __noreturn
powernv marks it's halt and restart calls as __noreturn. However,
ppc_md does not have this annotation. Add the annotation to ppc_md,
and then to every halt/restart function that is missing it.

Additionally, I have verified that all of these functions do not
return. Occasionally I have added a spin loop to be sure.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 21:12:06 +10:00
Andrew Donnellan
89379f165a PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
The cxl driver will use infrastructure from pnv_php to handle device tree
updates when switching bi-modal CAPI cards into CAPI mode.

To enable this, export pnv_php_find_slot() and
pnv_php_set_slot_power_state(), and add corresponding declarations, as well
as the definition of struct pnv_php_slot, to asm/pnv-pci.h.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:28:08 +10:00
Ian Munsie
4361b03430 powerpc/powernv: Add support for the cxl kernel api on the real phb
This adds support for the peer model of the cxl kernel api to the
PowerNV PHB, in which physical function 0 represents the cxl function on
the card (an XSL in the case of the CX4), which other physical functions
will use for memory access and interrupt services. It is referred to as
the peer model as these functions are peers of one another, as opposed
to the Virtual PHB model which forms a hierarchy.

This patch exports APIs to enable the peer mode, check if a PCI device
is attached to a PHB in this mode, and to set and get the peer AFU for
this mode.

The cxl driver will enable this mode for supported cards by calling
pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
that this mode is enabled, and switch out it's controller_ops for the
cxl version.

The cxl version of the controller_ops struct implements it's own
versions of the enable_device_hook and release_device to handle
refcounting on the peer AFU and to allocate a default context for the
device.

Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
there is no safe way to disable cxl mode short of a reboot, so until
that changes there is no reason to support the disable path.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:54 +10:00
Michael Ellerman
e0ddf7a245 powerpc/xmon: Dump ISA 2.07 SPRs
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:24 +10:00
Christophe Leroy
f86ef74ed9 powerpc/8xx: Fix vaddr for IMMR early remap
Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
648K rodata, 508K init, 290K bss, 6644K reserved)
Kernel virtual memory layout:
  * 0xfffdf000..0xfffff000  : fixmap
  * 0xfde00000..0xfe000000  : consistent mem
  * 0xfddf6000..0xfde00000  : early ioremap
  * 0xc9000000..0xfddf6000  : vmalloc & ioremap
SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1

Today, IMMR is mapped 1:1 at startup

Mapping IMMR 1:1 is just wrong because it may overlap with another
area. On most mpc8xx boards it is OK as IMMR is set to 0xff000000
but for instance on EP88xC board, IMMR is at 0xfa200000 which
overlaps with VM ioremap area

This patch fixes the virtual address for remapping IMMR with the fixmap
regardless of the value of IMMR.

The size of IMMR area is 256kbytes (CPM at offset 0, security engine
at offset 128k) so a 512k page is enough

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
c223c90386 powerpc32: provide VIRT_CPU_ACCOUNTING
This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
PPC32 doesn't have the PACA structure, so we use the task_info
structure to store the accounting data.

In order to reuse on PPC32 the PPC64 functions, all u64 data has
been replaced by 'unsigned long' so that it is u32 on PPC32 and
u64 on PPC64

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 01:43:50 -05:00
Benjamin Herrenschmidt
a203658b5e powerpc/opal: Wake up kopald polling thread before waiting for events
On some environments (prototype machines, some simulators, etc...)
there is no functional interrupt source to signal completion, so
we rely on the fairly slow OPAL heartbeat.

In a number of cases, the calls complete very quickly or even
immediately. We've observed that it helps a lot to wakeup the OPAL
heartbeat thread before waiting for event in those cases, it will
call OPAL immediately to collect completions for anything that
finished fast enough.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 19:53:26 +10:00
Benjamin Herrenschmidt
e93d8e6773 powerpc/mm: Fix build of Book3E/64 with 64K pages
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07 16:33:26 +10:00
Anton Blanchard
6dd7a82cc5 crypto: powerpc - Add POWER8 optimised crc32c
Use the vector polynomial multiply-sum instructions in POWER8 to
speed up crc32c.

This is just over 41x faster than the slice-by-8 method that it
replaces. Measurements on a 4.1 GHz POWER8 show it sustaining
52 GiB/sec.

A simple btrfs write performance test:

    dd if=/dev/zero of=/mnt/tmpfile bs=1M count=4096
    sync

is over 3.7x faster.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-05 23:05:19 +08:00
Anton Blanchard
151f25112f powerpc: define FUNC_START/FUNC_END
gcc provides FUNC_START/FUNC_END macros to help with creating
assembly functions. Mirror these in the kernel so we can more easily
share code between userspace and the kernel. FUNC_END is just a
stub since we don't currently annotate the end of kernel functions.

It might make sense to do a wholesale search and replace, but for
now just create a couple of defines.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-05 23:05:16 +08:00
Oliver O'Halloran
7990102446 powerpc/timer: Large Decrementer support
Power ISAv3 adds a large decrementer (LD) mode which increases the size
of the decrementer register. The size of the enlarged decrementer
register is between 32 and 64 bits with the exact size being dependent
on the implementation. When in LD mode, reads are sign extended to 64
bits and a decrementer exception is raised when the high bit is set (i.e
the value goes below zero). Writes however are truncated to the physical
register width so some care needs to be taken to ensure that the high
bit is not set when reloading the decrementer. This patch adds support
for using the LD inside the host kernel on processors that support it.

When LD mode is supported firmware will supply the ibm,dec-bits property
for CPU nodes to allow the kernel to determine the maximum decrementer
value. Enabling LD mode is a hypervisor privileged operation so the kernel
can only enable it manually when running in hypervisor mode. Guests that
support LD mode can request it using the "ibm,client-architecture-support"
firmware call (not implemented in this patch) or some other platform
specific method. If this property is not supplied then the traditional
decrementer width of 32 bit is assumed and LD mode will not be enabled.

This patch was based on initial work by Jack Miller.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:58:53 +10:00
Chris Smart
ae26b36f80 powerpc: Send SIGBUS on unaligned copy and paste
Calling ISA 3.0 instructions copy, copy_first, paste and paste_last
generates an alignment fault when copying or pasting unaligned
data (128 byte). We catch this and send SIGBUS to the userspace
process that caused it.

We do not emulate these because paste may contain additional metadata
when pasting to a co-processor and paste_last is the synchronisation
point for preceding copy/paste sequences.

Thanks to Michael Neuling <mikey@neuling.org> for his help.

Signed-off-by: Chris Smart <chris@distroguy.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:51 +10:00
Linus Torvalds
70bd68d7f7 powerpc fixes for 4.7 #5
- tm: Always reclaim in start_thread() for exec() class syscalls from Cyril Bur
  - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael Neuling
  - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
  - Initialise pci_io_base as early as possible from Darren Stevens
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXeEmsAAoJEFHr6jzI4aWAwMMQAKs/u9rwB3gpOkNJSHajN1Dd
 kdqDufzLxLDwbWnMfqM1+bcO2EOjPhKbtpbzhG6oeiET8undRRoLsjHS5rZeYK5h
 cviRPEJ/Yz8ZWaIgFGI8+02gXwU0MJhuTY8NPexXmsh4FRdKYwEuCIJShl30lg22
 P7UrJ2SCNM+H/uZyS07B7thiwBeAKSp6VkLTpuW/QDz2j1ra/F22dTh7c0Agdahd
 INAMAnh9nYeuMVYn4XjOOlQ07JnBTuf1/W5Wxlw4i/86rVq+Hy8zh5r1X52oysR5
 lZl63B9q3agKG9cc9lSN2ibTDVerlFMwB2QysX2a6Uy7+y2SB3hS7VS1RTXCh3hg
 /omApGGVW3Hh+E2CuKfFLQySU55NRpLAoTGravGr/KsH4wZP/n/fkrctldCrqm7P
 sTPT52+t+iJQk4fiskRY3yQ7DTTnt3rTC8MJRGqvLuCheolLll4NQaWOF75AJP+7
 WFWtC4QHOTPERMkhqLnZDG2vNuDg1H8chuZ2+PxtIs6G1vuOEun+MTZAYh4u6XWE
 bAIT9rV3xBdE17bzYOQz7lU1y7yNVtP7xkm0HIOAHlU4gUrjQp5u8F3TnPW3/M0m
 8GeaZdrPjhsaNg31YZODAeM8Ddf+N9d2a2VPIr/fzytURhMe0ss3Z/MdMoYRATab
 Lh1o+G3gDo9MVaphoJ3w
 =oEAY
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - tm: Always reclaim in start_thread() for exec() class syscalls from
   Cyril Bur

 - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael
   Neuling

 - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan

 - Initialise pci_io_base as early as possible from Darren Stevens

* tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Initialise pci_io_base as early as possible
  powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
  powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()
  powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
2016-07-02 17:47:54 -07:00
Darren Stevens
bfa37087aa powerpc: Initialise pci_io_base as early as possible
Commit d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for
radix") turned kernel memory and IO addresses from #defined constants to
variables initialised at runtime.

On PA6T (pasemi) systems the setup_arch() machine call initialises the
onboard PCI-e root-ports, and uses pci_io_base to do this, which is now
before its value has been set, resulting in a panic early in boot before
console IO is initialised.

Move the pci_io_base initialisation to the same place as vmalloc ranges
are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the
earliest possible place we can initialise it.

Fixes: d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add #ifdef CONFIG_PCI, massage change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-30 16:52:29 +10:00
Suraj Jitindar Singh
43a1dd9b5f powerpc/powernv: Add driver for operator panel on FSP machines
Implement new character device driver to allow access from user space
to the operator panel display present on IBM Power Systems machines
with FSPs.

This will allow status information to be presented on the display which
is visible to a user.

The driver implements a character buffer which a user can read/write
by accessing the device (/dev/op_panel). This buffer is then displayed on
the operator panel display. Any attempt to write past the last character
position will have no effect and attempts to write more characters than
the size of the display will be truncated. The device may only be accessed
by a single process at a time.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 17:33:46 +10:00
Suraj Jitindar Singh
d0226d315d powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msg
An opal_msg of type OPAL_MSG_ASYNC_COMP contains the return code in the
params[1] struct member. However this isn't intuitive or obvious when
reading the code and requires that a user look at the skiboot
documentation or opal-api.h to verify this.

Add an inline function to get the return code from an opal_msg and update
call sites accordingly.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 17:33:18 +10:00
Linus Torvalds
2f6e97477b powerpc fixes for 4.7 #4
- mm/radix: Update to tlb functions ric argument from Aneesh Kumar K.V
  - mm/radix: Flush page walk cache when freeing page table from Aneesh Kumar K.V
  - mm/hash: Use the correct PPP mask when updating HPTE from Aneesh Kumar K.V
  - mm/hash: Don't add memory coherence if cache inhibited is set from Aneesh Kumar K.V
  - mm/radix: Update Radix tree size as per ISA 3.0 from Aneesh Kumar K.V
  - eeh: Fix invalid cached PE primary bus from Gavin Shan
  - Fix faults caused by radix patching of SLB miss handler from Michael Ellerman
  - bpf/jit: Disable classic BPF JIT on ppc64le from Naveen N. Rao
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXbmITAAoJEFHr6jzI4aWANXwQAIUzjKcLpyQWEKwOKfMqBT5T
 EfsWDqJA/J3mYKNcZyiB7qv1NVPPkU9DSBK0OaAKwYdg5YWKDBl6R3mW+j4di0bP
 SkFACCyE2WbLTCiz5fzd8l974RUh5jKQpIrObp4/8xp40d0vsyAzz4J7d4HVRsrr
 BnoTS/KmytsaDQls5kYArxhW6U+Shag586Au1hNt3SS/be8lCNEXLfa3ltCr7WLJ
 k+xM0KM5kpO9/OK40A64TH7xUZKQIgPMUR5Ct43IJhMeHNnQctLmGQQjRWTrajv1
 K/TfrYwCl66xzKaH5G3MKJgqJAJm1LTwGs+2aOn91x5hPrbmW+bLqr1Mm0ukjROz
 oaANO5fgEQjl0JRGCNAhLHvaoqJX6v5/7GbmFRoaigX4UKJ63nK1ABiwAgKDGnyj
 OchwwJywU5UIX/+9Qpig3CxQNhEV33Nnp8t+dsg8CPd9o/G0mIe0QP1eGdhD09mM
 X9eMfN08hLj5ERKvlpW0rrq1b/wizOGmUXbmt02HZi7iLNsyQMwShiOvwOaAvH6/
 SzEFBJdp11jNoe4GJDt5rH4HlnTnTAYwcLFMTDCCPdJXy7voI/J+MaAmG89S30dQ
 ph0+4v/8K2N0VDZ7kkgi0GL1gp9ULkgtimrN5Z0R8U7qEapEW6ybvv+0Ewln742f
 SCRNVMZgzcwe3CcCKzdn
 =fxml
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "mm/radix (Aneesh Kumar K.V):
   - Update to tlb functions ric argument
   - Flush page walk cache when freeing page table
   - Update Radix tree size as per ISA 3.0

  mm/hash (Aneesh Kumar K.V):
   - Use the correct PPP mask when updating HPTE
   - Don't add memory coherence if cache inhibited is set

  eeh (Gavin Shan):
   - Fix invalid cached PE primary bus

  bpf/jit (Naveen N. Rao):
   - Disable classic BPF JIT on ppc64le

  .. and fix faults caused by radix patching of SLB miss handler
  (Michael Ellerman)"

* tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
  powerpc: Fix faults caused by radix patching of SLB miss handler
  powerpc/eeh: Fix invalid cached PE primary bus
  powerpc/mm/radix: Update Radix tree size as per ISA 3.0
  powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
  powerpc/mm/hash: Use the correct PPP mask when updating HPTE
  powerpc/mm/radix: Flush page walk cache when freeing page table
  powerpc/mm/radix: Update to tlb functions ric argument
2016-06-25 06:01:48 -07:00
Michal Hocko
2379a23e34 powerpc: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pud,pmd}_alloc_one are allocating from {PGT,PUD}_CACHE initialized in
pgtable_cache_init which doesn't have larger than sizeof(void *) << 12
size and that fits into !costly allocation request size.

PGALLOC_GFP is used only in radix__pgd_alloc which uses either order-0
or order-4 requests.  The first one doesn't need the flag while the
second does.  Drop __GFP_REPEAT from PGALLOC_GFP and add it for the
order-4 one.

This means that this flag has never been actually useful here because it
has always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-12-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
32d6bd9059 tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
This is the third version of the patchset previously sent [1].  I have
basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
rid of superfluous gfp flags" which went through dm tree.  I am sending
it now because it is tree wide and chances for conflicts are reduced
considerably when we want to target rc2.  I plan to send the next step
and rename the flag and move to a better semantic later during this
release cycle so we will have a new semantic ready for 4.8 merge window
hopefully.

Motivation:

While working on something unrelated I've checked the current usage of
__GFP_REPEAT in the tree.  It seems that a majority of the usage is and
always has been bogus because __GFP_REPEAT has always been about costly
high order allocations while we are using it for order-0 or very small
orders very often.  It seems that a big pile of them is just a
copy&paste when a code has been adopted from one arch to another.

I think it makes some sense to get rid of them because they are just
making the semantic more unclear.  Please note that GFP_REPEAT is
documented as

* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt

* _might_ fail.  This depends upon the particular VM implementation.
  while !costly requests have basically nofail semantic.  So one could
  reasonably expect that order-0 request with __GFP_REPEAT will not loop
  for ever.  This is not implemented right now though.

I would like to move on with __GFP_REPEAT and define a better semantic
for it.

  $ git grep __GFP_REPEAT origin/master | wc -l
  111
  $ git grep __GFP_REPEAT | wc -l
  36

So we are down to the third after this patch series.  The remaining
places really seem to be relying on __GFP_REPEAT due to large allocation
requests.  This still needs some double checking which I will do later
after all the simple ones are sorted out.

I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them.  Patches should be quite trivial to
review for stupid compile mistakes though.  The tricky parts are usually
hidden by macro definitions and thats where I would appreciate help from
arch maintainers.

[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org

This patch (of 19):

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.  Yet we
have the full kernel tree with its usage for apparently order-0
allocations.  This is really confusing because __GFP_REPEAT is
explicitly documented to allow allocation failures which is a weaker
semantic than the current order-0 has (basically nofail).

Let's simply drop __GFP_REPEAT from those places.  This would allow to
identify place which really need allocator to retry harder and formulate
a more specific semantic for what the flag is supposed to do actually.

Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Crispin <blogic@openwrt.org>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Naveen N. Rao
156d0e290e powerpc/ebpf/jit: Implement JIT compiler for extended BPF
PPC64 eBPF JIT compiler.

Enable with:
  echo 1 > /proc/sys/net/core/bpf_jit_enable
or
  echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':

 test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:17:57 +10:00
Naveen N. Rao
277285b854 powerpc/bpf/jit: Introduce rotate immediate instructions
Since we will be using the rotate immediate instructions for extended
BPF JIT, let's introduce macros for the same. And since the shift
immediate operations use the rotate immediate instructions, let's redo
those macros to use the newly introduced instructions.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:15:14 +10:00
Gavin Shan
ea0d856cb2 powerpc/powernv: Functions to get/set PCI slot state
This exports 4 functions, which base on the corresponding OPAL
APIs to get/set PCI slot status. Those functions are going to
be used by PowerNV PCI hotplug driver:

   pnv_pci_get_device_tree()    opal_get_device_tree()
   pnv_pci_get_presence_state() opal_pci_get_presence_state()
   pnv_pci_get_power_state()    opal_pci_get_power_state()
   pnv_pci_set_power_state()    opal_pci_set_power_state()

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:58 +10:00
Gavin Shan
7e19bf32c8 powerpc/powernv: Introduce pnv_pci_get_slot_id()
This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI
slot ID from the corresponding device node. It will be used by
hotplug driver.

Requested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:58 +10:00