Commit Graph

8433 Commits

Author SHA1 Message Date
Michael Ellerman
87c78b612f powerpc: Fix all occurences of "the the"
Rather than waiting for the bots to fix these one-by-one, fix all
occurences of "the the" throughout arch/powerpc.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220518142629.513007-1-mpe@ellerman.id.au
2022-05-22 15:59:43 +10:00
Daniel Axtens
41b7a347bf powerpc: Book3S 64-bit outline-only KASAN support
Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Document KASAN in powerpc docs.

Background
----------

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

    [ppc64 mm note: The kernel installs a linear mapping at effective
    address c000...-c008.... This is a one-to-one mapping with physical
    memory from 0000... onward. Because of how memory accesses work on
    powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
    same memory both with translations on (accessing as an 'effective
    address'), and with translations off (accessing as a 'real
    address'). This works in both guests and the hypervisor. For more
    details, see s5.7 of Book III of version 3 of the ISA, in particular
    the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
    KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set is up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

[paulus@ozlabs.org - Rebased onto 5.17.  Note that a kernel with
 CONFIG_KASAN=y will crash during boot on a machine using HPT
 translation because not all the entry points to the generic
 KASAN code are protected with a call to kasan_arch_is_ready().]

Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Update copyright year and comment formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
2022-05-22 15:58:29 +10:00
Daniel Axtens
5352090a99 powerpc/kasan: Don't instrument non-maskable or raw interrupts
Disable address sanitization for raw and non-maskable interrupt
handlers, because they can run in real mode, where we cannot access
the shadow memory.  (Note that kasan_arch_is_ready() doesn't test for
real mode, since it is a static branch for speed, and in any case not
all the entry points to the generic KASAN code are protected by
kasan_arch_is_ready guards.)

The changes to interrupt_nmi_enter/exit_prepare() look larger than
they actually are.  The changes are equivalent to adding
!IS_ENABLED(CONFIG_KASAN) to the conditions for calling nmi_enter() or
nmi_exit() in real mode.  That is, the code is equivalent to using the
following condition for calling nmi_enter/exit:

	if (((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
			!firmware_has_feature(FW_FEATURE_LPAR) ||
			radix_enabled()) &&
		    !IS_ENABLED(CONFIG_KASAN) ||
		(mfmsr() & MSR_DR))

That unwieldy condition has been split into several statements with
comments, for easier reading.

The nmi_ipi_lock functions that call atomic functions (i.e.,
nmi_ipi_lock_start(), nmi_ipi_lock() and nmi_ipi_unlock()), besides
being marked noinstr, now call arch_atomic_* functions instead of
atomic_* functions because with KASAN enabled, the atomic_* functions
are wrappers which explicitly do address sanitization on their
arguments.  Since we are trying to avoid address sanitization, we have
to use the lower-level arch_atomic_* versions.

In hv_nmi_check_nonrecoverable(), the regs_set_unrecoverable() call
has been open-coded so as to avoid having to either trust the inlining
or mark regs_set_unrecoverable() as noinstr.

[paulus@ozlabs.org: combined a few work-in-progress commits of
 Daniel's and wrote the commit message.]

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTFGaKM8Pd46PIK@cleo
2022-05-22 15:58:29 +10:00
Naveen N. Rao
84ade0a665 powerpc/ftrace: Remove ftrace init tramp once kernel init is complete
Stop using the ftrace trampoline for init section once kernel init is
complete.

Fixes: 67361cf807 ("powerpc/ftrace: Handle large kernel configs")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220516071422.463738-1-naveen.n.rao@linux.vnet.ibm.com
2022-05-22 15:58:29 +10:00
Christophe Leroy
5fe855169f powerpc/irq: Remove arch_local_irq_restore() for !CONFIG_CC_HAS_ASM_GOTO
All supported versions of GCC & clang support asm goto.

Remove the !CONFIG_CC_HAS_ASM_GOTO version of arch_local_irq_restore()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/58df50c9e77e2ed945bacdead30412770578886b.1652715336.git.christophe.leroy@csgroup.eu
2022-05-22 15:58:28 +10:00
Christophe Leroy
e0c2ef4321 powerpc/modules: Use PPC_LI macros instead of opencoding
Use PPC_LI_MASK and PPC_LI() instead of opencoding.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3d56d7bc3200403773d54e62659d0e01292a055d.1652074503.git.christophe.leroy@csgroup.eu
2022-05-22 15:58:27 +10:00
Christophe Leroy
8052d043a4 powerpc/ftrace: Don't use copy_from_kernel_nofault() in module_trampoline_target()
module_trampoline_target() is quite a hot path used when
activating/deactivating function tracer.

Avoid the heavy copy_from_kernel_nofault() by doing four calls
to copy_inst_from_kernel_nofault().

Use __copy_inst_from_kernel_nofault() for the 3 last calls. First call
is done to copy_from_kernel_nofault() to check address is within
kernel space. No risk to wrap out the top of kernel space because the
last page is never mapped so if address is in last page the first copy
will fails and the other ones will never be performed.

And also make it notrace just like all functions that call it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c55559103e014b7863161559d340e8e9484eaaa6.1652074503.git.christophe.leroy@csgroup.eu
2022-05-22 15:58:27 +10:00
Christophe Leroy
af8b9f352f powerpc/ftrace: Minimise number of #ifdefs
A lot of #ifdefs can be replaced by IS_ENABLED()

Do so.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fold in changes suggested by Naveen and Christophe on list]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18ce6708d6f8c71d87436f9c6019f04df4125128.1652074503.git.christophe.leroy@csgroup.eu
2022-05-22 15:58:26 +10:00
Christophe Leroy
b97d0e3dcf powerpc/ftrace: Simplify expected_nop_sequence()
Avoid ifdefs around expected_nop_sequence().

While at it make it a bool.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/305d22472f1f92127fba09692df6bb5d079a8cd0.1652074503.git.christophe.leroy@csgroup.eu
2022-05-22 15:58:26 +10:00
Christophe Leroy
c8deb28095 powerpc/ftrace: Use size macro instead of opencoding
0x80000000 is SZ_2G. Use it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix comparison against unsigned -SZ_2G]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bb6626e884acffe87b58736291df57db3deaa9b9.1652074503.git.christophe.leroy@csgroup.eu
2022-05-22 15:57:38 +10:00
Christophe Leroy
e89aa642be powerpc/ftrace: Use PPC_RAW_xxx() macros instead of opencoding.
PPC_RAW_xxx() macros are self explanatory and less error prone
than open coding.

Use them in ftrace.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9292094c9a69cef6d29ee83f435a557b59c45065.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
cf9df92a82 powerpc/ftrace: Use BRANCH_SET_LINK instead of value 1
To make it explicit, use BRANCH_SET_LINK instead of value 1
when calling create_branch().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d57847063ac93660a5af620d4df1847f10edf61a.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
ccf6607e45 powerpc/ftrace: Remove ftrace_plt_tramps[]
ftrace_plt_tramps table is never filled so it is useless.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/daeeb618a6619e3a7e3f82f1bd83ca7c25af6330.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
c2cba93d1a powerpc/ftrace: Use CONFIG_FUNCTION_TRACER instead of CONFIG_DYNAMIC_FTRACE
Since commit 0c0c52306f ("powerpc: Only support DYNAMIC_FTRACE not
static"), CONFIG_DYNAMIC_FTRACE is always selected when
CONFIG_FUNCTION_TRACER is selected.

To avoid confusion and have the reader wonder what's happen when
CONFIG_FUNCTION_TRACER is selected and CONFIG_DYNAMIC_FTRACE is not,
use CONFIG_FUNCTION_TRACER in ifdefs instead of CONFIG_DYNAMIC_FTRACE.

As CONFIG_FUNCTION_GRAPH_TRACER depends on CONFIG_FUNCTION_TRACER,
ftrace.o doesn't need to appear for both symbols in Makefile.

Then as ftrace.o is built only when CONFIG_FUNCTION_TRACER is selected
ifdef CONFIG_FUNCTION_TRACER is not needed in ftrace.c, and since it
implies CONFIG_DYNAMIC_FTRACE, CONFIG_DYNAMIC_FTRACE is not needed
in ftrace.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/628d357503eb90b4a034f99b7df516caaff4d279.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
a3d0f5b4b7 powerpc/ftrace: Don't include ftrace.o for CONFIG_FTRACE_SYSCALLS
Since commit 7bea7ac0ca ("powerpc/syscalls: Fix syscall tracing")
ftrace.o is not needed anymore for CONFIG_FTRACE_SYSCALLS.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/275932a5d61543b825ff9a64f61abed6da5d4a2a.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
23b44fc248 powerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64
Since c93d4f6ecf ("powerpc/ftrace: Add module_trampoline_target()
for PPC32"), __ftrace_make_nop() for PPC32 is very similar to the
one for PPC64.

Same for __ftrace_make_call().

Make them common.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96f53c237316dab4b1b8c682685266faa92da816.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
5b89492c03 powerpc: Finalise cleanup around ABI use
Now that we have CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2,
get rid of all indirect detection of ABI version.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/709d9d69523c14c8a9fba4486395dca0f2d675b1.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
7d40aff821 powerpc: Replace PPC64_ELF_ABI_v{1/2} by CONFIG_PPC64_ELF_ABI_V{1/2}
Replace all uses of PPC64_ELF_ABI_v1 and PPC64_ELF_ABI_v2 by
resp CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba13d59e8c50bc9aa6328f1c7f0c0d0278e0a3a7.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:29 +10:00
Christophe Leroy
bbffdd2fc7 powerpc/ftrace: Use patch_instruction() return directly
Instead of returning -EPERM when patch_instruction() fails,
just return what patch_instruction returns.

That simplifies ftrace_modify_code():

	   0:	94 21 ff c0 	stwu    r1,-64(r1)
	   4:	93 e1 00 3c 	stw     r31,60(r1)
	   8:	7c 7f 1b 79 	mr.     r31,r3
	   c:	40 80 00 30 	bge     3c <ftrace_modify_code+0x3c>
	  10:	93 c1 00 38 	stw     r30,56(r1)
	  14:	7c 9e 23 78 	mr      r30,r4
	  18:	7c a4 2b 78 	mr      r4,r5
	  1c:	80 bf 00 00 	lwz     r5,0(r31)
	  20:	7c 1e 28 40 	cmplw   r30,r5
	  24:	40 82 00 34 	bne     58 <ftrace_modify_code+0x58>
	  28:	83 c1 00 38 	lwz     r30,56(r1)
	  2c:	7f e3 fb 78 	mr      r3,r31
	  30:	83 e1 00 3c 	lwz     r31,60(r1)
	  34:	38 21 00 40 	addi    r1,r1,64
	  38:	48 00 00 00 	b       38 <ftrace_modify_code+0x38>
				38: R_PPC_REL24	patch_instruction

Before:

	   0:	94 21 ff c0 	stwu    r1,-64(r1)
	   4:	93 e1 00 3c 	stw     r31,60(r1)
	   8:	7c 7f 1b 79 	mr.     r31,r3
	   c:	40 80 00 4c 	bge     58 <ftrace_modify_code+0x58>
	  10:	93 c1 00 38 	stw     r30,56(r1)
	  14:	7c 9e 23 78 	mr      r30,r4
	  18:	7c a4 2b 78 	mr      r4,r5
	  1c:	80 bf 00 00 	lwz     r5,0(r31)
	  20:	7c 08 02 a6 	mflr    r0
	  24:	90 01 00 44 	stw     r0,68(r1)
	  28:	7c 1e 28 40 	cmplw   r30,r5
	  2c:	40 82 00 48 	bne     74 <ftrace_modify_code+0x74>
	  30:	7f e3 fb 78 	mr      r3,r31
	  34:	48 00 00 01 	bl      34 <ftrace_modify_code+0x34>
				34: R_PPC_REL24	patch_instruction
	  38:	80 01 00 44 	lwz     r0,68(r1)
	  3c:	20 63 00 00 	subfic  r3,r3,0
	  40:	83 c1 00 38 	lwz     r30,56(r1)
	  44:	7c 63 19 10 	subfe   r3,r3,r3
	  48:	7c 08 03 a6 	mtlr    r0
	  4c:	83 e1 00 3c 	lwz     r31,60(r1)
	  50:	38 21 00 40 	addi    r1,r1,64
	  54:	4e 80 00 20 	blr

It improves ftrace activation/deactivation duration by about 3%.

Modify patch_instruction() return on failure to -EPERM in order to
match with ftrace expectations. Other users of patch_instruction()
do not care about the exact error value returned.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/49a8597230713e2633e7d9d7b56140787c4a7e20.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:28 +10:00
Christophe Leroy
2c920fca8c powerpc/ftrace: Inline ftrace_modify_code()
Inlining ftrace_modify_code(), it increases a bit the
size of ftrace code but brings 5% improvment on ftrace
activation.

Usually in C files we let gcc decide what to do but here
it really help to 'help' gcc to decide to inline, thought
we don't want to force it with an __always_inline that
would be too much for CONFIG_CC_OPTIMIZE_FOR_SIZE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1597a06d57cfc80e6853838c4066e799bf6c7977.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:28 +10:00
Christophe Leroy
a1facd2578 powerpc/ftrace: Use is_offset_in_branch_range()
Use is_offset_in_branch_range() instead of create_branch()
to check if a target is within branch range.

This patch together with the previous one improves
ftrace activation time by 7%

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/912ae51782f5a53c44e435497c8c3fb5cc632387.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:28 +10:00
Christophe Leroy
ae3a2a2188 powerpc/ftrace: Remove redundant create_branch() calls
Since commit d5937db114 ("powerpc/code-patching: Fix patch_branch()
return on out-of-range failure") patch_branch() fails with -ERANGE
when trying to branch out of range.

No need to perform the test twice. Remove redundant create_branch()
calls.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa45fbad0b4b7493080835d8276c0cb4ce146503.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:28 +10:00
Christophe Leroy
d996d5053e powerpc/ftrace: Refactor prepare_ftrace_return()
When we have CONFIG_DYNAMIC_FTRACE_WITH_ARGS,
prepare_ftrace_return() is called by ftrace_graph_func()
otherwise prepare_ftrace_return() is called from assembly.

Refactor prepare_ftrace_return() into a static
__prepare_ftrace_return() that will be called by both
prepare_ftrace_return() and ftrace_graph_func().

It will allow GCC to fold __prepare_ftrace_return() inside
ftrace_graph_func().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d42deafe353980c66cf19d3132638c05ba9f4a9.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19 23:11:28 +10:00
Nicholas Piggin
804c0a166f powerpc/rtas: enture rtas_call is called with MMU enabled
rtas_call must not be called with the MMU disabled because in case
of rtas error, log_error is called which requires MMU enabled. Add
a test and warning for this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-14-npiggin@gmail.com
2022-05-19 23:11:27 +10:00
Nicholas Piggin
014b2e896c powerpc/rtas: Leave MSR[RI] enabled over RTAS call
PAPR specifies that RTAS may be called with MSR[RI] enabled if the
calling context is recoverable, and RTAS will manage RI as necessary.
Call the rtas entry point with RI enabled, and add a check to ensure
the caller has RI enabled.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-10-npiggin@gmail.com
2022-05-19 23:11:27 +10:00
Nicholas Piggin
5c86bd02b3 powerpc/rtas: PACA can be restored directly from SPRG
On 64-bit, PACA is saved in a SPRG so it does not need to be saved on
stack. We also don't need to mask off the top bits for real mode
addresses because the architecture does this for us.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-8-npiggin@gmail.com
2022-05-19 23:11:27 +10:00
Nicholas Piggin
c5a65e0a42 powerpc/rtas: Call enter_rtas with MSR[EE] disabled
Disable MSR[EE] in C code rather than asm.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-5-npiggin@gmail.com
2022-05-19 23:11:27 +10:00
Nicholas Piggin
4e949faae2 powerpc/rtas: Fix whitespace in rtas_entry.S
The code was moved verbatim including whitespace cruft. Fix that.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-4-npiggin@gmail.com
2022-05-19 23:11:27 +10:00
Nicholas Piggin
07940b4b61 powerpc/rtas: Make enter_rtas a nokprobe symbol on 64-bit
This symbol is marked nokprobe on 32-bit but not 64-bit, add it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-3-npiggin@gmail.com
2022-05-19 23:11:27 +10:00
Nicholas Piggin
838ee286ec powerpc/rtas: Move rtas entry assembly into its own file
This makes working on the code a bit easier.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-2-npiggin@gmail.com
2022-05-19 23:11:27 +10:00
Nicholas Piggin
2896b2dff4 powerpc/signal: Report minimum signal frame size to userspace via AT_MINSIGSTKSZ
Implement the AT_MINSIGSTKSZ AUXV entry, allowing userspace to
dynamically size stack allocations in a manner forward-compatible with
new processor state saved in the signal frame

For now these statically find the maximum signal frame size rather than
doing any runtime testing of features to minimise the size.

glibc 2.34 will take advantage of this, as will applications that use
use _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
References: 94b07c1f8c ("arm64: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220307182734.289289-2-npiggin@gmail.com
2022-05-19 23:11:26 +10:00
Nathan Chancellor
4406b12214 powerpc/vdso: Link with ld.lld when requested
The PowerPC vDSO uses $(CC) to link, which differs from the rest of the
kernel, which uses $(LD) directly. As a result, the default linker of
the compiler is used, which may differ from the linker requested by the
builder. For example:

  $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
  ...

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [     0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [     0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is
because "ld" is the default linker for clang on most Linux platforms.

This is a problem for Clang's Link Time Optimization as implemented in
the kernel because use of GNU ld with LTO requires the LLVMgold plugin,
which is not technically supported for ld.bfd per
https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is
missing from a user's system, the build will fail, even though LTO as it
is implemented in the kernel requires ld.lld to avoid this dependency in
the first place.

Ultimately, the PowerPC vDSO should be converted to compiling and
linking with $(CC) and $(LD) respectively but there were issues last
time this was tried, potentially due to older but supported tool
versions. To avoid regressing GCC + binutils, use the compiler option
'-fuse-ld', which tells the compiler which linker to use when it is
invoked as both the compiler and linker. Use '-fuse-ld=lld' when
LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
linked with the same linker as the rest of the kernel.

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [     0] Linker: LLD 14.0.0
  [    14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [     0] Linker: LLD 14.0.0
  [    14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LD can be a full path to ld.lld, which will not be handled properly by
'-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
search path. '-fuse-ld' can take a path to the linker but it is
deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.

Use '--ld-path' if it is supported, as it will handle a full path or
just 'ld.lld' properly. See the LLVM commit below for the full details
of '--ld-path'.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/774
Link: 1bc5c84710
Link: https://lore.kernel.org/r/20220511185001.3269404-3-nathan@kernel.org
2022-05-19 23:11:26 +10:00
Nathan Chancellor
e247172854 powerpc/vdso: Remove unused ENTRY in linker scripts
When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about
not finding _start for the starting address:

  ld.lld: warning: cannot find entry symbol _start; not setting start address
  ld.lld: warning: cannot find entry symbol _start; not setting start address

Looking at GCC + GNU ld, the entry point address is 0x0:

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
    Entry point address:               0x0
  File: vdso64.so.dbg
    Entry point address:               0x0

This matches what ld.lld emits:

  $ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg

  File: vdso32.so.dbg

  String dump of section '.comment':
    [     0]  Linker: LLD 14.0.0
    [    14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: vdso64.so.dbg

  String dump of section '.comment':
    [     0]  Linker: LLD 14.0.0
    [    14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
    Entry point address:               0x0
  File: vdso64.so.dbg
    Entry point address:               0x0

Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to
function correctly.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220511185001.3269404-2-nathan@kernel.org
2022-05-19 23:11:26 +10:00
Kevin Hao
d9e5c3e9e7 powerpc: Export mmu_feature_keys[] as non-GPL
When the mmu_feature_keys[] was introduced in the commit c12e6f24d4
("powerpc: Add option to use jump label for mmu_has_feature()"),
it is unlikely that it would be used either directly or indirectly in
the out of tree modules. So we exported it as GPL only.

But with the evolution of the codes, especially the PPC_KUAP support, it
may be indirectly referenced by some primitive macro or inline functions
such as get_user() or __copy_from_user_inatomic(), this will make it
impossible to build many non GPL modules (such as ZFS) on ppc
architecture. Fix this by exposing the mmu_feature_keys[] to the non-GPL
modules too.

Fixes: 7613f5a66b ("powerpc/64s/kuap: Use mmu_has_feature()")
Reported-by: Nathaniel Filardo <nwfilardo@gmail.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220329085709.4132729-1-haokexin@gmail.com
2022-05-19 23:11:26 +10:00
Guilherme G. Piccoli
e2aa34ce80 powerpc/setup: Refactor/untangle panic notifiers
The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.

As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.

Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:

- First one is responsible for hard-disable IRQs and fadump,
should run early;

- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;

- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.

The patch better documents the notifiers and clears the code too,
also removing a useless header.

Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220427224924.592546-9-gpiccoli@igalia.com
2022-05-19 23:11:26 +10:00
Michael Ellerman
b104e41cda Merge branch 'topic/ppc-kvm' into next
Merge our KVM topic branch.
2022-05-19 23:10:42 +10:00
Alexey Kardashevskiy
cad32d9d42 KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines guest visible IOMMU with hypercalls to use it -
H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap
in the KVM in the real mode (with MMU off). The problem with the real mode
is some memory is not available and some API usage crashed the host but
enabling MMU was an expensive operation.

The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE) which virtual mode handler is even closer
to its real mode version.

On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.

So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
The 32bit passed through devices may slow down but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demostrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and related code from
the powernv platform.

This updates the list of implemented hcalls in KVM-HV as the realmode
handlers are removed.

This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2022-05-19 00:44:01 +10:00
Michael Ellerman
a5fc286f69 Merge branch 'fixes' into next
Merge our fixes branch from this cycle. In particular this brings in a
papr_scm.c change which a subsequent patch has a dependency on.
2022-05-19 00:11:51 +10:00
Laurent Dufour
b6b1c3ce06 powerpc/rtas: Keep MSR[RI] set when calling RTAS
RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big
endian mode (MSR[SF,LE] unset).

The change in MSR is done in enter_rtas() in a relatively complex way,
since the MSR value could be hardcoded.

Furthermore, a panic has been reported when hitting the watchdog interrupt
while running in RTAS, this leads to the following stack trace:

  watchdog: CPU 24 Hard LOCKUP
  watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago)
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G            E  X    5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP:  000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100   Tainted: G            E  X     (5.14.21-150400.71.1.bz196362_2-default)
  MSR:  8000000002981000 <SF,VEC,VSX,ME>  CR: 48800002  XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  Oops: Unrecoverable System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G            E  X    5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP:  000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100   Tainted: G            E  X     (5.14.21-150400.71.1.bz196362_2-default)
  MSR:  8000000002981000 <SF,VEC,VSX,ME>  CR: 48800002  XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 3ddec07f638c34a2 ]---

This happens because MSR[RI] is unset when entering RTAS but there is no
valid reason to not set it here.

RTAS is expected to be called with MSR[RI] as specified in PAPR+ section
"7.2.1 Machine State":

  R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect
  its own critical regions from recursion by setting the MSR[RI] bit to
  0 when in the critical regions.

Fixing this by reviewing the way MSR is compute before calling RTAS. Now a
hardcoded value meaning real mode, 32 bits big endian mode and Recoverable
Interrupt is loaded. In the case MSR[S] is set, it will remain set while
entering RTAS as only urfid can unset it (thanks Fabiano).

In addition a check is added in do_enter_rtas() to detect calls made with
MSR[RI] unset, as we are forcing it on later.

This patch has been tested on the following machines:
Power KVM Guest
  P8 S822L (host Ubuntu kernel 5.11.0-49-generic)
PowerVM LPAR
  P8 9119-MME (FW860.A1)
  p9 9008-22L (FW950.00)
  P10 9080-HEX (FW1010.00)

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com
2022-05-11 23:06:40 +10:00
Christophe Leroy
e59596a2d6 powerpc: Use static call for get_irq()
__do_irq() inconditionnaly calls ppc_md.get_irq()

That's definitely a hot path.

At the time being ppc_md.get_irq address is read every time
from ppc_md structure.

Replace that call by a static call, and initialise that
call after ppc_md.init_IRQ() has set ppc_md.get_irq.

Emit a warning and don't set the static call if ppc_md.init_IRQ()
is still NULL, that way the kernel won't blow up if for some
reason ppc_md.get_irq() doesn't get properly set.

With the patch:

	00000000 <__SCT__ppc_get_irq>:
	   0:	48 00 00 20 	b       20 <__static_call_return0>	<== Replaced by 'b <ppc_md.get_irq>' at runtime
...
	00000020 <__static_call_return0>:
	  20:	38 60 00 00 	li      r3,0
	  24:	4e 80 00 20 	blr
...
	00000058 <__do_irq>:
...
	  64:	48 00 00 01 	bl      64 <__do_irq+0xc>
				64: R_PPC_REL24	__SCT__ppc_get_irq
	  68:	2c 03 00 00 	cmpwi   r3,0
...

Before the patch:

	00000038 <__do_irq>:
...
	  3c:	3d 20 00 00 	lis     r9,0
				3e: R_PPC_ADDR16_HA	ppc_md+0x1c
...
	  44:	81 29 00 00 	lwz     r9,0(r9)
				46: R_PPC_ADDR16_LO	ppc_md+0x1c
...
	  4c:	7d 29 03 a6 	mtctr   r9
	  50:	4e 80 04 21 	bctrl
	  54:	2c 03 00 00 	cmpwi   r3,0
...

On PPC64 which doesn't implement static calls yet we get:

00000000000000d0 <__do_irq>:
...
      dc:	00 00 22 3d 	addis   r9,r2,0
			dc: R_PPC64_TOC16_HA	.data+0x8
...
      e4:	00 00 89 e9 	ld      r12,0(r9)
			e4: R_PPC64_TOC16_LO_DS	.data+0x8
...
      f0:	a6 03 89 7d 	mtctr   r12
      f4:	18 00 41 f8 	std     r2,24(r1)
      f8:	21 04 80 4e 	bctrl
      fc:	18 00 41 e8 	ld      r2,24(r1)
...

So on PPC64 that's similar to what we get without static calls.
But at least until ppc_md.get_irq() is set the call is to
__static_call_return0.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/afb92085f930651d8b1063e4d4bf0396c80ebc7d.1647002274.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:40 +10:00
Christophe Leroy
e6f6390ab7 powerpc: Add missing headers
Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h,
asm/pci.h etc...

Include the needed headers, and remove asm/prom.h when it was
needed exclusively for pulling necessary headers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:40 +10:00
Christophe Leroy
86c38fec69 powerpc: Remove asm/prom.h from all files that don't need it
Several files include asm/prom.h for no reason.

Clean it up.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop change to prom_parse.c as reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7c9b8fda63dcf63e1b28f43e7ebdb95182cbc286.1646767214.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:04 +10:00
Christophe Leroy
0aa297e73b powerpc/64: Move pci_device_from_OF_node() out of asm/pci-bridge.h
Move pci_device_from_OF_node() in pci64.c because it needs definition
of struct device_node and is not worth inlining.

ppc32.c already has it in pci32.c.

That way pci-bridge.h doesn't need linux/of.h (Brought by asm/prom.h
via asm/pci.h)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3c88286b55413730d7784133993a46ef4a3607ce.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06 00:00:20 +10:00
Nicholas Piggin
a553476c44 powerpc/64: remove system call instruction emulation
emulate_step() instruction emulation including sc instruction emulation
initially appeared in xmon. It was then moved into sstep.c where kprobes
could use it too, and later hw_breakpoint and uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point, there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there were any in-tree users. So system call emulation is not required
by the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.

These things may be individually fixable with various complication, but
it is a big complexity for unclear real benefit.

Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.

This patch removes system call emulation and disables stepping system
calls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a412e3b3791ed83de18704c8d90f492e7a0049c0.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2022-05-06 00:00:20 +10:00
Naveen N. Rao
54cdacd7d3 powerpc: Reject probes on instructions that can't be single stepped
Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and Single Step tracing;
CIABR matches may still occur)
- instructions that are emulated by software

Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.

For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probes/breakpoints on trap instruction
across uprobes, kprobes and xmon.

For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().

Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.

In uprobes, if a single stepped instruction results in a non-fatal
signal to be delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that core dump
captures proper addresses.

In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f56ee979d50b8711fae350fc97870f3ca34acd75.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2022-05-06 00:00:20 +10:00
Julia Lawall
1fd02f6605 powerpc: fix typos in comments
Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2022-05-05 22:12:44 +10:00
Christophe Leroy
3ba4289a3e powerpc: Simplify and move arch_randomize_brk()
arch_randomize_brk() is only needed for hash on book3s/64, for other
platforms the one provided by the default mmap layout is good enough.

Move it to hash_utils.c and use randomize_page() like the generic one.

And properly opt out the radix case instead of making an assumption
on mmu_highuser_ssize.

Also change to a 32M range like most other architectures instead of 8M.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eafa4d18ec8ac7b98dd02b40181e61643707cc7c.1649523076.git.christophe.leroy@csgroup.eu
2022-05-05 22:11:58 +10:00
Christophe Leroy
f693d38d94 powerpc/mm: Remove CONFIG_PPC_MM_SLICES
CONFIG_PPC_MM_SLICES is always selected by hash book3s/64.
CONFIG_PPC_MM_SLICES is never selected by other platforms.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dc2cdc204de8978574bf7c02329b6cfc4db0bce7.1649523076.git.christophe.leroy@csgroup.eu
2022-05-05 22:11:57 +10:00
Michael Ellerman
d0a31acc34 Linux 5.18-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmJlxloeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGBoIH/3b1GGuTBq8XndVl
 1EaCJe/3auE8cHklNpLyTWsQY7He9CcIe4b0fGmtkUlwqAE6E5fUPfqwzjjb3eux
 MDPYYKbnjm/jA73dbTbr3lxMlc/caZpuRrwwuek0+vS0DLYhP917NmDvGX8q3l5U
 84RCEHztrTmzOivS0BwNJV1XFpcqnODTDN4zNR43o9ZY9tVY4/OqL0+lcQIHM2Nh
 6urEzWMMi+BRGaOqdgtt+NxmgKQNTRAkPan6FpJloRSxrOzl4LiYBKMfE2iyisND
 91i1CGvOQjaQNIw9JYNvtSWGawMb+obTyCFHyj1Qm7LwD0VAZ+FQrvrdGz4rJrAY
 Sq01XYs=
 =Rl6l
 -----END PGP SIGNATURE-----

Merge tag 'v5.18-rc4' into next

Merge master into next, to bring in commit 5f24d5a579 ("mm, hugetlb:
allow for "high" userspace addresses"), which is needed as a
prerequisite for the series converting powerpc to the generic mmap
logic.
2022-05-05 22:09:35 +10:00
Michael Ellerman
6d65028eb6 powerpc/vdso: Fix incorrect CFI in gettimeofday.S
As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit ce7d8056e3 ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").

DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.

The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x00007fffffffd960 in ?? ()
  #3  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  Backtrace stopped: frame did not save the PC

Alan helpfully describes some rules for correctly maintaining the CFI information:

  1) Every adjustment to the current frame address reg (ie. r1) must be
     described, and exactly at the instruction where r1 changes. Why?
     Because stack unwinding might want to access previous frames.

  2) If a function changes LR or any non-volatile register, the save
     location for those regs must be given. The CFI can be at any
     instruction after the saves up to the point that the reg is
     changed.
     (Exception: LR save should be described before a bl. not after)

  3) If asychronous unwind info is needed then restores of LR and
     non-volatile regs must also be described. The CFI can be at any
     instruction after the reg is restored up to the point where the
     save location is (potentially) trashed.

Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.

Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.

Finally, add CFI directives describing the save/restore of r2.

With the fix gdb can correctly back trace and navigate up and down the stack:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x0000000100015b60 in gettime ()
  #3  0x000000010000c8bc in print_long_format ()
  #4  0x000000010000d180 in print_current_files ()
  #5  0x00000001000054ac in main ()
  (gdb) up
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #5  0x00000001000054ac in main ()
  (gdb)
  Initial frame selected; you cannot go up.
  (gdb) down
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb)

Fixes: ce7d8056e3 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Alan Modra <amodra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/r/20220502125010.1319370-1-mpe@ellerman.id.au
2022-05-04 22:12:10 +10:00