Commit Graph

23555 Commits

Author SHA1 Message Date
Christoph Hellwig
4eeb96f6ef iommu/fsl_pamu: replace DOMAIN_ATTR_FSL_PAMU_STASH with a direct call
Add a fsl_pamu_configure_l1_stash API that qman_portal can call directly
instead of indirecting through the iommu attr API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-8-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:56:52 +02:00
Greg Kroah-Hartman
9594408763 Merge 5.12-rc6 into tty-next
We need the serial/tty fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 08:59:21 +02:00
Christophe Leroy
b0b3b2c78e powerpc: Switch to relative jump labels
Convert powerpc to relative jump labels.

Before the patch, pseries_defconfig vmlinux.o has:
9074 __jump_table  0003f2a0  0000000000000000  0000000000000000  01321fa8  2**0

With the patch, the same config gets:
9074 __jump_table  0002a0e0  0000000000000000  0000000000000000  01321fb4  2**0

Size is 258720 without the patch, 172256 with the patch.
That's a 33% size reduction.

Largely copied from commit c296146c05 ("arm64/kernel: jump_label:
Switch to relative references")

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/828348da7868eda953ce023994404dfc49603b64.1616514473.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:21 +11:00
Christophe Leroy
40272035e1 powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC32
When the BPF routine doesn't call any function, the non volatile
registers can be reallocated to volatile registers in order to
avoid having to save them/restore on the stack.

Before this patch, the test #359 ADD default X is:

   0:	7c 64 1b 78 	mr      r4,r3
   4:	38 60 00 00 	li      r3,0
   8:	94 21 ff b0 	stwu    r1,-80(r1)
   c:	60 00 00 00 	nop
  10:	92 e1 00 2c 	stw     r23,44(r1)
  14:	93 01 00 30 	stw     r24,48(r1)
  18:	93 21 00 34 	stw     r25,52(r1)
  1c:	93 41 00 38 	stw     r26,56(r1)
  20:	39 80 00 00 	li      r12,0
  24:	39 60 00 00 	li      r11,0
  28:	3b 40 00 00 	li      r26,0
  2c:	3b 20 00 00 	li      r25,0
  30:	7c 98 23 78 	mr      r24,r4
  34:	7c 77 1b 78 	mr      r23,r3
  38:	39 80 00 42 	li      r12,66
  3c:	39 60 00 00 	li      r11,0
  40:	7d 8c d2 14 	add     r12,r12,r26
  44:	39 60 00 00 	li      r11,0
  48:	7d 83 63 78 	mr      r3,r12
  4c:	82 e1 00 2c 	lwz     r23,44(r1)
  50:	83 01 00 30 	lwz     r24,48(r1)
  54:	83 21 00 34 	lwz     r25,52(r1)
  58:	83 41 00 38 	lwz     r26,56(r1)
  5c:	38 21 00 50 	addi    r1,r1,80
  60:	4e 80 00 20 	blr

After this patch, the same test has become:

   0:	7c 64 1b 78 	mr      r4,r3
   4:	38 60 00 00 	li      r3,0
   8:	94 21 ff b0 	stwu    r1,-80(r1)
   c:	60 00 00 00 	nop
  10:	39 80 00 00 	li      r12,0
  14:	39 60 00 00 	li      r11,0
  18:	39 00 00 00 	li      r8,0
  1c:	38 e0 00 00 	li      r7,0
  20:	7c 86 23 78 	mr      r6,r4
  24:	7c 65 1b 78 	mr      r5,r3
  28:	39 80 00 42 	li      r12,66
  2c:	39 60 00 00 	li      r11,0
  30:	7d 8c 42 14 	add     r12,r12,r8
  34:	39 60 00 00 	li      r11,0
  38:	7d 83 63 78 	mr      r3,r12
  3c:	38 21 00 50 	addi    r1,r1,80
  40:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b94562d7d2bb21aec89de0c40bb3cd91054b65a2.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:21 +11:00
Christophe Leroy
51c66ad849 powerpc/bpf: Implement extended BPF on PPC32
Implement Extended Berkeley Packet Filter on Powerpc 32

Test result with test_bpf module:

	test_bpf: Summary: 378 PASSED, 0 FAILED, [354/366 JIT'ed]

Registers mapping:

	[BPF_REG_0] = r11-r12
	/* function arguments */
	[BPF_REG_1] = r3-r4
	[BPF_REG_2] = r5-r6
	[BPF_REG_3] = r7-r8
	[BPF_REG_4] = r9-r10
	[BPF_REG_5] = r21-r22 (Args 9 and 10 come in via the stack)
	/* non volatile registers */
	[BPF_REG_6] = r23-r24
	[BPF_REG_7] = r25-r26
	[BPF_REG_8] = r27-r28
	[BPF_REG_9] = r29-r30
	/* frame pointer aka BPF_REG_10 */
	[BPF_REG_FP] = r17-r18
	/* eBPF jit internal registers */
	[BPF_REG_AX] = r19-r20
	[TMP_REG] = r31

As PPC32 doesn't have a redzone in the stack, a stack frame must always
be set in order to host at least the tail count counter.

The stack frame remains for tail calls, it is set by the first callee
and freed by the last callee.

r0 is used as temporary register as much as possible. It is referenced
directly in the code in order to avoid misusing it, because some
instructions interpret it as value 0 instead of register r0
(ex: addi, addis, stw, lwz, ...)

The following operations are not implemented:

		case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
		case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
		case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */

The following operations are only implemented for power of two constants:

		case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */
		case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/61d8b149176ddf99e7d5cef0b6dc1598583ca202.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:21 +11:00
Christophe Leroy
355a8d26cd powerpc/asm: Add some opcodes in asm/ppc-opcode.h for PPC32 eBPF
The following opcodes will be needed for the implementation
of eBPF for PPC32. Add them in asm/ppc-opcode.h

PPC_RAW_ADDE
PPC_RAW_ADDZE
PPC_RAW_ADDME
PPC_RAW_MFLR
PPC_RAW_ADDIC
PPC_RAW_ADDIC_DOT
PPC_RAW_SUBFC
PPC_RAW_SUBFE
PPC_RAW_SUBFIC
PPC_RAW_SUBFZE
PPC_RAW_ANDIS
PPC_RAW_NOR

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7bd573a368edd78006f8a5af508c726e7ce1ed2.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:21 +11:00
Christophe Leroy
c426810fcf powerpc/bpf: Change values of SEEN_ flags
Because PPC32 will use more non volatile registers,
move SEEN_ flags to positions 0-2 which corresponds to special
registers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/608faa1dc3ecfead649e15392abd07b00313d2ba.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:20 +11:00
Christophe Leroy
4ea76e90a9 powerpc/bpf: Move common functions into bpf_jit_comp.c
Move into bpf_jit_comp.c the functions that will remain common to
PPC64 and PPC32 when we add support of EBPF for PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2c339d77fb168ef12b213ccddfee3cb6c8ce8ae1.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:20 +11:00
Christophe Leroy
f1b1583d5f powerpc/bpf: Move common helpers into bpf_jit.h
Move functions bpf_flush_icache(), bpf_is_seen_register() and
bpf_set_seen_register() in order to reuse them in future
bpf_jit_comp32.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/28e8d5a75e64807d7e9d39a4b52658755e259f8c.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:20 +11:00
Christophe Leroy
ed573b57e7 powerpc/bpf: Change register numbering for bpf_set/is_seen_register()
Instead of using BPF register number as input in functions
bpf_set_seen_register() and bpf_is_seen_register(), use
CPU register number directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0cd2506f598e7095ea43e62dca1f472de5474a0d.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:20 +11:00
Christophe Leroy
6944caad78 powerpc/bpf: Remove classical BPF support for PPC32
At the time being, PPC32 has Classical BPF support.

The test_bpf module exhibits some failure:

	test_bpf: #298 LD_IND byte frag jited:1 ret 202 != 66 FAIL (1 times)
	test_bpf: #299 LD_IND halfword frag jited:1 ret 51958 != 17220 FAIL (1 times)
	test_bpf: #301 LD_IND halfword mixed head/frag jited:1 ret 51958 != 1305 FAIL (1 times)
	test_bpf: #303 LD_ABS byte frag jited:1 ret 202 != 66 FAIL (1 times)
	test_bpf: #304 LD_ABS halfword frag jited:1 ret 51958 != 17220 FAIL (1 times)
	test_bpf: #306 LD_ABS halfword mixed head/frag jited:1 ret 51958 != 1305 FAIL (1 times)

	test_bpf: Summary: 371 PASSED, 7 FAILED, [119/366 JIT'ed]

Fixing this is not worth the effort. Instead, remove support for
classical BPF and prepare for adding Extended BPF support instead.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fbc3e4fcc9c8f6131d6c705212530b2aa50149ee.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:20 +11:00
Christophe Leroy
c7393a71eb powerpc/signal32: Simplify logging in sigreturn()
Same spirit as commit debf122c77 ("powerpc/signal32: Simplify logging
in handle_rt_signal32()"), remove this intermediate 'addr' local var.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/638fa99530beb29f82f94370057d110e91272acc.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:20 +11:00
Christophe Leroy
887f3ceb51 powerpc/signal32: Convert do_setcontext[_tm]() to user access block
Add unsafe_get_user_sigset() and transform PPC32 get_sigset_t()
into an unsafe version unsafe_get_sigset_t().

Then convert do_setcontext() and do_setcontext_tm() to use
user_read_access_begin/end.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9273ba664db769b8d9c7540ae91395e346e4945e.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy
627b72bee8 powerpc/signal32: Convert restore_[tm]_user_regs() to user access block
Convert restore_user_regs() and restore_tm_user_regs()
to use user_access_read_begin/end blocks.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/181adf15a6f644efcd1aeafb355f3578ff1b6bc5.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy
036fc2cb1d powerpc/signal32: Reorder user reads in restore_tm_user_regs()
In restore_tm_user_regs(), regroup the reads from 'sr' and the ones
from 'tm_sr' together in order to allow two block user accesses
in following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7c518b9a4c8e5ae9a3bfb647bc8b20bf820233af.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy
362471b319 powerpc/signal32: Perform access_ok() inside restore_user_regs()
In preparation of using user_access_begin/end in restore_user_regs(),
move the access_ok() inside the function.

It makes no difference as the behaviour on a failed access_ok() is
the same as on failed restore_user_regs().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c106eb2f37c3040f1fd38b40e50c670feb7cb835.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy
ca9e1605cd powerpc/signal32: Remove ifdefery in middle of if/else in sigreturn()
In the same spirit as commit f1cf4f93de ("powerpc/signal32: Remove
ifdefery in middle of if/else")

MSR_TM_ACTIVE() is always defined and returns always 0 when
CONFIG_PPC_TRANSACTIONAL_MEM is not selected, so the awful
ifdefery in the middle of an if/else can be removed.

Make 'msr_hi' a 'long long' to avoid build failure on PPC32
due to the 32 bits left shift.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a4b48b2f0be1ef13fc8e57452b7f8350da28d521.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy
f918a81e20 powerpc/signal32: Rename save_user_regs_unsafe() and save_general_regs_unsafe()
Convention is to prefix functions with __unsafe_ instead of
suffixing it with _unsafe.

Rename save_user_regs_unsafe() and save_general_regs_unsafe()
accordingly, that is respectively __unsafe_save_general_regs() and
__unsafe_save_user_regs().

Suggested-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8cef43607e5b35a7fd0829dec812d88beb570df2.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy
7c11f8893a powerpc/signal: Add unsafe_copy_ck{fpr/vsx}_from_user
Add unsafe_copy_ckfpr_from_user() and unsafe_copy_ckvsx_from_user()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1040687aa27553d19f749f7fb48f0c07af98ee2d.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy
c1cc1570bc powerpc/uaccess: Also perform 64 bits copies in unsafe_copy_from_user() on ppc32
Similarly to commit 5cf773fc8f37 ("powerpc/uaccess: Also perform
64 bits copies in unsafe_copy_to_user() on ppc32")

ppc32 has an efficiant 64 bits unsafe_get_user(), so also use it in
order to unroll loops more.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/308e65d9237a14e8c0e3b22919fcf0b5e5592608.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:18 +11:00
Christophe Leroy
5cd29b1fd3 powerpc/uaccess: Use asm goto for get_user when compiler supports it
clang 11 and future GCC are supporting asm goto with outputs.

Use it to implement get_user in order to get better generated code.

Note that clang requires to set x in the default branch of
__get_user_size_goto() otherwise is compliant about x not being
initialised :puzzled:

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/403745b5aaa1b315bb4e8e46c1ba949e77eecec0.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:16 +11:00
Christophe Leroy
035785ab28 powerpc/uaccess: Introduce __get_user_size_goto()
We have got two places doing a goto based on the result
of __get_user_size_allowed().

Refactor that into __get_user_size_goto().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/def8a39289e02653cfb1583b3b19837de9efed3a.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:13 +11:00
Christophe Leroy
e72fcdb26c powerpc/uaccess: Refactor get/put_user() and __get/put_user()
Make get_user() do the access_ok() check then call __get_user().
Make put_user() do the access_ok() check then call __put_user().

Then embed  __get_user_size() and __put_user_size() in
__get_user() and __put_user().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eebc554f6a81f570c46ea3551000ff5b886e4faa.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:10 +11:00
Christophe Leroy
17f8c0bc21 powerpc/uaccess: Rename __get/put_user_check/nocheck
__get_user_check() becomes get_user()
__put_user_check() becomes put_user()
__get_user_nocheck() becomes __get_user()
__put_user_nocheck() becomes __put_user()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/41d7e45f4733f0e61e63824e4865b4e049db74d6.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:08 +11:00
Christophe Leroy
f904c22f2a powerpc/uaccess: Split out __get_user_nocheck()
One part of __get_user_nocheck() is used for __get_user(),
the other part for unsafe_get_user().

Move the part dedicated to unsafe_get_user() in it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/618fe2e0626b308a5a063d5baac827b968e85c32.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:05 +11:00
Christophe Leroy
9975f852ce powerpc/uaccess: Remove calls to __get_user_bad() and __put_user_bad()
__get_user_bad() and __put_user_bad() are functions that are
declared but not defined, in order to make the link fail in
case they are called.

Nowadays, we have BUILD_BUG() and BUILD_BUG_ON() for that, and
they have the advantage to break the build earlier as it breaks
it at compile time instead of link time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d7d839e994f49fae4ff7b70fac72bd951272436b.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:02 +11:00
Christophe Leroy
028e156168 powerpc/uaccess: Remove __chk_user_ptr() in __get/put_user
Commit d02f6b7dab ("powerpc/uaccess: Evaluate macro arguments once,
before user access is allowed") changed the __chk_user_ptr()
argument from the passed ptr pointer to the locally
declared __gu_addr. But __gu_addr is locally defined as __user
so the check is pointless.

During kernel build __chk_user_ptr() voids and is only evaluated
during sparse checks so it should have been armless to leave the
original pointer check there.

Nevertheless, this check is indeed redundant with the assignment
above which casts the ptr pointer to the local __user __gu_addr.
In case of mismatch, sparse will detect it there, so the
__check_user_ptr() is not needed anywhere else than in access_ok().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69f17d75046733b891ab2e668dbf464787cdf598.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:59 +11:00
Christophe Leroy
be15a16579 powerpc/uaccess: Remove __unsafe_put_user_goto()
__unsafe_put_user_goto() is just an intermediate layer to
__put_user_size_goto() without added value other than doing
the __user pointer type checking.

Do the __user pointer type checking in __put_user_size_goto()
and remove __unsafe_put_user_goto().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b6552149209aebd887a6977272b06a41256bdb9f.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:55 +11:00
Christophe Leroy
ed0d9c66f9 powerpc/uaccess: Call might_fault() inconditionaly
Commit 6bfd93c32a ("powerpc: Fix incorrect might_sleep in
__get_user/__put_user on kernel addresses") added a check to not call
might_sleep() on kernel addresses. This was to enable the use of
__get_user() in the alignment exception handler for any address.

Then commit 95156f0051 ("lockdep, mm: fix might_fault() annotation")
added a check of the address space in might_fault(), based on
set_fs() logic. But this didn't solve the powerpc alignment exception
case as it didn't call set_fs(KERNEL_DS).

Nowadays, set_fs() is gone, previous patch fixed the alignment
exception handler and __get_user/__put_user are not supposed to be
used anymore to read kernel memory.

Therefore the is_kernel_addr() check has become useless and can be
removed.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e0a980a4dc7a2551183dd5cb30f46eafdbee390c.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:52 +11:00
Christophe Leroy
111631b5e9 powerpc/align: Don't use __get_user_instr() on kernel addresses
In the old days, when we didn't have kernel userspace access
protection and had set_fs(), it was wise to use __get_user()
and friends to read kernel memory.

Nowadays, get_user() is granting userspace access and is exclusively
for userspace access.

In alignment exception handler, use probe_kernel_read_inst()
instead of __get_user_instr() for reading instructions in kernel.

This will allow to remove the is_kernel_addr() check in
__get/put_user() in a following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9ecbce00178484e66ca7adec2ff210058037704.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:49 +11:00
Christophe Leroy
35506a3e2d powerpc/uaccess: Move get_user_instr helpers in asm/inst.h
Those helpers use get_user helpers but they don't participate
in their implementation, so they do not belong to asm/uaccess.h

Move them in asm/inst.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2c6e83581b4fa434aa7cf2fa7714c41e98f57007.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:45 +11:00
Christophe Leroy
bad956b8fe powerpc/uaccess: Remove __get/put_user_inatomic()
Powerpc is the only architecture having _inatomic variants of
__get_user() and __put_user() accessors. They were introduced
by commit e68c825bb0 ("[POWERPC] Add inatomic versions of __get_user
and __put_user").

Those variants expand to the _nosleep macros instead of expanding
to the _nocheck macros. The only difference between the _nocheck
and the _nosleep macros is the call to might_fault().

Since commit 662bbcb274 ("mm, sched: Allow uaccess in atomic with
pagefault_disable()"), __get/put_user() can be used in atomic parts
of the code, therefore __get/put_user_inatomic() have become useless.

Remove __get_user_inatomic() and __put_user_inatomic().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1e5c895669e8d54a7810b62dc61eb111f33c2c37.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:41 +11:00
Christophe Leroy
3fa3db3295 powerpc/align: Convert emulate_spe() to user_access_begin
This patch converts emulate_spe() to using user_access_begin
logic.

Since commit 662bbcb274 ("mm, sched: Allow uaccess in atomic with
pagefault_disable()"), might_fault() doesn't fire when called from
sections where pagefaults are disabled, which must be the case
when using _inatomic variants of __get_user and __put_user. So
the might_fault() in user_access_begin() is not a problem.

There was a verification of user_mode() together with the access_ok(),
but there is a second verification of user_mode() just after, that
leads to immediate return. The access_ok() is now part of the
user_access_begin which is called after that other user_mode()
verification, so no need to check user_mode() again.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c95a648fdf75992c9d88f3c73cc23e7537fcf2ad.1615555354.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:39 +11:00
Christophe Leroy
9bd68dc5d7 powerpc/uaccess: Define ___get_user_instr() for ppc32
Define simple ___get_user_instr() for ppc32 instead of
defining ppc32 versions of the three get_user_instr()
helpers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e02f83ec74f26d76df2874f0ce4d5cc69c3469ae.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:32 +11:00
Christophe Leroy
8cdf748d55 powerpc/uaccess: Remove __get_user_allowed() and unsafe_op_wrap()
Those two macros have only one user which is unsafe_get_user().

Put everything in one place and remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/439179c5e54c18f2cb8bdf1eea13ea0ef6b98375.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:21:26 +11:00
Christophe Leroy
791f9e3659 powerpc/vdso: Make sure vdso_wrapper.o is rebuilt everytime vdso.so is rebuilt
Commit bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of
vgettimeofday.o") moved vdso32_wrapper.o and vdso64_wrapper.o out
of arch/powerpc/kernel/vdso[32/64]/ and removed the dependencies in
the Makefile. This leads to the wrappers not being re-build hence the
kernel embedding the old vdso library.

Add back missing dependencies to ensure vdso32_wrapper.o and vdso64_wrapper.o
are rebuilt when vdso32.so.dbg and vdso64.so.dbg are changed.

Fixes: bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8bb015bc98c51d8ced581415b7e3d157e18da7c9.1617181918.git.christophe.leroy@csgroup.eu
2021-04-02 00:18:09 +11:00
Christophe Leroy
acca57217c powerpc/signal32: Fix Oops on sigreturn with unmapped VDSO
PPC32 encounters a KUAP fault when trying to handle a signal with
VDSO unmapped.

	Kernel attempted to read user page (7fc07ec0) - exploit attempt? (uid: 0)
	BUG: Unable to handle kernel data access on read at 0x7fc07ec0
	Faulting instruction address: 0xc00111d4
	Oops: Kernel access of bad area, sig: 11 [#1]
	BE PAGE_SIZE=16K PREEMPT CMPC885
	CPU: 0 PID: 353 Comm: sigreturn_vdso Not tainted 5.12.0-rc4-s3k-dev-01553-gb30c310ea220 #4814
	NIP:  c00111d4 LR: c0005a28 CTR: 00000000
	REGS: cadb3dd0 TRAP: 0300   Not tainted  (5.12.0-rc4-s3k-dev-01553-gb30c310ea220)
	MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 48000884  XER: 20000000
	DAR: 7fc07ec0 DSISR: 88000000
	GPR00: c0007788 cadb3e90 c28d4a40 7fc07ec0 7fc07ed0 000004e0 7fc07ce0 00000000
	GPR08: 00000001 00000001 7fc07ec0 00000000 28000282 1001b828 100a0920 00000000
	GPR16: 100cac0c 100b0000 105c43a4 105c5685 100d0000 100d0000 100d0000 100b2e9e
	GPR24: ffffffff 105c43c8 00000000 7fc07ec8 cadb3f40 cadb3ec8 c28d4a40 00000000
	NIP [c00111d4] flush_icache_range+0x90/0xb4
	LR [c0005a28] handle_signal32+0x1bc/0x1c4
	Call Trace:
	[cadb3e90] [100d0000] 0x100d0000 (unreliable)
	[cadb3ec0] [c0007788] do_notify_resume+0x260/0x314
	[cadb3f20] [c000c764] syscall_exit_prepare+0x120/0x184
	[cadb3f30] [c00100b4] ret_from_syscall+0xc/0x28
	--- interrupt: c00 at 0xfe807f8
	NIP:  0fe807f8 LR: 10001060 CTR: c0139378
	REGS: cadb3f40 TRAP: 0c00   Not tainted  (5.12.0-rc4-s3k-dev-01553-gb30c310ea220)
	MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 28000482  XER: 20000000

	GPR00: 00000025 7fc081c0 77bb1690 00000000 0000000a 28000482 00000001 0ff03a38
	GPR08: 0000d032 00006de5 c28d4a40 00000009 88000482 1001b828 100a0920 00000000
	GPR16: 100cac0c 100b0000 105c43a4 105c5685 100d0000 100d0000 100d0000 100b2e9e
	GPR24: ffffffff 105c43c8 00000000 77ba7628 10002398 10010000 10002124 00024000
	NIP [0fe807f8] 0xfe807f8
	LR [10001060] 0x10001060
	--- interrupt: c00
	Instruction dump:
	38630010 7c001fac 38630010 4200fff0 7c0004ac 4c00012c 4e800020 7c001fac
	2c0a0000 38630010 4082ffcc 4bffffe4 <7c00186c> 2c070000 39430010 4082ff8c
	---[ end trace 3973fb72b049cb06 ]---

This is because flush_icache_range() is called on user addresses.

The same problem was detected some time ago on PPC64. It was fixed by
enabling KUAP in commit 59bee45b97 ("powerpc/mm: Fix missing KUAP
disable in flush_coherent_icache()").

PPC32 doesn't use flush_coherent_icache() and fallbacks on
clean_dcache_range() and invalidate_icache_range().

We could fix it similarly by enabling user access in those functions,
but this is overkill for just flushing two instructions.

The two instructions are 8 bytes aligned, so a single dcbst/icbi is
enough to flush them. Do like __patch_instruction() and inline
a dcbst followed by an icbi just after the write of the instructions,
while user access is still allowed. The isync is not required because
rfi will be used to return to user.

icbi() is handled as a read so read-write user access is needed.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bde9154e5351a5ac7bca3d59cdb5a5e8edacbb79.1617199569.git.christophe.leroy@csgroup.eu
2021-04-02 00:16:23 +11:00
Christophe Leroy
3618250c83 powerpc/ptrace: Don't return error when getting/setting FP regs without CONFIG_PPC_FPU_REGS
An #ifdef CONFIG_PPC_FPU_REGS is missing in arch_ptrace() leading
to the following Oops because [REGSET_FPR] entry is not initialised in
native_regsets[].

[   41.917608] BUG: Unable to handle kernel instruction fetch
[   41.922849] Faulting instruction address: 0xff8fd228
[   41.927760] Oops: Kernel access of bad area, sig: 11 [#1]
[   41.933089] BE PAGE_SIZE=4K PREEMPT CMPC885
[   41.940753] Modules linked in:
[   41.943768] CPU: 0 PID: 366 Comm: gdb Not tainted 5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty #4835
[   41.952800] NIP:  ff8fd228 LR: c004d9e0 CTR: ff8fd228
[   41.957790] REGS: caae9df0 TRAP: 0400   Not tainted  (5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty)
[   41.966741] MSR:  40009032 <EE,ME,IR,DR,RI>  CR: 82004248  XER: 20000000
[   41.973540]
[   41.973540] GPR00: c004d9b4 caae9eb0 c1b64f60 c1b64520 c0713cd4 caae9eb8 c1bacdfc 00000004
[   41.973540] GPR08: 00000200 ff8fd228 c1bac700 00001032 28004242 1061aaf4 00000001 106d64a0
[   41.973540] GPR16: 00000000 00000000 7fa0a774 10610000 7fa0aef9 00000000 10610000 7fa0a538
[   41.973540] GPR24: 7fa0a580 7fa0a570 c1bacc00 c1b64520 c1bacc00 caae9ee8 00000108 c0713cd4
[   42.009685] NIP [ff8fd228] 0xff8fd228
[   42.013300] LR [c004d9e0] __regset_get+0x100/0x124
[   42.018036] Call Trace:
[   42.020443] [caae9eb0] [c004d9b4] __regset_get+0xd4/0x124 (unreliable)
[   42.026899] [caae9ee0] [c004da94] copy_regset_to_user+0x5c/0xb0
[   42.032751] [caae9f10] [c002f640] sys_ptrace+0xe4/0x588
[   42.037915] [caae9f30] [c0011010] ret_from_syscall+0x0/0x28
[   42.043422] --- interrupt: c00 at 0xfd1f8e4
[   42.047553] NIP:  0fd1f8e4 LR: 1004a688 CTR: 00000000
[   42.052544] REGS: caae9f40 TRAP: 0c00   Not tainted  (5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty)
[   42.061494] MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 48004442  XER: 00000000
[   42.068551]
[   42.068551] GPR00: 0000001a 7fa0a040 77dad7e0 0000000e 00000170 00000000 7fa0a078 00000004
[   42.068551] GPR08: 00000000 108deb88 108dda40 106d6010 44004442 1061aaf4 00000001 106d64a0
[   42.068551] GPR16: 00000000 00000000 7fa0a774 10610000 7fa0aef9 00000000 10610000 7fa0a538
[   42.068551] GPR24: 7fa0a580 7fa0a570 1078fe00 1078fd70 1078fd70 00000170 0fdd3244 0000000d
[   42.104696] NIP [0fd1f8e4] 0xfd1f8e4
[   42.108225] LR [1004a688] 0x1004a688
[   42.111753] --- interrupt: c00
[   42.114768] Instruction dump:
[   42.117698] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[   42.125443] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[   42.133195] ---[ end trace d35616f22ab2100c ]---

Adding the missing #ifdef is not good because gdb doesn't like getting
an error when getting registers.

Instead, make ptrace return 0s when CONFIG_PPC_FPU_REGS is not set.

Fixes: b6254ced4d ("powerpc/signal: Don't manage floating point regs when no FPU")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9121a44a2d50ba1af18d8aa5ada06c9a3bea8afd.1617200085.git.christophe.leroy@csgroup.eu
2021-04-02 00:15:37 +11:00
Aneesh Kumar K.V
937c49d10b powerpc/mm: Revert "powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc"
This reverts commit 675bceb097 ("powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc")

All the related issues are fixed as of commit:
  f14312e1ed ("mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.")

Hence re-enable it.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210318034855.74513-1-aneesh.kumar@linux.ibm.com
2021-03-31 16:46:55 +11:00
Michael Ellerman
11d92156f7 powerpc/pseries: Only register vio drivers if vio bus exists
The vio bus is a fake bus, which we use on pseries LPARs (guests) to
discover devices provided by the hypervisor. There's no need or sense
in creating the vio bus on bare metal systems.

Which is why commit 4336b93378 ("powerpc/pseries: Make vio and
ibmebus initcalls pseries specific") made the initialisation of the
vio bus only happen in LPARs.

However as a result of that commit we now see errors at boot on bare
metal systems:

  Driver 'hvc_console' was unable to register with bus_type 'vio' because the bus was not initialized.
  Driver 'tpm_ibmvtpm' was unable to register with bus_type 'vio' because the bus was not initialized.

This happens because those drivers are built-in, and are calling
vio_register_driver(). It in turn calls driver_register() with a
reference to vio_bus_type, but we haven't registered vio_bus_type with
the driver core.

Fix it by also guarding vio_register_driver() with a check to see if
we are on pseries.

Fixes: 4336b93378 ("powerpc/pseries: Make vio and ibmebus initcalls pseries specific")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20210316010938.525657-1-mpe@ellerman.id.au
2021-03-31 14:32:58 +11:00
dingsenjie
69931cc387 powerpc/powernv: Remove unneeded variable: "rc"
Remove unneeded variable: "rc".

Signed-off-by: dingsenjie <dingsenjie@yulong.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210326115356.12444-1-dingsenjie@163.com
2021-03-29 13:22:19 +11:00
Chen Huang
4fe529449d powerpc: Fix HAVE_HARDLOCKUP_DETECTOR_ARCH build configuration
When compiling the powerpc with the SMP disabled, it shows the issue:

arch/powerpc/kernel/watchdog.c: In function ‘watchdog_smp_panic’:
arch/powerpc/kernel/watchdog.c:177:4: error: implicit declaration of function ‘smp_send_nmi_ipi’; did you mean ‘smp_send_stop’? [-Werror=implicit-function-declaration]
  177 |    smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000);
      |    ^~~~~~~~~~~~~~~~
      |    smp_send_stop
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:273: arch/powerpc/kernel/watchdog.o] Error 1
make[1]: *** [scripts/Makefile.build:534: arch/powerpc/kernel] Error 2
make: *** [Makefile:1980: arch/powerpc] Error 2
make: *** Waiting for unfinished jobs....

We found that powerpc used ipi to implement hardlockup watchdog, so the
HAVE_HARDLOCKUP_DETECTOR_ARCH should depend on the SMP.

Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210327094900.938555-1-chenhuang5@huawei.com
2021-03-29 13:22:19 +11:00
Daniel Henrique Barboza
d19b3ad02c powerpc/pseries/hotplug-cpu: Show 'last online CPU' error in dlpar_cpu_offline()
One of the reasons that dlpar_cpu_offline can fail is when attempting to
offline the last online CPU of the kernel. This can be observed in a
pseries QEMU guest that has hotplugged CPUs. If the user offlines all
other CPUs of the guest, and a hotplugged CPU is now the last online
CPU, trying to reclaim it will fail.

The current error message in this situation returns rc with -EBUSY and a
generic explanation, e.g.:

  pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16

EBUSY can be caused by other conditions, such as cpu_hotplug_disable
being true. Throwing a more specific error message for this case,
instead of just "Failed to offline CPU", makes it clearer that the error
is in fact a known error situation instead of other generic/unknown
cause.

This patch adds a 'last online' check in dlpar_cpu_offline() to catch
the 'last online CPU' offline error, eturning a more informative error
message:

  pseries-hotplug-cpu: Unable to remove last online CPU PowerPC,POWER9

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210323205056.52768-2-danielhb413@gmail.com
2021-03-29 13:22:18 +11:00
Christophe Leroy
48cf12d889 powerpc/irq: Inline call_do_irq() and call_do_softirq()
call_do_irq() and call_do_softirq() are simple enough to be
worth inlining.

Inlining them avoids an mflr/mtlr pair plus a save/reload on stack.

This is inspired from S390 arch. Several other arches do more or
less the same. The way sparc arch does seems odd thought.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210320122227.345427-1-mpe@ellerman.id.au
2021-03-29 13:22:17 +11:00
He Ying
d2313da4ff powerpc/setup_64: Fix sparse warnings
Sparse warns:
  warning: symbol 'rfi_flush' was not declared.
  warning: symbol 'entry_flush' was not declared.
  warning: symbol 'uaccess_flush' was not declared.

Define 'entry_flush' and 'uaccess_flush' as static because they are
not referenced outside the file. Include asm/security_features.h in
which 'rfi_flush' is declared.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: He Ying <heying24@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316041148.29694-1-heying24@huawei.com
2021-03-29 13:22:17 +11:00
Christophe Leroy
a329ddd472 powerpc/embedded6xx: Remove CONFIG_MV64X60
Commit 92c8c16f34 ("powerpc/embedded6xx: Remove C2K board support")
moved the last selector of CONFIG_MV64X60.

As it is not a user selectable config, it can be removed.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/19e57d16692dcd1ca67ba880d7273a57fab416aa.1616085654.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:17 +11:00
kernel test robot
bbbe563f84 powerpc/iommu/debug: fix ifnullfree.cocci warnings
arch/powerpc/kernel/iommu.c:76:2-16: WARNING: NULL check before some freeing functions is not needed.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Fixes: 691602aab9 ("powerpc/iommu/debug: Add debugfs entries for IOMMU tables")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210318234441.GA63469@f8e20a472e81
2021-03-29 13:22:17 +11:00
Christophe Leroy
a230883688 powerpc: Fix arch_stack_walk() to have running function as first entry
It seems like other architectures, namely x86 and arm64 and riscv
at least, include the running function as top entry when saving
stack trace with save_stack_trace_regs().

Functionnalities like KFENCE expect it.

Do the same on powerpc, it allows KFENCE and other users to
properly identify the faulting function as depicted below.
Before the patch KFENCE was identifying finish_task_switch.isra
as the faulting function.

[   14.937370] ==================================================================
[   14.948692] BUG: KFENCE: invalid read in test_invalid_access+0x54/0x108
[   14.948692]
[   14.956814] Invalid read at 0xdf98800a:
[   14.960664]  test_invalid_access+0x54/0x108
[   14.964876]  finish_task_switch.isra.0+0x54/0x23c
[   14.969606]  kunit_try_run_case+0x5c/0xd0
[   14.973658]  kunit_generic_run_threadfn_adapter+0x24/0x30
[   14.979079]  kthread+0x15c/0x174
[   14.982342]  ret_from_kernel_thread+0x14/0x1c
[   14.986731]
[   14.988236] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: G    B             5.12.0-rc1-01537-g95f6e2088d7e-dirty #4682
[   14.999795] NIP:  c016ec2c LR: c02f517c CTR: c016ebd8
[   15.004851] REGS: e2449d90 TRAP: 0301   Tainted: G    B              (5.12.0-rc1-01537-g95f6e2088d7e-dirty)
[   15.015274] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 22000004  XER: 00000000
[   15.022043] DAR: df98800a DSISR: 20000000
[   15.022043] GPR00: c02f517c e2449e50 c1142080 e100dd24 c084b13c 00000008 c084b32b c016ebd8
[   15.022043] GPR08: c0850000 df988000 c0d10000 e2449eb0 22000288
[   15.040581] NIP [c016ec2c] test_invalid_access+0x54/0x108
[   15.046010] LR [c02f517c] kunit_try_run_case+0x5c/0xd0
[   15.051181] Call Trace:
[   15.053637] [e2449e50] [c005a68c] finish_task_switch.isra.0+0x54/0x23c (unreliable)
[   15.061338] [e2449eb0] [c02f517c] kunit_try_run_case+0x5c/0xd0
[   15.067215] [e2449ed0] [c02f648c] kunit_generic_run_threadfn_adapter+0x24/0x30
[   15.074472] [e2449ef0] [c004e7b0] kthread+0x15c/0x174
[   15.079571] [e2449f30] [c001317c] ret_from_kernel_thread+0x14/0x1c
[   15.085798] Instruction dump:
[   15.088784] 8129d608 38e7ebd8 81020280 911f004c 39000000 995f0024 907f0028 90ff001c
[   15.096613] 3949000a 915f0020 3d40c0d1 3d00c085 <8929000a> 3908adb0 812a4b98 3d40c02f
[   15.104612] ==================================================================

Fixes: 35de3b1aa1 ("powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/21324f9e2f21d1640c8397b4d1d857a9355a2283.1615881400.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:16 +11:00
Christophe Leroy
a1cdef04f2 powerpc: Convert stacktrace to generic ARCH_STACKWALK
This patch converts powerpc stacktrace to the generic ARCH_STACKWALK
implemented by commit 214d8ca6ee ("stacktrace: Provide common
infrastructure")

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/73b36bbb101299760b95ecd2cd3a46554bea8bf9.1615881400.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:16 +11:00
Christophe Leroy
826a307b0a powerpc: Rename 'tsk' parameter into 'task'
To better match generic code, rename 'tsk' to 'task' in
some stacktrace functions in preparation of following
patch which converts powerpc to generic ARCH_STACKWALK.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/117f0200e11961af6c0fdf85c98373e5dcf96a47.1615881400.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:16 +11:00
Christophe Leroy
accdd093f2 powerpc: Activate HAVE_RELIABLE_STACKTRACE for all
CONFIG_HAVE_RELIABLE_STACKTRACE is applicable to all, no
reason to limit it to book3s/64le

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/955248c6423cb068c5965923121ba31d4dd2fdde.1615881400.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:15 +11:00
Aneesh Kumar K.V
8b8adeb300 powerpc/book3s64/kuap: Move Kconfig varriables to BOOK3S_64
With below two commits:
commit c91435d95c ("powerpc/book3s64/hash/kuep: Enable KUEP on hash")
commit b2ff33a10c ("powerpc/book3s64/hash/kuap: Enable kuap on hash")
the kernel now supports kuap/kuep with hash translation. Hence select the
Kconfig even when radix is disabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210318034829.72255-1-aneesh.kumar@linux.ibm.com
2021-03-29 13:22:15 +11:00
Bhaskar Chowdhury
89f7d2927a powerpc/kernel: Trivial typo fix in kgdb.c
s/procesing/processing/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210317090413.120891-1-unixbhaskar@gmail.com
2021-03-29 13:22:15 +11:00
Nicholas Piggin
1479e3d3b7 powerpc/64s: Fix hash fault to use TRAP accessor
Hash faults use the trap vector to decide whether this is an
instruction or data fault. This should use the TRAP accessor
rather than open access regs->trap.

This won't cause a problem at the moment because 64s only uses
trap flags for system call interrupts (the norestart flag), but
that could change if any other trap flags get used in future.

Fixes: a4922f5442 ("powerpc/64s: move the hash fault handling logic to C")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316105205.407767-1-npiggin@gmail.com
2021-03-29 13:22:15 +11:00
Christophe Leroy
98c26a7275 powerpc/mm: Remove unneeded #ifdef CONFIG_PPC_MEM_KEYS
In fault.c, #ifdef CONFIG_PPC_MEM_KEYS is not needed because all
functions are always defined, and arch_vma_access_permitted()
always returns true when CONFIG_PPC_MEM_KEYS is not defined so
access_pkey_error() will return false so bad_access_pkey()
will never be called.

Include linux/pkeys.h to get a definition of vma_pkeys() for
bad_access_pkey().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8038392f38d81f2ad169347efac29146f553b238.1615819955.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:15 +11:00
Michael Ellerman
b77878052a powerpc/fsl-pci: Fix section mismatch warning
Section mismatch in reference from the function .fsl_add_bridge() to
the function .init.text:.setup_pci_cmd()

fsl_add_bridge() is not __init, and can't be, and is the only caller
of setup_pci_cmd(). Fix it by making setup_pci_cmd() non-init.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210314093341.132986-1-mpe@ellerman.id.au
2021-03-29 13:22:14 +11:00
Michael Ellerman
55c2f5574a powerpc: Fix section mismatch warning in smp_setup_pacas()
Section mismatch in reference from the function .smp_setup_pacas() to
the function .init.text:.allocate_paca()

The only caller of smp_setup_pacas() is setup_arch() which is __init,
so mark smp_setup_pacas() __init.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210314093333.132657-1-mpe@ellerman.id.au
2021-03-29 13:22:14 +11:00
Michael Ellerman
c2a2a5d027 powerpc/64s: Fold update_current_thread_[i]amr() into their only callers
lkp reported warnings in some configuration due to
update_current_thread_amr() being unused:

  arch/powerpc/mm/book3s64/pkeys.c:284:20: error: unused function 'update_current_thread_amr'
  static inline void update_current_thread_amr(u64 value)

Which is because it's only use is inside an ifdef. We could move it
inside the ifdef, but it's a single line function and only has one
caller, so just fold it in.

Similarly update_current_thread_iamr() is small and only called once,
so fold it in also.

Fixes: 48a8ab4eeb ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210314093320.132331-1-mpe@ellerman.id.au
2021-03-29 13:22:14 +11:00
Michael Ellerman
7a7685acd2 powerpc/eeh: Fix build failure with CONFIG_PROC_FS=n
The build fails with CONFIG_PROC_FS=n:

  arch/powerpc/kernel/eeh.c:1571:12: error: ‘proc_eeh_show’ defined but not used
   1571 | static int proc_eeh_show(struct seq_file *m, void *v)

Wrap proc_eeh_show() in an ifdef to avoid it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210314093300.131998-1-mpe@ellerman.id.au
2021-03-29 13:22:14 +11:00
Jiapeng Chong
7a0fdc19f2 powerpc/pci: fix warning comparing pointer to 0
Fix the following coccicheck warning:

./arch/powerpc/platforms/maple/pci.c:37:16-17: WARNING comparing pointer
to 0.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1615793724-97015-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-03-29 13:22:13 +11:00
Yang Li
9214cf0f48 powerpc/xive: use true and false for bool variable
fixed the following coccicheck:
./arch/powerpc/sysdev/xive/spapr.c:552:8-9: WARNING: return of 0/1 in
function 'xive_spapr_match' with return type bool

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1615793096-83758-1-git-send-email-yang.lee@linux.alibaba.com
2021-03-29 13:22:13 +11:00
Christophe Leroy
6eeca7a113 powerpc/asm-offsets: GPR14 is not needed either
Commit aac6a91fea ("powerpc/asm: Remove unused symbols in
asm-offsets.c") removed GPR15 to GPR31 but kept GPR14,
probably because it pops up in a couple of comments when doing
a grep.

However, it was never used either, so remove it as well.

Fixes: aac6a91fea ("powerpc/asm: Remove unused symbols in asm-offsets.c")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9881c68fbca004f9ea18fc9473f630e11ccd6417.1615806071.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:13 +11:00
Christophe Leroy
e448e1e774 powerpc/math: Fix missing __user qualifier for get_user() and other sparse warnings
Sparse reports the following problems:

arch/powerpc/math-emu/math.c:228:21: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:31: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:41: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:51: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:237:13: warning: incorrect type in initializer (different address spaces)
arch/powerpc/math-emu/math.c:237:13:    expected unsigned int [noderef] __user *_gu_addr
arch/powerpc/math-emu/math.c:237:13:    got unsigned int [usertype] *
arch/powerpc/math-emu/math.c:226:1: warning: symbol 'do_mathemu' was not declared. Should it be static?

Add missing __user qualifier when casting pointer used in get_user()

Use NULL instead of 0 to initialise opX local variables.

Add a prototype for do_mathemu() (Added in processor.h like sparc)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e4d1aae7604d89c98a52dfd8ce8443462e595670.1615809591.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:12 +11:00
Bhaskar Chowdhury
7a7d744ffe powerpc/mm/book3s64: Fix a typo in mmu_context.c
s/detalis/details/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210312112537.4585-1-unixbhaskar@gmail.com
2021-03-29 13:22:12 +11:00
Bhaskar Chowdhury
f239873fcd powerpc/64e: Trivial spelling fixes throughout head_fsl_booke.S
Trivial spelling fixes throughout the file.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210314220436.3417083-1-unixbhaskar@gmail.com
2021-03-29 13:22:12 +11:00
Christophe Leroy
802b556039 powerpc/Makefile: Remove workaround for gcc versions below 4.9
Commit 6ec4476ac8 ("Raise gcc version requirement to 4.9")
made it impossible to build with gcc 4.8 and under.

Remove related workaround.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a1e552006b8c51f23edd2f6cabdd9a986c631146.1615380184.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:11 +11:00
Christophe Leroy
c16728835e powerpc/32: Manage KUAP in C
Move all KUAP management in C.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/199365ddb58d579daf724815f2d0acb91cc49d19.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:11 +11:00
Christophe Leroy
0b45359aa2 powerpc/8xx: Create C version of kuap save/restore/check helpers
In preparation of porting PPC32 to C syscall entry/exit,
create C version of kuap_save_and_lock() and kuap_user_restore() and
kuap_kernel_restore() and kuap_assert_locked() and
kuap_get_and_assert_locked() on 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156a7c4b669d26785391422a5581a1d919544c9a.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:11 +11:00
Christophe Leroy
21eb58ae4f powerpc/32s: Create C version of kuap save/restore/check helpers
In preparation of porting PPC32 to C syscall entry/exit,
create C version of kuap_save_and_lock() and kuap_user_restore() and
kuap_kernel_restore() and kuap_assert_locked() and
kuap_get_and_assert_locked() on book3s/32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2be8fb729da4a0f9863b25e1b9d547174fcd5056.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:11 +11:00
Christophe Leroy
ad2d234477 powerpc/64s: Make kuap_check_amr() and kuap_get_and_check_amr() generic
In preparation of porting powerpc32 to C syscall entry/exit,
rename kuap_check_amr() and kuap_get_and_check_amr() as
kuap_assert_locked() and kuap_get_and_assert_locked(), and move in the
generic asm/kup.h the stub for when CONFIG_PPC_KUAP is not selected.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f82614d9b17b83abd739aa18fc08811815d0c2e3.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:11 +11:00
Christophe Leroy
b5efec00b6 powerpc/32s: Move KUEP locking/unlocking in C
This can be done in C, do it.

Unrolling the loop gains approx. 15% performance.

From now on, prepare_transfer_to_handler() is only for
interrupts from kernel.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4eadd873927e9a73c3d1dfe2f9497353465514cf.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:10 +11:00
Christophe Leroy
a2b3e09ae4 powerpc/32: Only use prepare_transfer_to_handler function on book3s/32 and e500
Only book3s/32 and e500 have significative work to do in
prepare_transfer_to_handler.

Other 32 bit have nothing to do at all.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b5e29ca0e557c11340415a13fe8b107189d315e1.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:10 +11:00
Christophe Leroy
a5d33be051 powerpc/32: Return directly from power_save_ppc32_restore()
transfer_to_handler_cont: is now just a blr.

Directly perform blr in power_save_ppc32_restore().

Also remove useless setting of r11 in e500 version of
power_save_ppc32_restore().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e337506e08a4df95b11d2290104b92f0dcdb5548.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:10 +11:00
Christophe Leroy
16db54369d powerpc/32: Save remaining registers in exception prolog
Save non volatile registers, XER, CTR, MSR and NIP in exception prolog.

Also assign proper value to r2 and r3 there.

For now, recalculate thread pointer in prepare_transfer_to_handler.
It will disappear once KUAP is ported to C.

And remove the comment which is now completely wrong.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/56f0cde9dd0362edf2ddba4d887552013eee7329.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:10 +11:00
Christophe Leroy
a305597850 powerpc/32: Refactor saving of volatile registers in exception prologs
Exception prologs all do the same at the end:
- Save trapno in stack
- Mark stack with exception marker
- Save r0
- Save r3 to r8

Refactor that into a COMMON_EXCEPTION_PROLOG_END macro.
At the same time use r1 instead of r11.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e1c45d2e895e0693c42d2a6840df1105a148efea.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:10 +11:00
Christophe Leroy
acc142b623 powerpc/32: Remove the xfer parameter in EXCEPTION() macro
The xfer parameter is not used anymore, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/17c7d68bd18f7d2f1ab24a1a20d9ed33bbcda741.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:09 +11:00
Christophe Leroy
4c0104a83f powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
In order to get more control in exception prolog, dismantle
all non standard exception macros, finishing with EXC_XFER_STD
and EXC_XFER_LITE and EXC_XFER_TEMPLATE.

Also remove transfer_to_handler_full and ret_from_except and
ret_from_except_full as they are not used anymore.

Last parameter of EXCEPTION() is now ignored, will be removed
in a later patch to avoid too much churn.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ca5795d04a220586b7037dbbbe6951dfa9e768eb.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:09 +11:00
Christophe Leroy
8f6ff5bd9b powerpc/32: Only restore non volatile registers when required
Until now, non volatile registers were restored everytime they
were saved, ie using EXC_XFER_STD meant saving and restoring
them while EXC_XFER_LITE meant neither saving not restoring them.

Now that they are always saved, EXC_XFER_STD means to restore
them and EXC_XFER_LITE means to not restore them.

Most of the users of EXC_XFER_STD only need to retrieve the
non volatile registers. For them there is no need to restore
the non volatile registers as they have not been modified.

Only very few exceptions require non volatile registers restore.

Opencode the few places which require saving of non volatile
registers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d1cb12d8023cc6afc1f07150565571373c04945c.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:09 +11:00
Christophe Leroy
bce4c26a4e powerpc/32: Add a prepare_transfer_to_handler macro for exception prologs
In order to increase flexibility, add a macro that will for now
call transfer_to_handler.

As transfer_to_handler doesn't do the actual transfer anymore,
also name it prepare_transfer_to_handler. The following patches
will progressively remove the use of transfer_to_handler label.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7f757c52518ab1d7b27ad5113b10f860e803f467.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:09 +11:00
Christophe Leroy
719e7e212c powerpc/32: Save trap number on stack in exception prolog
Saving the trap number into the stack goes into
the exception prolog, as EXC_XFER_xxx will soon disappear.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2ac7a0c9cde2ec2b23cd79e3a54cfedd816a91ae.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:08 +11:00
Christophe Leroy
af6f2ce84b powerpc/32: Call bad_page_fault() from do_page_fault()
Now that non volatile registers are saved at all time, no
need to split bad_page_fault() out of do_page_fault().

Remove handle_page_fault() and use do_page_fault() directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cfb95be8863204cc2bf45a22ea44dd1d0dc16b7f.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:08 +11:00
Christophe Leroy
e72915560b powerpc/32: Set regs parameter in r3 in transfer_to_handler
All exception handlers take regs as first parameter.

Instead of setting r3 just before each call to a handler, set
it in transfer_to_handler.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f994a379bb895a2cbd518cb82460ad3f3d3ccdf5.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:08 +11:00
Christophe Leroy
db297c3b07 powerpc/32: Don't save thread.regs on interrupt entry
Since commit 06d67d5474 ("powerpc: make process.c suitable for both
32-bit and 64-bit"), thread.regs is set on task creation, no need to
set it again and again at each interrupt entry as it never change.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20d52c627303d63e461797df13e6890fc04017d0.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:08 +11:00
Christophe Leroy
b96bae3ae2 powerpc/32: Replace ASM exception exit by C exception exit from ppc64
This patch replaces the PPC32 ASM exception exit by C exception exit.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/48f8bae91da899d8e73fc0d75c9af66cc97b4d5b.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:07 +11:00
Christophe Leroy
e9f99704aa powerpc/32: Always save non volatile registers on exception entry
In preparation of handling exception entry and exit in C,
in order to simplify the handling, always save non volatile registers
when entering an exception.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3ce8ced87a4f1467fa36fcc50763d53b45e466c1.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:07 +11:00
Christophe Leroy
0f2793e33d powerpc/32: Perform normal function call in exception entry
Now that the MMU is re-enabled before calling the transfer function,
we don't need anymore that hack with the address of the handler and
the return function sitting just after the 'bl' to the transfer
fonction, that function is retrieving via a read relative to 'lr'.

Do a regular call to the transfer function, then to the handler,
then branch to the return function.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/73c00f3361ca280ef8fd7814c291bd1f5b6e2081.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:07 +11:00
Christophe Leroy
32d2ca0e96 powerpc/32: Refactor booke critical registers saving
Refactor booke critical registers saving into a few macros
and move it into the exception prolog directly.

Keep the dedicated transfert_to_handler entry point for the
moment allthough they are empty. They will be removed in a
later patch to reduce churn.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/269171496f1f5f22afa621695bded22976c9d48d.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:07 +11:00
Christophe Leroy
8f844c06f4 powerpc/32: Provide a name to exception prolog continuation in virtual mode
Now that the prolog continuation is separated in .text, give it a name
and mark it _ASM_NOKPROBE_SYMBOL.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d96374218815a6627e1e922ab2aba994050fb87a.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:06 +11:00
Christophe Leroy
dc13b889b5 powerpc/32: Move exception prolog code into .text once MMU is back on
The space in the head section is rather constrained by the fact that
exception vectors are spread every 0x100 bytes and sometimes we
need to have "out of line" code because it doesn't fit.

Now that we are enabling MMU early in the prolog, take that opportunity
to jump somewhere else in the .text section where we don't have any
space constraint.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/38b31ca4bc782a4985bc7952a675404d7ff27c24.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:06 +11:00
Christophe Leroy
7bf1d7e1ab powerpc/32: Use START_EXCEPTION() as much as possible
Everywhere where it is possible, use START_EXCEPTION().

This will help for proper exception init in future patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d47c1cc242bbbef8658327503726abdaef9b63ef.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:06 +11:00
Christophe Leroy
5b5e5bc53d powerpc/32: Add vmap_stack_overflow label inside the macro
For consistency, add in the macro the label used by exception prolog
to branch to stack overflow processing.

While at it, enclose the macro in #ifdef CONFIG_VMAP_STACK on the 8xx
as already done on book3s/32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cf80056f5b946572ad98aea9d915dd25b23beda6.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:06 +11:00
Christophe Leroy
a4719f5bb6 powerpc/32: Statically initialise first emergency context
The check of the emergency context initialisation in
vmap_stack_overflow is buggy for the SMP case, as it
compares r1 with 0 while in the SMP case r1 is offseted
by the CPU id.

Instead of fixing it, just perform static initialisation
of the first emergency context.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a67ba422be75713286dca0c86ee0d3df2eb6dfa.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:06 +11:00
Christophe Leroy
9b6150fb89 powerpc/32: Enable instruction translation at the same time as data translation
On 40x and 8xx, kernel text is pinned.
On book3s/32, kernel text is mapped by BATs.

Enable instruction translation at the same time as data translation, it
makes things simpler.

In syscall handler, MSR_RI can also be set at the same time because
srr0/srr1 are already saved and r1 is set properly.

On booke, translation is always on, so at the end all PPC32
have translation on early. Just update msr.

Also update comment in power_save_ppc32_restore().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5269c7e5f5d2117358af3a89744d75a116be27b0.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:05 +11:00
Christophe Leroy
5b1c9a0d7f powerpc/32: Tag DAR in EXCEPTION_PROLOG_2 for the 8xx
8xx requires to tag the DAR with a magic value in order to
fixup DAR on faults generated by 'dcbX', as the 8xx
forgets to update the DAR for those faults.

Do the tagging as early as possible, that is before enabling MMU.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/853a2e28ca7c5fc85617037030f99fe6070c9536.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:05 +11:00
Christophe Leroy
7aa8dd67f1 powerpc/32: Always enable data translation in exception prolog
If the code can use a stack in vm area, it can also use a
stack in linear space.

Simplify code by removing old non VMAP stack code on PPC32.

That means the data translation is now re-enabled early in
exception prolog in all cases, not only when using VMAP stacks.

While we are touching EXCEPTION_PROLOG macros, remove the
unused for_rtas parameter in EXCEPTION_PROLOG_1.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7cd6440c60a7e8f4f035b245c57720f51e225aae.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:05 +11:00
Christophe Leroy
5747230645 powerpc/32: Remove ksp_limit
ksp_limit is there to help detect stack overflows.
That is specific to ppc32 as it was removed from ppc64 in
commit cbc9565ee8 ("powerpc: Remove ksp_limit on ppc64").

There are other means for detecting stack overflows.

As ppc64 has proven to not need it, ppc32 should be able to do
without it too.

Lets remove it and simplify exception handling.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d789c3385b22e07bedc997613c0d26074cb513e7.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:05 +11:00
Christophe Leroy
e464d92b29 powerpc/32: Use fast instruction to set MSR RI in exception prolog on 8xx
8xx has registers SPRN_NRI, SPRN_EID and SPRN_EIE for changing
MSR EE and RI.

Use SPRN_EID in exception prolog to set RI.

On an 8xx, it reduces the null_syscall test by 3 cycles.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65f6bda827c2a2abce71ea7e07543e791163da33.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:04 +11:00
Christophe Leroy
79f4bb17f1 powerpc/32: Handle bookE debugging in C in exception entry
The handling of SPRN_DBCR0 and other registers can easily
be done in C instead of ASM.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6d6b2497115890b90cfa72a2b3ab1da5f78123c2.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:04 +11:00
Christophe Leroy
f93d866e14 powerpc/32: Entry cpu time accounting in C
There is no need for this to be in asm,
use the new interrupt entry wrapper.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/daca4c3e05cdfe54d237162a0718b3aaca897662.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:04 +11:00
Christophe Leroy
be39e10506 powerpc/32: Reconcile interrupts in C
There is no need for this to be in asm anymore,
use the new interrupt entry wrapper.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/602e1ec47e15ca540f7edb9cf6feb6c249911bd6.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:04 +11:00
Christophe Leroy
0512aadd75 powerpc/40x: Prepare normal exception handler for enabling MMU early
Ensure normal exception handler are able to manage stuff with
MMU enabled. For that we use CONFIG_VMAP_STACK related code
allthough there is no intention to really activate CONFIG_VMAP_STACK
on powerpc 40x for the moment.

40x uses SPRN_DEAR instead of SPRN_DAR and SPRN_ESR instead of
SPRN_DSISR. Take it into account in common macros.

40x MSR value doesn't fit on 15 bits, use LOAD_REG_IMMEDIATE() in
common macros that will be used also with 40x.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/01963af2b83037bca270d7bf1336ffcf35da8282.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:04 +11:00
Christophe Leroy
0fc1e93481 powerpc/40x: Prepare for enabling MMU in critical exception prolog
In order the enable MMU early in exception prolog, implement
CONFIG_VMAP_STACK principles in critical exception prolog.

There is no intention to use CONFIG_VMAP_STACK on 40x,
but related code will be used to enable MMU early in exception
in a later patch.

Also address (critirq_ctx - PAGE_OFFSET) directly instead of
using tophys() in order to win one instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3fd75ee54c48307119acdbf66cfea966c1463bbd.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:03 +11:00
Christophe Leroy
26c468860c powerpc/40x: Reorder a few instructions in critical exception prolog
In order to ease preparation for CONFIG_VMAP_STACK, reorder
a few instruction, especially save r1 into stack frame earlier.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c895ecf958c86d1736bdd2ff6f36626b55f35fd2.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:03 +11:00
Christophe Leroy
fcd4b43c36 powerpc/40x: Save SRR0/SRR1 and r10/r11 earlier in critical exception
In order to be able to switch MMU on in exception prolog, save
SRR0 and SRR1 earlier.

Also save r10 and r11 into stack earlier to better match with the
normal exception prolog.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/79a93f253d72dc97ac968c9c62b5066960b688ed.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:03 +11:00
Christophe Leroy
9d3c18a11a powerpc/40x: Change CRITICAL_EXCEPTION_PROLOG macro to a gas macro
Change CRITICAL_EXCEPTION_PROLOG macro to a gas macro to
remove the ugly ; and \ on each line.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/73291fb9dc9ec58182c27a40dfc3db204e3f4024.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:03 +11:00
Christophe Leroy
52ae92cc29 powerpc/40x: Don't use SPRN_SPRG_SCRATCH0/1 in TLB miss handlers
SPRN_SPRG_SCRATCH5 is used to save SPRN_PID.
SPRN_SPRG_SCRATCH6 is already available.

SPRN_PID is only 8 bits. We have r12 that contains CR.
We only need to preserve CR0, so we have space available in r12
to save PID.

Keep PID in r12 and free up SPRN_SPRG_SCRATCH5.

Then In TLB miss handlers, instead of using SPRN_SPRG_SCRATCH0 and
SPRN_SPRG_SCRATCH1, use SPRN_SPRG_SCRATCH5 and SPRN_SPRG_SCRATCH6
to avoid future conflicts with normal exception prologs.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4cdaa85d38e14d594ba902424060ec55babf2c42.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:03 +11:00
Christophe Leroy
a58cbed683 powerpc/traps: Declare unrecoverable_exception() as __noreturn
unrecoverable_exception() is never expected to return, most callers
have an infiniteloop in case it returns.

Ensure it really never returns by terminating it with a BUG(), and
declare it __no_return.

It always GCC to really simplify functions calling it. In the exemple
below, it avoids the stack frame in the likely fast path and avoids
code duplication for the exit.

With this patch:

	00000348 <interrupt_exit_kernel_prepare>:
	 348:	81 43 00 84 	lwz     r10,132(r3)
	 34c:	71 48 00 02 	andi.   r8,r10,2
	 350:	41 82 00 2c 	beq     37c <interrupt_exit_kernel_prepare+0x34>
	 354:	71 4a 40 00 	andi.   r10,r10,16384
	 358:	40 82 00 20 	bne     378 <interrupt_exit_kernel_prepare+0x30>
	 35c:	80 62 00 70 	lwz     r3,112(r2)
	 360:	74 63 00 01 	andis.  r3,r3,1
	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
	 368:	7d 40 00 a6 	mfmsr   r10
	 36c:	7c 11 13 a6 	mtspr   81,r0
	 370:	7c 12 13 a6 	mtspr   82,r0
	 374:	4e 80 00 20 	blr
	 378:	48 00 00 00 	b       378 <interrupt_exit_kernel_prepare+0x30>
	 37c:	94 21 ff f0 	stwu    r1,-16(r1)
	 380:	7c 08 02 a6 	mflr    r0
	 384:	90 01 00 14 	stw     r0,20(r1)
	 388:	48 00 00 01 	bl      388 <interrupt_exit_kernel_prepare+0x40>
				388: R_PPC_REL24	unrecoverable_exception
	 38c:	38 e2 00 70 	addi    r7,r2,112
	 390:	3d 00 00 01 	lis     r8,1
	 394:	7c c0 38 28 	lwarx   r6,0,r7
	 398:	7c c6 40 78 	andc    r6,r6,r8
	 39c:	7c c0 39 2d 	stwcx.  r6,0,r7
	 3a0:	40 a2 ff f4 	bne     394 <interrupt_exit_kernel_prepare+0x4c>
	 3a4:	38 60 00 01 	li      r3,1
	 3a8:	4b ff ff c0 	b       368 <interrupt_exit_kernel_prepare+0x20>

Without this patch:

	00000348 <interrupt_exit_kernel_prepare>:
	 348:	94 21 ff f0 	stwu    r1,-16(r1)
	 34c:	93 e1 00 0c 	stw     r31,12(r1)
	 350:	7c 7f 1b 78 	mr      r31,r3
	 354:	81 23 00 84 	lwz     r9,132(r3)
	 358:	71 2a 00 02 	andi.   r10,r9,2
	 35c:	41 82 00 34 	beq     390 <interrupt_exit_kernel_prepare+0x48>
	 360:	71 29 40 00 	andi.   r9,r9,16384
	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
	 368:	80 62 00 70 	lwz     r3,112(r2)
	 36c:	74 63 00 01 	andis.  r3,r3,1
	 370:	40 82 00 3c 	bne     3ac <interrupt_exit_kernel_prepare+0x64>
	 374:	7d 20 00 a6 	mfmsr   r9
	 378:	7c 11 13 a6 	mtspr   81,r0
	 37c:	7c 12 13 a6 	mtspr   82,r0
	 380:	83 e1 00 0c 	lwz     r31,12(r1)
	 384:	38 21 00 10 	addi    r1,r1,16
	 388:	4e 80 00 20 	blr
	 38c:	48 00 00 00 	b       38c <interrupt_exit_kernel_prepare+0x44>
	 390:	7c 08 02 a6 	mflr    r0
	 394:	90 01 00 14 	stw     r0,20(r1)
	 398:	48 00 00 01 	bl      398 <interrupt_exit_kernel_prepare+0x50>
				398: R_PPC_REL24	unrecoverable_exception
	 39c:	80 01 00 14 	lwz     r0,20(r1)
	 3a0:	81 3f 00 84 	lwz     r9,132(r31)
	 3a4:	7c 08 03 a6 	mtlr    r0
	 3a8:	4b ff ff b8 	b       360 <interrupt_exit_kernel_prepare+0x18>
	 3ac:	39 02 00 70 	addi    r8,r2,112
	 3b0:	3d 40 00 01 	lis     r10,1
	 3b4:	7c e0 40 28 	lwarx   r7,0,r8
	 3b8:	7c e7 50 78 	andc    r7,r7,r10
	 3bc:	7c e0 41 2d 	stwcx.  r7,0,r8
	 3c0:	40 a2 ff f4 	bne     3b4 <interrupt_exit_kernel_prepare+0x6c>
	 3c4:	38 60 00 01 	li      r3,1
	 3c8:	4b ff ff ac 	b       374 <interrupt_exit_kernel_prepare+0x2c>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1e883e9d93fdb256853d1434c8ad77c257349b2d.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:02 +11:00
Ravi Bangoria
d943bc742a powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210311091538.368590-1-ravi.bangoria@linux.ibm.com
2021-03-29 12:52:24 +11:00
Christopher M. Riedl
d3ccc97815 powerpc/signal: Use __get_user() to copy sigset_t
Usually sigset_t is exactly 8B which is a "trivial" size and does not
warrant using __copy_from_user(). Use __get_user() directly in
anticipation of future work to remove the trivial size optimizations
from __copy_from_user().

The ppc32 implementation of get_sigset_t() previously called
copy_from_user() which, unlike __copy_from_user(), calls access_ok().
Replacing this w/ __get_user() (no access_ok()) is fine here since both
callsites in signal_32.c are preceded by an earlier access_ok().

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-11-cmr@codefail.de
2021-03-29 12:52:24 +11:00
Daniel Axtens
0f92433b8f powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.

Co-developed-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-10-cmr@codefail.de
2021-03-29 12:52:23 +11:00
Daniel Axtens
96d7a4e06f powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.

There is no 'unsafe' version of copy_siginfo_to_user, so move it
slightly to allow for a "longer" uaccess block.

Co-developed-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-9-cmr@codefail.de
2021-03-29 12:52:15 +11:00
Christopher M. Riedl
193323e100 powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext()
Previously restore_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.

Rewrite restore_sigcontext() to assume that a userspace read access
window is open by replacing all uaccess functions with their 'unsafe'
versions. Modify the callers to first open, call
unsafe_restore_sigcontext(), and then close the uaccess window.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-8-cmr@codefail.de
2021-03-29 12:49:47 +11:00
Christopher M. Riedl
7bb081c8f0 powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext()
Previously setup_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.

Rewrite setup_sigcontext() to assume that a userspace write access window
is open by replacing all uaccess functions with their 'unsafe' versions.
Modify the callers to first open, call unsafe_setup_sigcontext() and
then close the uaccess window.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-7-cmr@codefail.de
2021-03-29 12:49:47 +11:00
Christopher M. Riedl
2d19630e20 powerpc/signal64: Remove TM ifdefery in middle of if/else block
Both rt_sigreturn() and handle_rt_signal_64() contain TM-related ifdefs
which break-up an if/else block. Provide stubs for the ifdef-guarded TM
functions and remove the need for an ifdef in rt_sigreturn().

Rework the remaining TM ifdef in handle_rt_signal64() similar to
commit f1cf4f93de ("powerpc/signal32: Remove ifdefery in middle of if/else").

Unlike in the commit for ppc32, the ifdef can't be removed entirely
since uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-6-cmr@codefail.de
2021-03-29 12:49:47 +11:00
Christopher M. Riedl
1a130b67c6 powerpc: Reference parameter in MSR_TM_ACTIVE() macro
Unlike the other MSR_TM_* macros, MSR_TM_ACTIVE does not reference or
use its parameter unless CONFIG_PPC_TRANSACTIONAL_MEM is defined. This
causes an 'unused variable' compile warning unless the variable is also
guarded with CONFIG_PPC_TRANSACTIONAL_MEM.

Reference but do nothing with the argument in the macro to avoid a
potential compile warning.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-5-cmr@codefail.de
2021-03-29 12:49:46 +11:00
Christopher M. Riedl
c6c9645e37 powerpc/signal64: Remove non-inline calls from setup_sigcontext()
The majority of setup_sigcontext() can be refactored to execute in an
"unsafe" context assuming an open uaccess window except for some
non-inline function calls. Move these out into a separate
prepare_setup_sigcontext() function which must be called first and
before opening up a uaccess window. Non-inline function calls should be
avoided during a uaccess window for a few reasons:

	- KUAP should be enabled for as much kernel code as possible.
	  Opening a uaccess window disables KUAP which means any code
	  executed during this time contributes to a potential attack
	  surface.

	- Non-inline functions default to traceable which means they are
	  instrumented for ftrace. This adds more code which could run
	  with KUAP disabled.

	- Powerpc does not currently support the objtool UACCESS checks.
	  All code running with uaccess must be audited manually which
	  means: less code -> less work -> fewer problems (in theory).

A follow-up commit converts setup_sigcontext() to be "unsafe".

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-4-cmr@codefail.de
2021-03-29 12:49:46 +11:00
Christopher M. Riedl
609355dfc8 powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
Reuse the "safe" implementation from signal.c but call unsafe_get_user()
directly in a loop to avoid the intermediate copy into a local buffer.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-3-cmr@codefail.de
2021-03-29 12:49:46 +11:00
Christopher M. Riedl
9466c1799f powerpc/uaccess: Add unsafe_copy_from_user()
Use the same approach as unsafe_copy_to_user() but instead call
unsafe_get_user() in a loop.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210227011259.11992-2-cmr@codefail.de
2021-03-29 12:49:46 +11:00
Davidlohr Bueso
deb9b13eb2 powerpc/qspinlock: Use generic smp_cond_load_relaxed
49a7d46a06 (powerpc: Implement smp_cond_load_relaxed()) added
busy-waiting pausing with a preferred SMT priority pattern, lowering
the priority (reducing decode cycles) during the whole loop slowpath.

However, data shows that while this pattern works well with simple
spinlocks, queued spinlocks benefit more being kept in medium priority,
with a cpu_relax() instead, being a low+medium combo on powerpc.

Data is from three benchmarks on a Power9: 9008-22L 64 CPUs with
2 sockets and 8 threads per core.

1. locktorture.

This is data for the lowest and most artificial/pathological level,
with increasing thread counts pounding on the lock. Metrics are total
ops/minute. Despite some small hits in the 4-8 range, scenarios are
either neutral or favorable to this patch.

+=========+==========+==========+=======+
| # tasks | vanilla  | dirty    | %diff |
+=========+==========+==========+=======+
| 2       | 46718565 | 48751350 | 4.35  |
+---------+----------+----------+-------+
| 4       | 51740198 | 50369082 | -2.65 |
+---------+----------+----------+-------+
| 8       | 63756510 | 62568821 | -1.86 |
+---------+----------+----------+-------+
| 16      | 67824531 | 70966546 | 4.63  |
+---------+----------+----------+-------+
| 32      | 53843519 | 61155508 | 13.58 |
+---------+----------+----------+-------+
| 64      | 53005778 | 53104412 | 0.18  |
+---------+----------+----------+-------+
| 128     | 53331980 | 54606910 | 2.39  |
+=========+==========+==========+=======+

2. sockperf (tcp throughput)

Here a client will do one-way throughput tests to a localhost server, with
increasing message sizes, dealing with the sk_lock. This patch shows to put
the performance of the qspinlock back to par with that of the simple lock:

		     simple-spinlock           vanilla			dirty
Hmean     14        73.50 (   0.00%)       54.44 * -25.93%*       73.45 * -0.07%*
Hmean     100      654.47 (   0.00%)      385.61 * -41.08%*      771.43 * 17.87%*
Hmean     300     2719.39 (   0.00%)     2181.67 * -19.77%*     2666.50 * -1.94%*
Hmean     500     4400.59 (   0.00%)     3390.77 * -22.95%*     4322.14 * -1.78%*
Hmean     850     6726.21 (   0.00%)     5264.03 * -21.74%*     6863.12 * 2.04%*

3. dbench (tmpfs)

Configured to run with up to ncpusx8 clients, it shows both latency and
throughput metrics. For the latency, with the exception of the 64 case,
there is really nothing to go by:
				     vanilla                dirty
Amean     latency-1          1.67 (   0.00%)        1.67 *   0.09%*
Amean     latency-2          2.15 (   0.00%)        2.08 *   3.36%*
Amean     latency-4          2.50 (   0.00%)        2.56 *  -2.27%*
Amean     latency-8          2.49 (   0.00%)        2.48 *   0.31%*
Amean     latency-16         2.69 (   0.00%)        2.72 *  -1.37%*
Amean     latency-32         2.96 (   0.00%)        3.04 *  -2.60%*
Amean     latency-64         7.78 (   0.00%)        8.17 *  -5.07%*
Amean     latency-512      186.91 (   0.00%)      186.41 *   0.27%*

For the dbench4 Throughput (misleading but traditional) there's a small
but rather constant improvement:

			     vanilla                dirty
Hmean     1        849.13 (   0.00%)      851.51 *   0.28%*
Hmean     2       1664.03 (   0.00%)     1663.94 *  -0.01%*
Hmean     4       3073.70 (   0.00%)     3104.29 *   1.00%*
Hmean     8       5624.02 (   0.00%)     5694.16 *   1.25%*
Hmean     16      9169.49 (   0.00%)     9324.43 *   1.69%*
Hmean     32     11969.37 (   0.00%)    12127.09 *   1.32%*
Hmean     64     15021.12 (   0.00%)    15243.14 *   1.48%*
Hmean     512    14891.27 (   0.00%)    15162.11 *   1.82%*

Measuring the dbench4 Per-VFS Operation latency, shows some very minor
differences within the noise level, around the 0-1% ranges.

Fixes: 49a7d46a06 ("powerpc: Implement smp_cond_load_relaxed()")
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210318204702.71417-1-dave@stgolabs.net
2021-03-29 12:48:46 +11:00
Davidlohr Bueso
66f6052213 powerpc/spinlock: Unserialize spin_is_locked
c6f5d02b6a (locking/spinlocks/arm64: Remove smp_mb() from
arch_spin_is_locked()) made it pretty official that the call
semantics do not imply any sort of barriers, and any user that
gets creative must explicitly do any serialization.

This creativity, however, is nowadays pretty limited:

1. spin_unlock_wait() has been removed from the kernel in favor
of a lock/unlock combo. Furthermore, queued spinlocks have now
for a number of years no longer relied on _Q_LOCKED_VAL for the
call, but any non-zero value to indicate a locked state. There
were cases where the delayed locked store could lead to breaking
mutual exclusion with crossed locking; such as with sysv ipc and
netfilter being the most extreme.

2. The auditing Andrea did in verified that remaining spin_is_locked()
no longer rely on such semantics. Most callers just use it to assert
a lock is taken, in a debug nature. The only user that gets cute is
NOLOCK qdisc, as of:

   96009c7d50 (sched: replace __QDISC_STATE_RUNNING bit with a spin lock)

... which ironically went in the next day after c6f5d02b6a. This
change replaces test_bit() with spin_is_locked() to know whether
to take the busylock heuristic to reduce contention on the main
qdisc lock. So any races against spin_is_locked() for archs that
use LL/SC for spin_lock() will be benign and not break any mutual
exclusion; furthermore, both the seqlock and busylock have the same
scope.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210309015950.27688-3-dave@stgolabs.net
2021-03-26 23:19:43 +11:00
Davidlohr Bueso
2bf3604c41 powerpc/spinlock: Define smp_mb__after_spinlock only once
Instead of both queued and simple spinlocks doing it. Move
it into the arch's spinlock.h.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210309015950.27688-2-dave@stgolabs.net
2021-03-26 23:19:43 +11:00
Christophe Leroy
93c043e393 powerpc/ptrace: Convert gpr32_set_common() to user access block
Use user access block in gpr32_set_common() instead of
repetitive __get_user() which imply repetitive KUAP open/close.

To get it clean, force inlining of the small set of tiny functions
called inside the block.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bdcb8652c3bb4ab5b8b3bfd08147434be8fc04c9.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:43 +11:00
Christophe Leroy
870779f40e powerpc/futex: Switch to user_access block
Use user_access_begin() instead of the access_ok/allow_access sequence.

This brings the missing might_fault() check.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6cd202cdc4f939d47822e4ddd3c0856210431a58.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:43 +11:00
Christophe Leroy
164dc6ce36 powerpc/net: Switch csum_and_copy_{to/from}_user to user_access block
Use user_access_begin() instead of the
might_sleep/access_ok/allow_access sequence.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2dee286d2d6dc9a27d99e31ac564bad4fae2cb49.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:43 +11:00
Christophe Leroy
e63ceebdad powerpc/lib: Don't use __put_user_asm_goto() outside of uaccess.h
__put_user_asm_goto() is internal to uaccess.h

Use __put_kernel_nofault() instead. The generated code is identical.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3e32c4f0361933909368b68f5ee569e5de661c1b.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:42 +11:00
Christophe Leroy
fd69d544b0 powerpc/syscalls: Use sys_old_select() in ppc_select()
Instead of opencodying the copy of parameters, use
the generic sys_old_select().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4de983ad254739da1fe6e9f273baf387b7043ae0.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:42 +11:00
Christophe Leroy
4b8cda5881 powerpc/uaccess: Move copy_mc_xxx() functions down
copy_mc_xxx() functions are in the middle of raw_copy functions.

For clarity, move them out of the raw_copy functions block.

They are using access_ok, so they need to be after the general
functions in order to eventually allow the inclusion of
asm-generic/uaccess.h in some future.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2cdecb6e5a2fcee6c158d18dd254b71ec0e0da4d.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:42 +11:00
Christophe Leroy
7472199a6e powerpc/uaccess: Swap clear_user() and __clear_user()
It is clear_user() which is expected to call __clear_user(),
not the reverse.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d8ec01fb22f33d87321451d5e5f01cb56dacaa39.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:42 +11:00
Christophe Leroy
c6adc835c6 powerpc/uaccess: Also perform 64 bits copies in unsafe_copy_to_user() on ppc32
ppc32 has an efficiant 64 bits __put_user(), so also use it in
order to unroll loops more.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ccc08a16eea682d6fa4acc957ffe34003a8f0844.1615398498.git.christophe.leroy@csgroup.eu
2021-03-26 23:19:41 +11:00
Laurent Dufour
6ce56e1ac3 powerpc/pseries: export LPAR security flavor in lparcfg
This is helpful to read the security flavor from inside the LPAR.

In /sys/kernel/debug/powerpc/security_features it can be seen if
mitigations are on or off but not the level set through the ASMI menu.
Furthermore, reporting it through /proc/powerpc/lparcfg allows an easy
processing by the lparstat command [1].

Export it like this in /proc/powerpc/lparcfg:

  $ grep security_flavor /proc/powerpc/lparcfg
  security_flavor=1

Value follows what is documented on the IBM support page [2]:

  0 Speculative execution fully enabled
  1 Speculative execution controls to mitigate user-to-kernel attacks
  2 Speculative execution controls to mitigate user-to-kernel and
    user-to-user side-channel attacks

[1] https://groups.google.com/g/powerpc-utils-devel/c/NaKXvdyl_UI/m/wa2stpIDAQAJ
[2] https://www.ibm.com/support/pages/node/715841

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210305125554.5165-1-ldufour@linux.ibm.com
2021-03-26 23:19:41 +11:00
Aneesh Kumar K.V
53f1d31708 powerpc/mm/book3s64: Use the correct storage key value when calling H_PROTECT
H_PROTECT expects the flag value to include flags:
  AVPN, pp0, pp1, pp2, key0-key4, Noexec, CMO Option flags

This patch updates hpte_updatepp() to fetch the storage key value from
the linux page table and use the same in H_PROTECT hcall.

native_hpte_updatepp() is not updated because the kernel doesn't clear
the existing storage key value there. The kernel also doesn't use
hpte_updatepp() callback for updating storage keys.

This fixes the below kernel crash observed with KUAP enabled.

  BUG: Unable to handle kernel data access on write at 0xc009fffffc440000
  Faulting instruction address: 0xc0000000000b7030
  Key fault AMR: 0xfcffffffffffffff IAMR: 0xc0000077bc498100
  Found HPTE: v = 0x40070adbb6fffc05 r = 0x1ffffffffff1194
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  ...
  CFAR: c000000000010100 DAR: c009fffffc440000 DSISR: 02200000 IRQMASK: 0
  ...
  NIP memset+0x68/0x104
  LR  pcpu_alloc+0x54c/0xb50
  Call Trace:
    pcpu_alloc+0x55c/0xb50 (unreliable)
    blk_stat_alloc_callback+0x94/0x150
    blk_mq_init_allocated_queue+0x64/0x560
    blk_mq_init_queue+0x54/0xb0
    scsi_mq_alloc_queue+0x30/0xa0
    scsi_alloc_sdev+0x1cc/0x300
    scsi_probe_and_add_lun+0xb50/0x1020
    __scsi_scan_target+0x17c/0x790
    scsi_scan_channel+0x90/0xe0
    scsi_scan_host_selected+0x148/0x1f0
    do_scan_async+0x2c/0x2a0
    async_run_entry_fn+0x78/0x220
    process_one_work+0x264/0x540
    worker_thread+0xa8/0x600
    kthread+0x190/0x1a0
    ret_from_kernel_thread+0x5c/0x6c

With KUAP enabled the kernel uses storage key 3 for all its
translations. But as shown by the debug print, in this specific case we
have the hash page table entry created with key value 0.

  Found HPTE: v = 0x40070adbb6fffc05 r = 0x1ffffffffff1194

and DSISR indicates a key fault.

This can happen due to parallel fault on the same EA by different CPUs:

  CPU 0					CPU 1
  fault on X

  H_PAGE_BUSY set
  					fault on X

  finish fault handling and
  clear H_PAGE_BUSY
  					check for H_PAGE_BUSY
  					continue with fault handling.

This implies CPU1 will end up calling hpte_updatepp for address X and
the kernel updated the hash pte entry with key 0

Fixes: d94b827e89 ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation")
Reported-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Debugged-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210326070755.304625-1-aneesh.kumar@linux.ibm.com
2021-03-26 22:19:39 +11:00
Christophe Leroy
90cbac0e99 powerpc: Enable KFENCE for PPC32
Add architecture specific implementation details for KFENCE and enable
KFENCE for the ppc32 architecture. In particular, this implements the
required interface in <asm/kfence.h>.

KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the Read/Write linear map to be
mapped at page granularity.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8dfe1bd2abde26337c1d8c1ad0acfcc82185e0d5.1614868445.git.christophe.leroy@csgroup.eu
2021-03-24 14:09:30 +11:00
Denis Efremov
0b71b37241 powerpc/ptrace: Remove duplicate check from pt_regs_check()
"offsetof(struct pt_regs, msr) == offsetof(struct user_pt_regs, msr)"
checked in pt_regs_check() twice in a row. Remove the second check.

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210305112807.26299-1-efremov@linux.com
2021-03-24 14:09:30 +11:00
Lee Jones
13b8219bd0 powerpc/pseries: Move hvc_vio_init_early() prototype to shared location
Fixes the following W=1 kernel build warning(s):

 drivers/tty/hvc/hvc_vio.c:385:13: warning: no previous prototype for ‘hvc_vio_init_early’
 385 | void __init hvc_vio_init_early(void)
 | ^~~~~~~~~~~~~~~~~~

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210303124603.3150175-1-lee.jones@linaro.org
2021-03-24 14:09:30 +11:00
Zhang Yunkai
1a029e0edb powerpc: Fix misspellings in tlbflush.h
The comment marking the end of the include guard is wrong, fix it up.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
[mpe: Rewrite commit message]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210304031318.188447-1-zhang.yunkai@zte.com.cn
2021-03-24 14:09:30 +11:00
Zhang Yunkai
1a0e4550fb powerpc: Remove duplicate includes
asm/tm.h included in traps.c is duplicated. It is also included on
the 62nd line.

asm/udbg.h included in setup-common.c is duplicated. It is also
included on the 61st line.

asm/bug.h included in arch/powerpc/include/asm/book3s/64/mmu-hash.h
is duplicated. It is also included on the 12th line.

asm/tlbflush.h included in arch/powerpc/include/asm/pgtable.h is
duplicated. It is also included on the 11th line.

asm/page.h included in arch/powerpc/include/asm/thread_info.h is
duplicated. It is also included on the 13th line.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
[mpe: Squash together from multiple commits]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-03-24 14:09:30 +11:00
Nathan Chancellor
1ef1dd9c7e powerpc/prom: Mark identical_pvr_fixup as __init
If identical_pvr_fixup() is not inlined, there are two modpost warnings:

WARNING: modpost: vmlinux.o(.text+0x54e8): Section mismatch in reference
from the function identical_pvr_fixup() to the function
.init.text:of_get_flat_dt_prop()
The function identical_pvr_fixup() references
the function __init of_get_flat_dt_prop().
This is often because identical_pvr_fixup lacks a __init
annotation or the annotation of of_get_flat_dt_prop is wrong.

WARNING: modpost: vmlinux.o(.text+0x551c): Section mismatch in reference
from the function identical_pvr_fixup() to the function
.init.text:identify_cpu()
The function identical_pvr_fixup() references
the function __init identify_cpu().
This is often because identical_pvr_fixup lacks a __init
annotation or the annotation of identify_cpu is wrong.

identical_pvr_fixup() calls two functions marked as __init and is only
called by a function marked as __init so it should be marked as __init
as well. At the same time, remove the inline keywork as it is not
necessary to inline this function. The compiler is still free to do so
if it feels it is worthwhile since commit 889b3c1245 ("compiler:
remove CONFIG_OPTIMIZE_INLINING entirely").

Fixes: 14b3d926a2 ("[POWERPC] 4xx: update 440EP(x)/440GR(x) identical PVR issue workaround")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1316
Link: https://lore.kernel.org/r/20210302200829.2680663-1-nathan@kernel.org
2021-03-24 14:09:30 +11:00
Nathan Chancellor
fbced1546e powerpc/fadump: Mark fadump_calculate_reserve_size as __init
If fadump_calculate_reserve_size() is not inlined, there is a modpost
warning:

WARNING: modpost: vmlinux.o(.text+0x5196c): Section mismatch in
reference from the function fadump_calculate_reserve_size() to the
function .init.text:parse_crashkernel()
The function fadump_calculate_reserve_size() references
the function __init parse_crashkernel().
This is often because fadump_calculate_reserve_size lacks a __init
annotation or the annotation of parse_crashkernel is wrong.

fadump_calculate_reserve_size() calls parse_crashkernel(), which is
marked as __init and fadump_calculate_reserve_size() is called from
within fadump_reserve_mem(), which is also marked as __init.

Mark fadump_calculate_reserve_size() as __init to fix the section
mismatch. Additionally, remove the inline keyword as it is not necessary
to inline this function; the compiler is still free to do so if it feels
it is worthwhile since commit 889b3c1245 ("compiler: remove
CONFIG_OPTIMIZE_INLINING entirely").

Fixes: 11550dc0a0 ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1300
Link: https://lore.kernel.org/r/20210302195013.2626335-1-nathan@kernel.org
2021-03-24 14:09:30 +11:00
Bhaskar Chowdhury
5c4a4802b9 powerpc: Fix spelling of "droping" to "dropping" in traps.c
s/droping/dropping/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210224075547.763063-1-unixbhaskar@gmail.com
2021-03-24 14:09:29 +11:00
Jiapeng Chong
4f46d57cab powerpc: remove unneeded semicolon
Fix the following coccicheck warnings:

./arch/powerpc/kernel/prom_init.c:2986:2-3: Unneeded semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1614151761-53721-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-03-24 14:09:29 +11:00
Geert Uytterhoeven
9634afa67b powerpc/chrp: Make hydra_init() static
Commit 407d418f2f ("powerpc/chrp: Move PHB discovery") moved the
sole call to hydra_init() to the source file where it is defined, so it
can be made static.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210223095345.2139416-1-geert@linux-m68k.org
2021-03-24 14:09:29 +11:00
Sebastian Andrzej Siewior
9be77e11da powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used
The mutex linear_mapping_mutex is defined at the of the file while its
only two user are within the CONFIG_MEMORY_HOTPLUG block.
A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because
its mutex implementation is smart enough to realize that it is unused.

Move the definition of linear_mapping_mutex to ifdef block where it is
used.

Fixes: 1f73ad3e8d ("powerpc/mm: print warning in arch_remove_linear_mapping()")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210219165648.2505482-1-bigeasy@linutronix.de
2021-03-24 14:09:29 +11:00
Ingo Molnar
f2cc020d78 tracing: Fix various typos in comments
Fix ~59 single-word typos in the tracing code comments, and fix
the grammar in a handful of places.

Link: https://lore.kernel.org/r/20210322224546.GA1981273@gmail.com
Link: https://lkml.kernel.org/r/20210323174935.GA4176821@gmail.com

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-23 14:08:18 -04:00
Michal Simek
2907f851f6 xsysace: Remove SYSACE driver
Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze
systems. The driver is not regularly tested and very likely not working for
quite a long time that's why remove it.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-23 10:27:38 -06:00
Nathan Lynch
274cb1ca2e powerpc/pseries/mobility: handle premature return from H_JOIN
The pseries join/suspend sequence in its current form was written with
the assumption that it was the only user of H_PROD and that it needn't
handle spurious successful returns from H_JOIN. That's wrong;
powerpc's paravirt spinlock code uses H_PROD, and CPUs entering
do_join() can be woken prematurely from H_JOIN with a status of
H_SUCCESS as a result. This causes all CPUs to exit the sequence
early, preventing suspend from occurring at all.

Add a 'done' boolean flag to the pseries_suspend_info struct, and have
the waking thread set it before waking the other threads. Threads
which receive H_SUCCESS from H_JOIN retry if the 'done' flag is still
unset.

Fixes: 9327dc0aee ("powerpc/pseries/mobility: use stop_machine for join/suspend")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210315080045.460331-3-nathanl@linux.ibm.com
2021-03-23 09:25:12 +11:00
Nathan Lynch
e834df6cfc powerpc/pseries/mobility: use struct for shared state
The atomic_t counter is the only shared state for the join/suspend
sequence so far, but that will change. Contain it in a
struct (pseries_suspend_info), and document its intended use. No
functional change.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210315080045.460331-2-nathanl@linux.ibm.com
2021-03-23 09:25:12 +11:00
Sascha Hauer
fa8b90070a quota: wire up quotactl_path
Wire up the quotactl_path syscall added in the previous patch.

Link: https://lore.kernel.org/r/20210304123541.30749-3-s.hauer@pengutronix.de
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-17 15:51:17 +01:00
Christoph Hellwig
9906aa5bd6 powerpc/svm: stop using io_tlb_start
Use the local variable that is passed to swiotlb_init_with_tbl for
freeing the memory in the failure case to isolate the code a little
better from swiotlb internals.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-17 00:13:52 +00:00
Greg Kroah-Hartman
280def1e1c Merge 5.12-rc3 into tty-next
Resolves a merge issue with:
	drivers/tty/hvc/hvcs.c
and we want the tty/serial fixes from 5.12-rc3 in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-15 08:43:49 +01:00
Christophe Leroy
eed5fae005 powerpc: Force inlining of cpu_has_feature() to avoid build failure
The code relies on constant folding of cpu_has_feature() based
on possible and always true values as defined per
CPU_FTRS_ALWAYS and CPU_FTRS_POSSIBLE.

Build failure is encountered with for instance
book3e_all_defconfig on kisskb in the AMDGPU driver which uses
cpu_has_feature(CPU_FTR_VSX_COMP) to decide whether calling
kernel_enable_vsx() or not.

The failure is due to cpu_has_feature() not being inlined with
that configuration with gcc 4.9.

In the same way as commit acdad8fb4a ("powerpc: Force inlining of
mmu_has_feature to fix build failure"), for inlining of
cpu_has_feature().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b231dfa040ce4cc37f702f5c3a595fdeabfe0462.1615378209.git.christophe.leroy@csgroup.eu
2021-03-14 20:32:24 +11:00
Christophe Leroy
08c18b63d9 powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
(for instance mvme5100_defconfig and ps3_defconfig), gcc 5
generates a call to _restgpr_31_x.

Until recently it went unnoticed, but
commit 42ed6d56ad ("powerpc/vdso: Block R_PPC_REL24 relocations")
made it rise to the surface.

Provide that function (copied from lib/crtsavres.S) in
gettimeofday.S

Fixes: ab037dd87a ("powerpc/vdso: Switch VDSO to generic C implementation.")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a7aa198a88bcd33c6e35e99f70f86c7b7f2f9440.1615270757.git.christophe.leroy@csgroup.eu
2021-03-14 20:32:23 +11:00
Al Viro
c4ab036a2f spufs: fix bogosity in S_ISGID handling
clearing everything *except* S_ISGID (including the S_IFDIR, among
other things) is wrong.  Just use init_inode_owner() and be done
with that...

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-03-12 23:13:52 -05:00
Christophe Leroy
0b736881c8 powerpc/traps: unrecoverable_exception() is not an interrupt handler
unrecoverable_exception() is called from interrupt handlers or
after an interrupt handler has failed.

Make it a standard function to avoid doubling the actions
performed on interrupt entry (e.g.: user time accounting).

Fixes: 3a96570ffc ("powerpc: convert interrupt handlers to use wrappers")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae96c59fa2cb7f24a8929c58cfa2c909cb8ff1f1.1615291471.git.christophe.leroy@csgroup.eu
2021-03-12 11:02:12 +11:00
Thiago Jung Bauermann
886db32398 powerpc/kexec_file: Restore FDT size estimation for kdump kernel
kexec_fdt_totalsize_ppc64() includes the base FDT size in its size
calculation, but commit 3c985d31ad ("powerpc: Use common
of_kexec_alloc_and_setup_fdt()") changed the kexec code to use the
generic function of_kexec_alloc_and_setup_fdt() which already includes
the base FDT size. That change made the code overestimate the size a bit
by counting twice the space required for the kernel command line and
/chosen properties.

Therefore change kexec_fdt_totalsize_ppc64() to calculate just the extra
space needed by the kdump kernel, and change the function name so that it
better reflects what the function is now doing.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
[robh: reword commit msg as no longer a fix from merging to branches]
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210220005204.1417200-1-bauerman@linux.ibm.com
2021-03-11 09:53:38 -07:00
Jiri Slaby
f76edd8f7c tty: cyclades, remove this orphan
The Cyclades driver was orphaned by commit d459883e6c (MAINTAINERS:
remove two dead e-mail) 13 years ago. Noone stepped up to take care of
them and to fix all the issues the driver has.

On the top of that, there is no way to obtain the firmware for Z cards
from the vendor as cyclades.com ceased to exist.

So it's time to drop the driver with all its traces.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20210302062214.29627-5-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:34:06 +01:00
Christophe Leroy
bd73758803 powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
Add stub instances of enable_kernel_vsx() and disable_kernel_vsx()
when CONFIG_VSX is not set, to avoid following build failure.

  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o
  In file included from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services_types.h:29,
                   from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:37,
                   from drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:27:
  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c: In function 'dcn_bw_apply_registry_override':
  ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:64:3: error: implicit declaration of function 'enable_kernel_vsx'; did you mean 'enable_kernel_fp'? [-Werror=implicit-function-declaration]
     64 |   enable_kernel_vsx(); \
        |   ^~~~~~~~~~~~~~~~~
  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:640:2: note: in expansion of macro 'DC_FP_START'
    640 |  DC_FP_START();
        |  ^~~~~~~~~~~
  ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:75:3: error: implicit declaration of function 'disable_kernel_vsx'; did you mean 'disable_kernel_fp'? [-Werror=implicit-function-declaration]
     75 |   disable_kernel_vsx(); \
        |   ^~~~~~~~~~~~~~~~~~
  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:676:2: note: in expansion of macro 'DC_FP_END'
    676 |  DC_FP_END();
        |  ^~~~~~~~~
  cc1: some warnings being treated as errors
  make[5]: *** [drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o] Error 1

This works because the caller is checking if VSX is available using
cpu_has_feature():

  #define DC_FP_START() { \
  	if (cpu_has_feature(CPU_FTR_VSX_COMP)) { \
  		preempt_disable(); \
  		enable_kernel_vsx(); \
  	} else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) { \
  		preempt_disable(); \
  		enable_kernel_altivec(); \
  	} else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) { \
  		preempt_disable(); \
  		enable_kernel_fp(); \
  	} \

When CONFIG_VSX is not selected, cpu_has_feature(CPU_FTR_VSX_COMP)
constant folds to 'false' so the call to enable_kernel_vsx() is
discarded and the build succeeds.

Fixes: 16a9dea110 ("amdgpu: Enable initial DCN support on POWER")
Cc: stable@vger.kernel.org # v5.6+
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Incorporate some discussion comments into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8d7d285a027e9d21f5ff7f850fa71a2655b0c4af.1615279170.git.christophe.leroy@csgroup.eu
2021-03-10 11:15:00 +11:00
Daniel Axtens
c080a17330 powerpc/64s/exception: Clean up a missed SRR specifier
Nick's patch cleaning up the SRR specifiers in exception-64s.S missed
a single instance of EXC_HV_OR_STD. Clean that up.

Caught by clang's integrated assembler.

Fixes: 3f7fbd97d0 ("powerpc/64s/exception: Clean up SRR specifiers")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225031006.1204774-2-dja@axtens.net
2021-03-10 07:59:31 +11:00
Nicholas Piggin
73ac798818 powerpc: Fix inverted SET_FULL_REGS bitop
This bit operation was inverted and set the low bit rather than
cleared it, breaking the ability to ptrace non-volatile GPRs after
exec. Fix.

Only affects 64e and 32-bit.

Fixes: feb9df3462 ("powerpc/64s: Always has full regs, so remove remnant checks")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210308085530.3191843-1-npiggin@gmail.com
2021-03-10 07:59:30 +11:00
Michael Ellerman
7aed41cff3 powerpc/64s: Use symbolic macros for function entry encoding
In ppc_function_entry() we look for a specific set of instructions by
masking the instructions and comparing with a known value. Currently
those known values are just literal hex values, and we recently
discovered one of them was wrong.

Instead construct the values using the existing constants we have for
defining various fields of instructions.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210309071544.515303-1-mpe@ellerman.id.au
2021-03-10 07:58:04 +11:00
Naveen N. Rao
cea15316ce powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
'lis r2,N' is 'addis r2,0,N' and the instruction encoding in the macro
LIS_R2 is incorrect (it currently maps to 'addis r0,r2,N'). Fix the
same.

Fixes: c71b7eff42 ("powerpc: Add ABIv2 support to ppc_function_entry")
Cc: stable@vger.kernel.org # v3.16+
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210304020411.16796-1-naveen.n.rao@linux.vnet.ibm.com
2021-03-09 22:13:26 +11:00
Lakshmi Ramasubramanian
cd42f1db09 powerpc: Delete unused function delete_fdt_mem_rsv()
delete_fdt_mem_rsv() defined in "arch/powerpc/kexec/file_load.c"
has been renamed to fdt_find_and_del_mem_rsv(), and moved to
"drivers/of/kexec.c".

Remove delete_fdt_mem_rsv() in "arch/powerpc/kexec/file_load.c".

Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-13-nramas@linux.microsoft.com
2021-03-08 12:06:30 -07:00
Lakshmi Ramasubramanian
fee3ff99bc powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c
The functions defined in "arch/powerpc/kexec/ima.c" handle setting up
and freeing the resources required to carry over the IMA measurement
list from the current kernel to the next kernel across kexec system call.
These functions do not have architecture specific code, but are
currently limited to powerpc.

Move remove_ima_buffer() and setup_ima_buffer() calls into
of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c".

Move the remaining architecture independent functions from
"arch/powerpc/kexec/ima.c" to "drivers/of/kexec.c".
Delete "arch/powerpc/kexec/ima.c" and "arch/powerpc/include/asm/ima.h".
Remove references to the deleted files and functions in powerpc and
in ima.

Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Tested-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-11-nramas@linux.microsoft.com
2021-03-08 12:06:29 -07:00
Lakshmi Ramasubramanian
39652741c8 powerpc: Enable passing IMA log to next kernel on kexec
CONFIG_HAVE_IMA_KEXEC is enabled to indicate that the IMA measurement
log information is present in the device tree. This should be selected
only if CONFIG_IMA is enabled.

Update CONFIG_KEXEC_FILE to select CONFIG_HAVE_IMA_KEXEC, if CONFIG_IMA
is enabled, to indicate that the IMA measurement log information is
present in the device tree for powerpc.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-10-nramas@linux.microsoft.com
2021-03-08 12:06:29 -07:00
Lakshmi Ramasubramanian
0c605158be powerpc: Move ima buffer fields to struct kimage
The fields ima_buffer_addr and ima_buffer_size in "struct kimage_arch"
for powerpc are used to carry forward the IMA measurement list across
kexec system call.  These fields are not architecture specific, but are
currently limited to powerpc.

arch_ima_add_kexec_buffer() defined in "arch/powerpc/kexec/ima.c"
sets ima_buffer_addr and ima_buffer_size for the kexec system call.
This function does not have architecture specific code, but is
currently limited to powerpc.

Move ima_buffer_addr and ima_buffer_size to "struct kimage".
Set ima_buffer_addr and ima_buffer_size in ima_add_kexec_buffer()
in security/integrity/ima/ima_kexec.c.

Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Will Deacon <will@kernel.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-9-nramas@linux.microsoft.com
2021-03-08 12:06:29 -07:00
Rob Herring
3c985d31ad powerpc: Use common of_kexec_alloc_and_setup_fdt()
The code for setting up the /chosen node in the device tree
and updating the memory reservation for the next kernel has been
moved to of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c".

Use the common of_kexec_alloc_and_setup_fdt() to setup the device tree
and update the memory reservation for kexec for powerpc.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-8-nramas@linux.microsoft.com
2021-03-08 12:06:29 -07:00
Lakshmi Ramasubramanian
e6635bab53 powerpc: Use ELF fields defined in 'struct kimage'
ELF related fields elf_headers, elf_headers_sz, and elfcorehdr_addr
have been moved from 'struct kimage_arch' to 'struct kimage' as
elf_headers, elf_headers_sz, and elf_load_addr respectively.

Use the ELF fields defined in 'struct kimage'.

Suggested-by: Rob Herring <robh@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-4-nramas@linux.microsoft.com
2021-03-08 12:06:28 -07:00
Al Viro
d0f1088b31 coredump: don't bother with do_truncate()
have dump_skip() just remember how much needs to be skipped,
leave actual seeks/writing zeroes to the next dump_emit()
or the end of coredump output, whichever comes first.
And instead of playing with do_truncate() in the end, just
write one NUL at the end of the last gap (if any).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-03-08 10:21:11 -05:00
John Ogness
a4f9876532 printk: kmsg_dump: remove _nolock() variants
kmsg_dump_rewind() and kmsg_dump_get_line() are lockless, so there is
no need for _nolock() variants. Remove these functions and switch all
callers of the _nolock() variants.

The functions without _nolock() were chosen because they are already
exported to kernel modules.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-15-john.ogness@linutronix.de
2021-03-08 11:43:35 +01:00
John Ogness
f9f3f02db9 printk: introduce a kmsg_dump iterator
Rather than storing the iterator information in the registered
kmsg_dumper structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller, thus
allowing lockless use of the kmsg_dump functions.

Update code that accesses the kernel logs using the kmsg_dumper
structure to use the new kmsg_dump_iter structure. For kmsg_dumpers,
this also means adding a call to kmsg_dump_rewind() to initialize
the iterator.

All this is in preparation for removal of @logbuf_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org> # pstore
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-13-john.ogness@linutronix.de
2021-03-08 11:43:27 +01:00
John Ogness
5f6c7648e5 printk: kmsg_dumper: remove @active field
All 6 kmsg_dumpers do not benefit from the @active flag:

  (provide their own synchronization)
  - arch/powerpc/kernel/nvram_64.c
  - arch/um/kernel/kmsg_dump.c
  - drivers/mtd/mtdoops.c
  - fs/pstore/platform.c

  (only dump on KMSG_DUMP_PANIC, which does not require
  synchronization)
  - arch/powerpc/platforms/powernv/opal-kmsg.c
  - drivers/hv/vmbus_drv.c

The other 2 kmsg_dump users also do not rely on @active:

  (hard-code @active to always be true)
  - arch/powerpc/xmon/xmon.c
  - kernel/debug/kdb/kdb_main.c

Therefore, @active can be removed.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-12-john.ogness@linutronix.de
2021-03-08 11:43:23 +01:00
Yang Li
da3c6c836f crypto: powepc/sha1 - remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/crypto/sha1-spe-glue.c:110:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-07 15:13:14 +11:00
Jordan Niethe
5c88a17e15 powerpc/sstep: Fix VSX instruction emulation
Commit af99da7433 ("powerpc/sstep: Support VSX vector paired storage
access instructions") added loading and storing 32 word long data into
adjacent VSRs. However the calculation used to determine if two VSRs
needed to be loaded/stored inadvertently prevented the load/storing
taking place for instructions with a data length less than 16 words.

This causes the emulation to not function correctly, which can be seen
by the alignment_handler selftest:

$ ./alignment_handler
[snip]
test: test_alignment_handler_vsx_207
tags: git_version:powerpc-5.12-1-0-g82d2c16b350f
VSX: 2.07B
        Doing lxsspx:   PASSED
        Doing lxsiwax:  FAILED: Wrong Data
        Doing lxsiwzx:  PASSED
        Doing stxsspx:  PASSED
        Doing stxsiwx:  PASSED
failure: test_alignment_handler_vsx_207
test: test_alignment_handler_vsx_300
tags: git_version:powerpc-5.12-1-0-g82d2c16b350f
VSX: 3.00B
        Doing lxsd:     PASSED
        Doing lxsibzx:  PASSED
        Doing lxsihzx:  PASSED
        Doing lxssp:    FAILED: Wrong Data
        Doing lxv:      PASSED
        Doing lxvb16x:  PASSED
        Doing lxvh8x:   PASSED
        Doing lxvx:     PASSED
        Doing lxvwsx:   FAILED: Wrong Data
        Doing lxvl:     PASSED
        Doing lxvll:    PASSED
        Doing stxsd:    PASSED
        Doing stxsibx:  PASSED
        Doing stxsihx:  PASSED
        Doing stxssp:   PASSED
        Doing stxv:     PASSED
        Doing stxvb16x: PASSED
        Doing stxvh8x:  PASSED
        Doing stxvx:    PASSED
        Doing stxvl:    PASSED
        Doing stxvll:   PASSED
failure: test_alignment_handler_vsx_300
[snip]

Fix this by making sure all VSX instruction emulation correctly
load/store from the VSRs.

Fixes: af99da7433 ("powerpc/sstep: Support VSX vector paired storage access instructions")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225031946.1458206-1-jniethe5@gmail.com
2021-03-02 22:41:51 +11:00
Athira Rajeev
5ae5fbd210 powerpc/perf: Fix handling of privilege level checks in perf interrupt context
Running "perf mem record" in powerpc platforms with selinux enabled
resulted in soft lockup's. Below call-trace was seen in the logs:

  CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2
  NIP:  c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000
  REGS: c000007fffab7d60 TRAP: 0100   Not tainted  (5.11.0-rc7+)
  ...
  NIP _raw_spin_lock_irqsave+0x94/0x120
  LR  _raw_spin_lock_irqsave+0x90/0x120
  Call Trace:
    0xc00000000fd47260 (unreliable)
    skb_queue_tail+0x3c/0x90
    audit_log_end+0x6c/0x180
    common_lsm_audit+0xb0/0xe0
    slow_avc_audit+0xa4/0x110
    avc_has_perm+0x1c4/0x260
    selinux_perf_event_open+0x74/0xd0
    security_perf_event_open+0x68/0xc0
    record_and_restart+0x6e8/0x7f0
    perf_event_interrupt+0x22c/0x560
    performance_monitor_exception0x4c/0x60
    performance_monitor_common_virt+0x1c8/0x1d0
  interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120
  NIP:  c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0
  REGS: c00000000fd47860 TRAP: 0f00   Not tainted  (5.11.0-rc7+)
  ...
  NIP _raw_spin_lock_irqsave+0x38/0x120
  LR  skb_queue_tail+0x3c/0x90
  interrupt: f00
    0x38 (unreliable)
    0xc00000000aae6200
    audit_log_end+0x6c/0x180
    audit_log_exit+0x344/0xf80
    __audit_syscall_exit+0x2c0/0x320
    do_syscall_trace_leave+0x148/0x200
    syscall_exit_prepare+0x324/0x390
    system_call_common+0xfc/0x27c

The above trace shows that while the CPU was handling a performance
monitor exception, there was a call to security_perf_event_open()
function. In powerpc core-book3s, this function is called from
perf_allow_kernel() check during recording of data address in the
sample via perf_get_data_addr().

Commit da97e18458 ("perf_event: Add support for LSM and SELinux
checks") introduced security enhancements to perf. As part of this
commit, the new security hook for perf_event_open() was added in all
places where perf paranoid check was previously used. In powerpc
core-book3s code, originally had paranoid checks in
perf_get_data_addr() and power_pmu_bhrb_read(). So
perf_paranoid_kernel() checks were replaced with perf_allow_kernel()
in these PMU helper functions as well.

The intention of paranoid checks in core-book3s was to verify
privilege access before capturing some of the sample data. Along with
paranoid checks, perf_allow_kernel() also does a
security_perf_event_open(). Since these functions are accessed while
recording a sample, we end up calling selinux_perf_event_open() in PMI
context. Some of the security functions use spinlock like
sidtab_sid2str_put(). If a perf interrupt hits under a spin lock and
if we end up in calling selinux hook functions in PMI handler, this
could cause a dead lock.

Since the purpose of this security hook is to control access to
perf_event_open(), it is not right to call this in interrupt context.

The paranoid checks in powerpc core-book3s were done at interrupt time
which is also not correct.

Reference commits:
  Commit cd1231d703 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
  Commit bb19af8160 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer")

We only allow creation of events that have already passed the
privilege checks in perf_event_open(). So these paranoid checks are
not needed at event time. As a fix, patch uses
'event->attr.exclude_kernel' check to prevent exposing kernel address
for userspace only sampling.

Fixes: cd1231d703 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
Cc: stable@vger.kernel.org # v4.17+
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
2021-03-02 22:41:51 +11:00
Christophe Leroy
acdad8fb4a powerpc: Force inlining of mmu_has_feature to fix build failure
The test robot has managed to generate a random config leading
to following build failure:

  LD      .tmp_vmlinux.kallsyms1
powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `ptep_set_access_flags':
pgtable.c:(.text.ptep_set_access_flags+0xf0): undefined reference to `hash__flush_tlb_page'
powerpc64-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `MMU_init_hw_patch':
mmu.c:(.init.text+0x452): undefined reference to `patch__hash_page_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x45e): undefined reference to `patch__hash_page_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x46a): undefined reference to `patch__hash_page_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x476): undefined reference to `patch__hash_page_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x482): undefined reference to `patch__hash_page_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x48e): undefined reference to `patch__hash_page_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x49e): undefined reference to `patch__hash_page_B'
powerpc64-linux-ld: mmu.c:(.init.text+0x4aa): undefined reference to `patch__hash_page_B'
powerpc64-linux-ld: mmu.c:(.init.text+0x4b6): undefined reference to `patch__hash_page_C'
powerpc64-linux-ld: mmu.c:(.init.text+0x4c2): undefined reference to `patch__hash_page_C'
powerpc64-linux-ld: mmu.c:(.init.text+0x4ce): undefined reference to `patch__flush_hash_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x4da): undefined reference to `patch__flush_hash_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x4e6): undefined reference to `patch__flush_hash_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x4f2): undefined reference to `patch__flush_hash_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x4fe): undefined reference to `patch__flush_hash_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x50a): undefined reference to `patch__flush_hash_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x522): undefined reference to `patch__flush_hash_B'
powerpc64-linux-ld: mmu.c:(.init.text+0x532): undefined reference to `patch__flush_hash_B'
powerpc64-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `update_mmu_cache':
mmu.c:(.text.update_mmu_cache+0xa0): undefined reference to `add_hash_page'
powerpc64-linux-ld: mm/memory.o: in function `zap_pte_range':
memory.c:(.text.zap_pte_range+0x160): undefined reference to `flush_hash_pages'
powerpc64-linux-ld: mm/memory.o: in function `handle_pte_fault':
memory.c:(.text.handle_pte_fault+0x180): undefined reference to `hash__flush_tlb_page'

This is due to mmu_has_feature() not being inlined. See extract of build of
mmu.c with -Winline:

In file included from ./include/linux/mm_types.h:19,
                 from ./include/linux/mmzone.h:21,
                 from ./include/linux/gfp.h:6,
                 from ./include/linux/mm.h:10,
                 from arch/powerpc/mm/book3s32/mmu.c:21:
./arch/powerpc/include/asm/mmu.h: In function 'find_free_bat':
./arch/powerpc/include/asm/mmu.h:231:20: warning: inlining failed in call to 'early_mmu_has_feature': call is unlikely and code size would grow [-Winline]
  231 | static inline bool early_mmu_has_feature(unsigned long feature)
      |                    ^~~~~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/mmu.h:291:9: note: called from here
  291 |  return early_mmu_has_feature(feature);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The code relies on constant folding of MMU_FTRS_POSSIBLE at buildtime
and elimination of non possible parts of code at compile time.
For this to work, mmu_has_feature() and early_mmu_has_feature()
must be inlined.

Fixes: 259149cf7c ("powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cf61345912c078c96f171afd0fcc48ef27cbdc3f.1614443418.git.christophe.leroy@csgroup.eu
2021-03-02 22:41:50 +11:00
Uwe Kleine-König
386a966f5c vio: make remove callback return void
The driver core ignores the return value of struct bus_type::remove()
because there is only little that can be done. To simplify the quest to
make this function return void, let struct vio_driver::remove() return
void, too. All users already unconditionally return 0, this commit makes
it obvious that returning an error code is a bad idea.

Note there are two nominally different implementations for a vio bus:
one in arch/sparc/kernel/vio.c and the other in
arch/powerpc/platforms/pseries/vio.c. This patch only adapts the powerpc
one.

Before this patch for a device that was bound to a driver without a
remove callback vio_cmo_bus_remove(viodev) wasn't called. As the device
core still considers the device unbound after vio_bus_remove() returns
calling this unconditionally is the consistent behaviour which is
implemented here.

Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[mpe: Drop unneeded hvcs_remove() forward declaration, squash in
 change from sfr to drop ibmvnic_remove() forward declaration]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225221834.160083-1-uwe@kleine-koenig.org
2021-03-02 22:41:23 +11:00
Christophe Leroy
91b6c5dbe9 powerpc/syscall: Force inlining of __prep_irq_for_enabled_exit()
As reported by kernel test robot, a randconfig with high amount of
debuging options can lead to build failure for undefined reference
to replay_soft_interrupts() on ppc32.

This is due to gcc not seeing that __prep_irq_for_enabled_exit()
always returns true on ppc32 because it doesn't inline it for
some reason.

Force inlining of __prep_irq_for_enabled_exit() to fix the build.

Fixes: 344bb20b15 ("powerpc/syscall: Make interrupt.c buildable on PPC32")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/53f3a1f719441761000c41154602bf097d4350b5.1614148356.git.christophe.leroy@csgroup.eu
2021-03-01 12:33:31 +11:00
Christophe Leroy
c119565a15 powerpc/603: Fix protection of user pages mapped with PROT_NONE
On book3s/32, page protection is defined by the PP bits in the PTE
which provide the following protection depending on the access
keys defined in the matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection,
PP bits have been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0
and user accesses are performed with key 1. As PP00 is used for
non _PAGE_USER pages, user can't access kernel pages not flagged
_PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed
with key 0, therefore pages not flagged _PAGE_USER are still
accessible to the user.

This shouldn't be an issue, because userspace is expected to be
accessible to the user. But unlike most other architectures, powerpc
implements PROT_NONE protection by removing _PAGE_USER flag instead of
flagging the page as not valid. This means that pages in userspace
that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the
TLB miss handler by checking _PAGE_USER permission on userspace
accesses as if it was the _PAGE_PRESENT bit.

Note that this problem only is only for 603 cores. The 604+ have
an hash table, and hash_page() function already implement the
verification of _PAGE_USER permission on userspace pages.

Fixes: f342adca3a ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Christoph Plattner <christoph.plattner@thalesgroup.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a0c6e3bb8f0c162457bf54d9bc6fd8d7b55129f.1612160907.git.christophe.leroy@csgroup.eu
2021-03-01 12:33:31 +11:00
Greg Kurz
f9619d5e51 powerpc/pseries: Don't enforce MSI affinity with kdump
Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as per-design with managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq :
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.

This is a regression caused by commit 9ea69a55b3 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.

The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue
  block driver, that was used to create the MSI vectors, and the single
  queue mode enforced later on by blk-mq because of kdump (i.e. keeping
  all queues fixes the issue)
- CPU#0 offline (i.e. kdump always succeed with CPU#0)

Given that I couldn't reproduce on x86, which seems to always have CPU#0
online even during kdump, I'm not sure where this should be fixed. Hence
going for another approach : fine-grained affinity is for performance
and we don't really care about that during kdump. Simply revert to the
previous working behavior of ignoring affinity masks in this case only.

Fixes: 9ea69a55b3 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210215094506.1196119-1-groug@kaod.org
2021-03-01 12:33:31 +11:00
Michael Ellerman
eead089311 powerpc/4xx: Fix build errors from mfdcr()
lkp reported a build error in fsp2.o:

  CC      arch/powerpc/platforms/44x/fsp2.o
  {standard input}:577: Error: unsupported relocation against base

Which comes from:

  pr_err("GESR0: 0x%08x\n", mfdcr(base + PLB4OPB_GESR0));

Where our mfdcr() macro is stringifying "base + PLB4OPB_GESR0", and
passing that to the assembler, which obviously doesn't work.

The mfdcr() macro already checks that the argument is constant using
__builtin_constant_p(), and if not calls the out-of-line version of
mfdcr(). But in this case GCC is smart enough to notice that "base +
PLB4OPB_GESR0" will be constant, even though it's not something we can
immediately stringify into a register number.

Segher pointed out that passing the register number to the inline asm
as a constant would be better, and in fact it fixes the build error,
presumably because it gives GCC a chance to resolve the value.

While we're at it, change mtdcr() similarly.

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210218123058.748882-1-mpe@ellerman.id.au
2021-03-01 12:33:31 +11:00
Linus Torvalds
5695e51619 io_uring-worker.v3-2021-02-25
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmA4JRkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoWqD/9dbbqe8L701U6May1A/4hRsqL4THTA2flx
 vNCNRBl6XV3l/wBCtL6waKy6tyO4lyM8XdUdEvo3Kxl2kGPb8eVfpyYL/+77HqyH
 ctT4RMrs+84Mxn+5N6cM97hS1qVI2moTxxyvOEl/JTB7BYrutz9gvAoeY3/Dto47
 J66oSaPeuqJ32TyihxfQHVxQopJcqFzDjyoYHGDu6ATio1PXfaIdTu8ywVYSECAh
 pWI4rwnqdurGuHMNpxyL1bA6CT/jC7s+sqU7bUYUCgtYI3eG0u3V0bp5gAQQIgl9
 5sxxE3DidYGAkYZsosrelshBtzGddLdz4Qrt2ungMYv8RsGNpFQ095jDPKDwFaZj
 bSvSsfplCo7iFsJByb1TtpNEOW8eAwi81PmBDVQ9Oq5P5ygTYno9GBDc/20ql0Fk
 q6wcX28coE3IBw44ne0hIwvBOtXV4WJyluG/gqOxfbTH+kOy3pDsN8lWcY/P4X0U
 yzdU2MLHe8BNMyYlUiBF47Amzt4ltr85P4XD3WZ4bX71iwri6HvrdGWLuuKwX+Ie
 66QiIDDQIYZQ6NMMJWS9DGW3y3DBizpSXGxONbOw1J2bQdNmtToR0D2UnK/9UnKp
 msnvkUNk8fkYGS4aptpJ6HxbmjMEG5YtbiGlPj6fz5/7MTvhRjPxt7A0LWrUIdqR
 f88+sHUMqg==
 =oc8u
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block

Pull io_uring thread rewrite from Jens Axboe:
 "This converts the io-wq workers to be forked off the tasks in question
  instead of being kernel threads that assume various bits of the
  original task identity.

  This kills > 400 lines of code from io_uring/io-wq, and it's the worst
  part of the code. We've had several bugs in this area, and the worry
  is always that we could be missing some pieces for file types doing
  unusual things (recent /dev/tty example comes to mind, userfaultfd
  reads installing file descriptors is another fun one... - both of
  which need special handling, and I bet it's not the last weird oddity
  we'll find).

  With these identical workers, we can have full confidence that we're
  never missing anything. That, in itself, is a huge win. Outside of
  that, it's also more efficient since we're not wasting space and code
  on tracking state, or switching between different states.

  I'm sure we're going to find little things to patch up after this
  series, but testing has been pretty thorough, from the usual
  regression suite to production. Any issue that may crop up should be
  manageable.

  There's also a nice series of further reductions we can do on top of
  this, but I wanted to get the meat of it out sooner rather than later.
  The general worry here isn't that it's fundamentally broken. Most of
  the little issues we've found over the last week have been related to
  just changes in how thread startup/exit is done, since that's the main
  difference between using kthreads and these kinds of threads. In fact,
  if all goes according to plan, I want to get this into the 5.10 and
  5.11 stable branches as well.

  That said, the changes outside of io_uring/io-wq are:

   - arch setup, simple one-liner to each arch copy_thread()
     implementation.

   - Removal of net and proc restrictions for io_uring, they are no
     longer needed or useful"

* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
  io-wq: remove now unused IO_WQ_BIT_ERROR
  io_uring: fix SQPOLL thread handling over exec
  io-wq: improve manager/worker handling over exec
  io_uring: ensure SQPOLL startup is triggered before error shutdown
  io-wq: make buffered file write hashed work map per-ctx
  io-wq: fix race around io_worker grabbing
  io-wq: fix races around manager/worker creation and task exit
  io_uring: ensure io-wq context is always destroyed for tasks
  arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
  io_uring: cleanup ->user usage
  io-wq: remove nr_process accounting
  io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
  net: remove cmsg restriction from io_uring based send/recvmsg calls
  Revert "proc: don't allow async path resolution of /proc/self components"
  Revert "proc: don't allow async path resolution of /proc/thread-self components"
  io_uring: move SQPOLL thread io-wq forked worker
  io-wq: make io_wq_fork_thread() available to other users
  io-wq: only remove worker from free_list, if it was there
  io_uring: remove io_identity
  io_uring: remove any grabbing of context
  ...
2021-02-27 08:29:02 -08:00
Linus Torvalds
d94d14008e x86:
- take into account HVA before retrying on MMU notifier race
 - fixes for nested AMD guests without NPT
 - allow INVPCID in guest without PCID
 - disable PML in hardware when not in use
 - MMU code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmA3eMQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP6TQf5ARpUyq3oo+13albwg+zNca6hzR8i
 Vl7dpoR3bSJCN3sTYFnlL9eXw5TxgeUL2nqKqma6ddZDNDEBLT2Bq8rcFkbi4pUf
 n7av76EEq74HW/jlUhKVug7Q5Dm5DiKC6BOH3RVuKHbr6iZseyF3jXZSX0Ppf0yF
 gvoy6cGyMW60NVLN5tuGeOjVQ1fxziE0SqB90fXuiWgZ5rzIBfbqJV7EOOZsGO67
 /LHSaEpvKutsc2a+Hx76yQNJjAbb2/O+4Bo5/RqfdqS5tRLGBzYggdJjLvAPvd6P
 pTNtDCnErvBZQfMedEQyHYuBL2Ca59fOp6i/ekOM2I+m7816+kSkdTMt2g==
 =iMHY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "x86:

   - take into account HVA before retrying on MMU notifier race

   - fixes for nested AMD guests without NPT

   - allow INVPCID in guest without PCID

   - disable PML in hardware when not in use

   - MMU code cleanups:

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  KVM: SVM: Fix nested VM-Exit on #GP interception handling
  KVM: vmx/pmu: Fix dummy check if lbr_desc->event is created
  KVM: x86/mmu: Consider the hva in mmu_notifier retry
  KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault
  KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID
  KVM: nSVM: prepare guest save area while is_guest_mode is true
  KVM: x86/mmu: Remove a variety of unnecessary exports
  KVM: x86: Fold "write-protect large" use case into generic write-protect
  KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging
  KVM: x86: Further clarify the logic and comments for toggling log dirty
  KVM: x86: Move MMU's PML logic to common code
  KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
  KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()
  KVM: nVMX: Disable PML in hardware when running L2
  KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  KVM: x86/mmu: Pass the memslot to the rmap callbacks
  KVM: x86/mmu: Split out max mapping level calculation to helper
  KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
  KVM: nVMX: no need to undo inject_page_fault change on nested vmexit
  ...
2021-02-26 10:00:12 -08:00
Linus Torvalds
6fbd6cf85a Kbuild updates for v5.12
- Fix false-positive build warnings for ARCH=ia64 builds
 
  - Optimize dictionary size for module compression with xz
 
  - Check the compiler and linker versions in Kconfig
 
  - Fix misuse of extra-y
 
  - Support DWARF v5 debug info
 
  - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
    exceeded the limit
 
  - Add generic syscall{tbl,hdr}.sh for cleanups across arches
 
  - Minor cleanups of genksyms
 
  - Minor cleanups of Kconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmA3zhgVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG0C4P/A5hUNFdkYI+EffAWZiHn69t0S8j
 M1GQkZildKu/yOfm6hp3mNwgHmYgw0aAuch1htkJuv+5rXRtoK77yw0xKbUqNHyO
 VqkJWQPVUXJbWIDiu332NaETHbFTWCnPZKGmzcbVOBHbYsXUJPp17gROQ9ke0fQN
 Ae6OV5WINhoS8UnjESWb3qOO87MdQTZ+9mP+NMnVh4kV1SUeMAXLFwFll66KZTkj
 GXB330N3p9L0wQVljhXpQ/YPOd76wJNPhJWJ9+hKLFbWsedovzlHb+duprh1z1xe
 7LLaq9dEbXxe1Uz0qmK76lupXxilYMyUupTW9HIYtIsY8br8DIoBOG0bn46LVnuL
 /m+UQNfUFCYYePT7iZQNNc1DISQJrxme3bjq0PJzZTDukNnHJVahnj9x4RoNaF8j
 Dc+JME0r2i8Ccp28vgmaRgzvSsb8Xtw5icwRdwzIpyt1ubs/+tkd/GSaGzQo30Q8
 m8y1WOjovHNX7OGnOaOWBGoQAX/2k/VHeAediMsPqWUoOxwsLHYxG/4KtgwbJ5vc
 gu/Fyk1GRDklZPpLdYFVvz8TGnqSDogJgF+7WolJ6YvPGAUIDAfd5Ky2sWayddlm
 wchc3sKDVyh3lov23h0WQVTvLO9xl+NZ6THxoAGdYeQ0DUu5OxwH8qje/UpWuo1a
 DchhNN+g5pa6n56Z
 =sLxb
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Fix false-positive build warnings for ARCH=ia64 builds

 - Optimize dictionary size for module compression with xz

 - Check the compiler and linker versions in Kconfig

 - Fix misuse of extra-y

 - Support DWARF v5 debug info

 - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
   exceeded the limit

 - Add generic syscall{tbl,hdr}.sh for cleanups across arches

 - Minor cleanups of genksyms

 - Minor cleanups of Kconfig

* tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
  initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
  kbuild: remove deprecated 'always' and 'hostprogs-y/m'
  kbuild: parse C= and M= before changing the working directory
  kbuild: reuse this-makefile to define abs_srctree
  kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
  kconfig: omit --oldaskconfig option for 'make config'
  kconfig: fix 'invalid option' for help option
  kconfig: remove dead code in conf_askvalue()
  kconfig: clean up nested if-conditionals in check_conf()
  kconfig: Remove duplicate call to sym_get_string_value()
  Makefile: Remove # characters from compiler string
  Makefile: reuse CC_VERSION_TEXT
  kbuild: check the minimum linker version in Kconfig
  kbuild: remove ld-version macro
  scripts: add generic syscallhdr.sh
  scripts: add generic syscalltbl.sh
  arch: syscalls: remove $(srctree)/ prefix from syscall tables
  arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
  gen_compile_commands: prune some directories
  kbuild: simplify access to the kernel's version
  ...
2021-02-25 10:17:31 -08:00
Linus Torvalds
29c395c77a Rework of the X86 irq stack handling:
The irq stack switching was moved out of the ASM entry code in course of
   the entry code consolidation. It ended up being suboptimal in various
   ways.
 
   - Make the stack switching inline so the stackpointer manipulation is not
     longer at an easy to find place.
 
   - Get rid of the unnecessary indirect call.
 
   - Avoid the double stack switching in interrupt return and reuse the
     interrupt stack for softirq handling.
 
   - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused
     about the stack pointer manipulation.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmA21OcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaX0D/9S0ud6oqbsIvI8LwhvYub63a2cjKP9
 liHAJ7xwMYYVwzf0skwsPb/QE6+onCzdq0upJkgG/gEYm2KbiaMWZ4GgHdj0O7ER
 qXKJONDd36AGxSEdaVzLY5kPuD/mkomGk5QdaZaTmjruthkNzg4y/N2wXUBIMZR0
 FdpSpp5fGspSZCn/DXDx6FjClwpLI53VclvDs6DcZ2DIBA0K+F/cSLb1UQoDLE1U
 hxGeuNa+GhKeeZ5C+q5giho1+ukbwtjMW9WnKHAVNiStjm0uzdqq7ERGi/REvkcB
 LY62u5uOSW1zIBMmzUjDDQEqvypB0iFxFCpN8g9sieZjA0zkaUioRTQyR+YIQ8Cp
 l8LLir0dVQivR1bHghHDKQJUpdw/4zvDj4mMH10XHqbcOtIxJDOJHC5D00ridsAz
 OK0RlbAJBl9FTdLNfdVReBCoehYAO8oefeyMAG12nZeSh5XVUWl238rvzmzIYNhG
 cEtkSx2wIUNEA+uSuI+xvfmwpxL7voTGvqmiRDCAFxyO7Bl/GBu9OEBFA1eOvHB+
 +wTmPDMswRetQNh4QCRXzk1JzP1Wk5CobUL9iinCWFoTJmnsPPSOWlosN6ewaNXt
 kYFpRLy5xt9EP7dlfgBSjiRlthDhTdMrFjD5bsy1vdm1w7HKUo82lHa4O8Hq3PHS
 tinKICUqRsbjig==
 =Sqr1
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 irq entry updates from Thomas Gleixner:
 "The irq stack switching was moved out of the ASM entry code in course
  of the entry code consolidation. It ended up being suboptimal in
  various ways.

  This reworks the X86 irq stack handling:

   - Make the stack switching inline so the stackpointer manipulation is
     not longer at an easy to find place.

   - Get rid of the unnecessary indirect call.

   - Avoid the double stack switching in interrupt return and reuse the
     interrupt stack for softirq handling.

   - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got
     confused about the stack pointer manipulation"

* tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix stack-swizzle for FRAME_POINTER=y
  um: Enforce the usage of asm-generic/softirq_stack.h
  x86/softirq/64: Inline do_softirq_own_stack()
  softirq: Move do_softirq_own_stack() to generic asm header
  softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
  x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
  x86/softirq: Remove indirection in do_softirq_own_stack()
  x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall
  x86/entry: Convert device interrupts to inline stack switching
  x86/entry: Convert system vectors to irq stack macro
  x86/irq: Provide macro for inlining irq stack switching
  x86/apic: Split out spurious handling code
  x86/irq/64: Adjust the per CPU irq stack pointer by 8
  x86/irq: Sanitize irq stack tracking
  x86/entry: Fix instrumentation annotation
2021-02-24 16:32:23 -08:00
Jens Axboe
0100e6bbdb arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
In the arch addition of PF_IO_WORKER, I missed parisc and powerpc for
some reason. Fix that up, ensuring they handle PF_IO_WORKER like they do
PF_KTHREAD in copy_thread().

Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Fixes: 4727dc20e0 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 20:33:33 -07:00
Linus Torvalds
7d6beb71da idmapped-mounts-v5.12
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
 ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
 =yPaw
 -----END PGP SIGNATURE-----

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Linus Torvalds
21a6ab2131 Modules updates for v5.12
Summary of modules changes for the 5.12 merge window:
 
 - Retire EXPORT_UNUSED_SYMBOL() and EXPORT_SYMBOL_GPL_FUTURE(). These export
   types were introduced between 2006 - 2008. All the of the unused symbols have
   been long removed and gpl future symbols were converted to gpl quite a long
   time ago, and I don't believe these export types have been used ever since.
   So, I think it should be safe to retire those export types now. (Christoph Hellwig)
 
 - Refactor and clean up some aged code cruft in the module loader (Christoph Hellwig)
 
 - Build {,module_}kallsyms_on_each_symbol only when livepatching is enabled, as
   it is the only caller (Christoph Hellwig)
 
 - Unexport find_module() and module_mutex and fix the last module
   callers to not rely on these anymore. Make module_mutex internal to
   the module loader. (Christoph Hellwig)
 
 - Harden ELF checks on module load and validate ELF structures before checking
   the module signature (Frank van der Linden)
 
 - Fix undefined symbol warning for clang (Fangrui Song)
 
 - Fix smatch warning (Dan Carpenter)
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAmA0/KMQHGpleXVAa2Vy
 bmVsLm9yZwAKCRDARX44zjvBcu0uD/4nmRp18EKAtdUZivsZHat0aEWGlkmrVueY
 5huYw6iwM8b/wIAl3xwLki1Iv0/l0a83WXZhLG4ekl0/Nj8kgllA+jtBrZWpoLMH
 CZusN5dS9YwwyD2vu3ak83ARcehcDEPeA9thvc3uRFGis6Hi4bt1rkzGdrzsgqR4
 tybfN4qaQx4ZAKFxA8bnS58l7QTFwUzTxJfM6WWzl1Q+mLZDr/WP+loJ/f1/oFFg
 ufN31KrqqFpdQY5UKq5P4H8FVq/eXE1Mwl8vo3HsnAj598fznyPUmA3D/j+N4GuR
 sTGBVZ9CSehUj7uZRs+Qgg6Bd+y3o44N29BrdZWA6K3ieTeQQpA+VgPUNrDBjGhP
 J/9Y4ms4PnuNEWWRaa73m9qsVqAsjh9+T2xp9PYn9uWLCM8BvQFtWcY7tw4/nB0/
 INmyiP/tIRpwWkkBl47u1TPR09FzBBGDZjBiSn3lm3VX+zCYtHoma5jWyejG11cf
 ybDrTsci9ANyHNP2zFQsUOQJkph78PIal0i3k4ODqGJvaC0iEIH3Xjv+0dmE14rq
 kGRrG/HN6HhMZPjashudVUktyTZ63+PJpfFlQbcUzdvjQQIkzW0vrCHMWx9vD1xl
 Na7vZLl4Nb03WSJp6saY6j2YSRKL0poGETzGqrsUAHEhpEOPHduaiCVlAr/EmeMk
 p6SrWv8+UQ==
 =T29Q
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:

 - Retire EXPORT_UNUSED_SYMBOL() and EXPORT_SYMBOL_GPL_FUTURE(). These
   export types were introduced between 2006 - 2008. All the of the
   unused symbols have been long removed and gpl future symbols were
   converted to gpl quite a long time ago, and I don't believe these
   export types have been used ever since. So, I think it should be safe
   to retire those export types now (Christoph Hellwig)

 - Refactor and clean up some aged code cruft in the module loader
   (Christoph Hellwig)

 - Build {,module_}kallsyms_on_each_symbol only when livepatching is
   enabled, as it is the only caller (Christoph Hellwig)

 - Unexport find_module() and module_mutex and fix the last module
   callers to not rely on these anymore. Make module_mutex internal to
   the module loader (Christoph Hellwig)

 - Harden ELF checks on module load and validate ELF structures before
   checking the module signature (Frank van der Linden)

 - Fix undefined symbol warning for clang (Fangrui Song)

 - Fix smatch warning (Dan Carpenter)

* tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: potential uninitialized return in module_kallsyms_on_each_symbol()
  module: remove EXPORT_UNUSED_SYMBOL*
  module: remove EXPORT_SYMBOL_GPL_FUTURE
  module: move struct symsearch to module.c
  module: pass struct find_symbol_args to find_symbol
  module: merge each_symbol_section into find_symbol
  module: remove each_symbol_in_section
  module: mark module_mutex static
  kallsyms: only build {,module_}kallsyms_on_each_symbol when required
  kallsyms: refactor {,module_}kallsyms_on_each_symbol
  module: use RCU to synchronize find_module
  module: unexport find_module and module_mutex
  drm: remove drm_fb_helper_modinit
  powerpc/powernv: remove get_cxl_module
  module: harden ELF info handling
  module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
2021-02-23 10:15:33 -08:00
Linus Torvalds
b12b472496 powerpc updates for 5.12
A large series adding wrappers for our interrupt handlers, so that irq/nmi/user
 tracking can be isolated in the wrappers rather than spread in each handler.
 
 Conversion of the 32-bit syscall handling into C.
 
 A series from Nick to streamline our TLB flushing when using the Radix MMU.
 
 Switch to using queued spinlocks by default for 64-bit server CPUs.
 
 A rework of our PCI probing so that it happens later in boot, when more generic
 infrastructure is available.
 
 Two small fixes to allow 32-bit little-endian processes to run on 64-bit
 kernels.
 
 Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira
   Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy,
   Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh
   Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus
   Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
   O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan
   Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, Zheng
   Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAzMagTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAbBD/wMS2g1Q9oAGZPsx2NGd2RoeAauGxUs
 Yj6cZVmR+oa6sJyFYgEG7dT7tcwJITQxLBD3HpsHSnJ/rLrMloE33+cZNA9c4STz
 0mlzm3R7M5pOgcEqZglsgLP0RQeUuHSSF01g0kf1N3r+HYtmbmPjuUIl8CnAjlbT
 iMD2ZN2p8/r3kDDht0iBO534HUpsqhc00duSZgQhsV/PR7ZWVxoPk7PEJeo4vXlJ
 77986F7J5NLUTjMiLv5lTx49FcPbRd7a1jubsBtahJrwXj2GVvuy2i86G7HY+a+B
 eSxN7zJQgaFeLo0YPo7fZLBI0MAsIQt3nnZhKX0TMglbv/K8Aq64xiJqsVQdJ883
 CeEt0HvSJhsSC0C4O595NEINfDhDd+5IeSF9MvsujYXiUKRXtRkm1EPuAzTcZIzW
 NwkCLRo33NMXa+khMKaiqF/g7INayPUXoWESx75NXFsuNfcORvstkeUuEoi5GwJo
 TSlmosFqwRjghQ8eTLZuWBzmh3EpPGdtC4gm6D+lbzhzjah5c/1whyuLqra275kK
 E3Qt0/V0ixKyvlG7MI5yYh3L7+R/hrsflH7xIJJxZp2DW6mwBJzQYmkxDbSS8PzK
 nWien2XgpIQhSFat3QqreEFSfNkzdN2MClVi2Y1hpAgi+2Zm9rPdPNGcQI+DSOsB
 kpJkjOjWNJU/PQ==
 =dB2S
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - A large series adding wrappers for our interrupt handlers, so that
   irq/nmi/user tracking can be isolated in the wrappers rather than
   spread in each handler.

 - Conversion of the 32-bit syscall handling into C.

 - A series from Nick to streamline our TLB flushing when using the
   Radix MMU.

 - Switch to using queued spinlocks by default for 64-bit server CPUs.

 - A rework of our PCI probing so that it happens later in boot, when
   more generic infrastructure is available.

 - Two small fixes to allow 32-bit little-endian processes to run on
   64-bit kernels.

 - Other smaller features, fixes & cleanups.

Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang
Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian
Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong,
Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan
Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu,
Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen
Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun.

* tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits)
  powerpc/perf: Adds support for programming of Thresholding in P10
  powerpc/pci: Remove unimplemented prototypes
  powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
  powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
  powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
  powerpc/64: Fix stack trace not displaying final frame
  powerpc/time: Remove get_tbl()
  powerpc/time: Avoid using get_tbl()
  spi: mpc52xx: Avoid using get_tbl()
  powerpc/syscall: Avoid storing 'current' in another pointer
  powerpc/32: Handle bookE debugging in C in syscall entry/exit
  powerpc/syscall: Do not check unsupported scv vector on PPC32
  powerpc/32: Remove the counter in global_dbcr0
  powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
  powerpc/syscall: implement system call entry/exit logic in C for PPC32
  powerpc/32: Always save non volatile GPRs at syscall entry
  powerpc/syscall: Change condition to check MSR_RI
  powerpc/syscall: Save r3 in regs->orig_r3
  powerpc/syscall: Use is_compat_task()
  powerpc/syscall: Make interrupt.c buildable on PPC32
  ...
2021-02-22 14:34:00 -08:00
David Stevens
4a42d848db KVM: x86/mmu: Consider the hva in mmu_notifier retry
Track the range being invalidated by mmu_notifier and skip page fault
retries if the fault address is not affected by the in-progress
invalidation. Handle concurrent invalidations by finding the minimal
range which includes all ranges being invalidated. Although the combined
range may include unrelated addresses and cannot be shrunk as individual
invalidation operations complete, it is unlikely the marginal gains of
proper range tracking are worth the additional complexity.

The primary benefit of this change is the reduction in the likelihood of
extreme latency when handing a page fault due to another thread having
been preempted while modifying host virtual addresses.

Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210222024522.1751719-3-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22 13:16:53 -05:00
Linus Torvalds
a99163e9e7 Devicetree updates for v5.12:
- Sync dtc to upstream version v1.6.0-51-g183df9e9c2b9 and build
   host fdtoverlay
 
 - Add kbuild support to build DT overlays (%.dtbo)
 
 - Drop NULLifying match table in of_match_device(). In preparation for
   this, there are several driver cleanups to use
   (of_)?device_get_match_data().
 
 - Drop pointless wrappers from DT struct device API
 
 - Convert USB binding schemas to use graph schema and remove old plain
   text graph binding doc
 
 - Convert spi-nor and v3d GPU bindings to DT schema
 
 - Tree wide schema fixes for if/then schemas, array size constraints,
   and undocumented compatible strings in examples
 
 - Handle 'no-map' correctly for already reserved memblock regions
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmAz1GEQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw55/D/955O2f5Gjp7bXvdoSucZtks/lqlC/eIAAw
 An5pjBL+o1urXsvafEMYemwmnwq/U4vy0aJRoAK1+MiI4masb56ET0KN5LsOudki
 b3M/O16RqGF31+blWyoxseZnZh6KsKzIRoE5XAtbr/QAnpdI0/5BgGoWSWYtDk2v
 LddL650/BieyvzdnFTLWCMAxd6DW0P9SI+8N3E+XlxAWCYQrLCqVELHbkrxAGCuN
 CggHIIiNf2K7z4UopVsGjnUwML9YRHXc9wOpF1c4CBrLu9TfDvdQ4OnNcnxcl/Sp
 E2FTHG0jSVm3VJRBbk4e68uvt3HrJJWsYnMtu2QTweGC/GbeUr9LJ0MIbSwp+rsL
 FEqCMFWOniq27eJBk6jHckaoBl93AHQlIGdJR/pFAi9Ijt32tUdVG5kqD/Tl+xKm
 njPcjVjWilr2ssfZ4tUggzPp2fjrau88ZS8qLja31vElzvULeA67KjEtG0RZAtwg
 ywfATiCaT096pR9v2VYuL/5NNnZFxHx3hWsOH1rcsyPk0BLguU3dkrAn28XBVQFd
 cOPfR3T/wsT0wHDht2aXPSM0hBiejFmvLhebGuJN9lqG+Pc1f87xiCT3pM7wymtJ
 iqTMrQ7dUgjQgU91PjatdB17tlnGHe0hh8AiuhQoPgOprpRKszG+rBFJLG3yRnl7
 QmLZnQTIhw==
 =9V4A
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

 - Sync dtc to upstream version v1.6.0-51-g183df9e9c2b9 and build host
   fdtoverlay

 - Add kbuild support to build DT overlays (%.dtbo)

 - Drop NULLifying match table in of_match_device().

   In preparation for this, there are several driver cleanups to use
   (of_)?device_get_match_data().

 - Drop pointless wrappers from DT struct device API

 - Convert USB binding schemas to use graph schema and remove old plain
   text graph binding doc

 - Convert spi-nor and v3d GPU bindings to DT schema

 - Tree wide schema fixes for if/then schemas, array size constraints,
   and undocumented compatible strings in examples

 - Handle 'no-map' correctly for already reserved memblock regions

* tag 'devicetree-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (35 commits)
  driver core: platform: Drop of_device_node_put() wrapper
  of: Remove of_dev_{get,put}()
  dt-bindings: usb: Change descibe to describe in usbmisc-imx.txt
  dt-bindings: can: rcar_canfd: Group tuples in pin control properties
  dt-bindings: power: renesas,apmu: Group tuples in cpus properties
  dt-bindings: mtd: spi-nor: Convert to DT schema format
  dt-bindings: Use portable sort for version cmp
  dt-bindings: ethernet-controller: fix fixed-link specification
  dt-bindings: irqchip: Add node name to PRUSS INTC
  dt-bindings: interconnect: Fix the expected number of cells
  dt-bindings: Fix errors in 'if' schemas
  dt-bindings: iommu: renesas,ipmmu-vmsa: Make 'power-domains' conditionally required
  dt-bindings: Fix undocumented compatible strings in examples
  kbuild: Add support to build overlays (%.dtbo)
  scripts: dtc: Remove the unused fdtdump.c file
  scripts: dtc: Build fdtoverlay tool
  scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9
  scripts: dtc: Fetch fdtoverlay.c from external DTC project
  dt-bindings: thermal: sun8i: Fix misplaced schema keyword in compatible strings
  dt-bindings: iio: dac: Fix AD5686 references
  ...
2021-02-22 10:05:12 -08:00
Linus Torvalds
31caf8b2a8 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:
   - Restrict crypto_cipher to internal API users only.

  Algorithms:
   - Add x86 aesni acceleration for cts.
   - Improve x86 aesni acceleration for xts.
   - Remove x86 acceleration of some uncommon algorithms.
   - Remove RIPE-MD, Tiger and Salsa20.
   - Remove tnepres.
   - Add ARM acceleration for BLAKE2s and BLAKE2b.

  Drivers:
   - Add Keem Bay OCS HCU driver.
   - Add Marvell OcteonTX2 CPT PF driver.
   - Remove PicoXcell driver.
   - Remove mediatek driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (154 commits)
  hwrng: timeriomem - Use device-managed registration API
  crypto: hisilicon/qm - fix printing format issue
  crypto: hisilicon/qm - do not reset hardware when CE happens
  crypto: hisilicon/qm - update irqflag
  crypto: hisilicon/qm - fix the value of 'QM_SQC_VFT_BASE_MASK_V2'
  crypto: hisilicon/qm - fix request missing error
  crypto: hisilicon/qm - removing driver after reset
  crypto: octeontx2 - fix -Wpointer-bool-conversion warning
  crypto: hisilicon/hpre - enable Elliptic curve cryptography
  crypto: hisilicon - PASID fixed on Kunpeng 930
  crypto: hisilicon/qm - fix use of 'dma_map_single'
  crypto: hisilicon/hpre - tiny fix
  crypto: hisilicon/hpre - adapt the number of clusters
  crypto: cpt - remove casting dma_alloc_coherent
  crypto: keembay-ocs-aes - Fix 'q' assignment during CCM B0 generation
  crypto: xor - Fix typo of optimization
  hwrng: optee - Use device-managed registration API
  crypto: arm64/crc-t10dif - move NEON yield to C code
  crypto: arm64/aes-ce-mac - simplify NEON yield
  crypto: arm64/aes-neonbs - remove NEON yield calls
  ...
2021-02-21 17:23:56 -08:00
Masahiro Yamada
29c5c3ac63 arch: syscalls: remove $(srctree)/ prefix from syscall tables
The 'syscall' variables are not directly used in the commands.
Remove the $(srctree)/ prefix because we can rely on VPATH.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-22 08:22:03 +09:00
Masahiro Yamada
865fa29f7d arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
The rules in these Makefiles cannot detect the command line change
because the prerequisite 'FORCE' is missing.

Adding 'FORCE' will result in the headers being rebuilt every time
because the 'targets' additions are also wrong; the file paths in
'targets' must be relative to the current Makefile.

Fix all of them so the if_changed rules work correctly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2021-02-22 08:21:55 +09:00
Linus Torvalds
3e10585335 x86:
- Support for userspace to emulate Xen hypercalls
 - Raise the maximum number of user memslots
 - Scalability improvements for the new MMU.  Instead of the complex
   "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an
   rwlock so that page faults are concurrent, but the code that can run
   against page faults is limited.  Right now only page faults take the
   lock for reading; in the future this will be extended to some
   cases of page table destruction.  I hope to switch the default MMU
   around 5.12-rc3 (some testing was delayed due to Chinese New Year).
 - Cleanups for MAXPHYADDR checks
 - Use static calls for vendor-specific callbacks
 - On AMD, use VMLOAD/VMSAVE to save and restore host state
 - Stop using deprecated jump label APIs
 - Workaround for AMD erratum that made nested virtualization unreliable
 - Support for LBR emulation in the guest
 - Support for communicating bus lock vmexits to userspace
 - Add support for SEV attestation command
 - Miscellaneous cleanups
 
 PPC:
 - Support for second data watchpoint on POWER10
 - Remove some complex workarounds for buggy early versions of POWER9
 - Guest entry/exit fixes
 
 ARM64
 - Make the nVHE EL2 object relocatable
 - Cleanups for concurrent translation faults hitting the same page
 - Support for the standard TRNG hypervisor call
 - A bunch of small PMU/Debug fixes
 - Simplification of the early init hypercall handling
 
 Non-KVM changes (with acks):
 - Detection of contended rwlocks (implemented only for qrwlocks,
   because KVM only needs it for x86)
 - Allow __DISABLE_EXPORTS from assembly code
 - Provide a saner follow_pfn replacements for modules
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmApSRgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOc7wf9FnlinKoTFaSk7oeuuhF/CoCVwSFs
 Z9+A2sNI99tWHQxFR6dyDkEFeQoXnqSxfLHtUVIdH/JnTg0FkEvFz3NK+0PzY1PF
 PnGNbSoyhP58mSBG4gbBAxdF3ZJZMB8GBgYPeR62PvMX2dYbcHqVBNhlf6W4MQK4
 5mAUuAnbf19O5N267sND+sIg3wwJYwOZpRZB7PlwvfKAGKf18gdBz5dQ/6Ej+apf
 P7GODZITjqM5Iho7SDm/sYJlZprFZT81KqffwJQHWFMEcxFgwzrnYPx7J3gFwRTR
 eeh9E61eCBDyCTPpHROLuNTVBqrAioCqXLdKOtO5gKvZI3zmomvAsZ8uXQ==
 =uFZU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "x86:

   - Support for userspace to emulate Xen hypercalls

   - Raise the maximum number of user memslots

   - Scalability improvements for the new MMU.

     Instead of the complex "fast page fault" logic that is used in
     mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent,
     but the code that can run against page faults is limited. Right now
     only page faults take the lock for reading; in the future this will
     be extended to some cases of page table destruction. I hope to
     switch the default MMU around 5.12-rc3 (some testing was delayed
     due to Chinese New Year).

   - Cleanups for MAXPHYADDR checks

   - Use static calls for vendor-specific callbacks

   - On AMD, use VMLOAD/VMSAVE to save and restore host state

   - Stop using deprecated jump label APIs

   - Workaround for AMD erratum that made nested virtualization
     unreliable

   - Support for LBR emulation in the guest

   - Support for communicating bus lock vmexits to userspace

   - Add support for SEV attestation command

   - Miscellaneous cleanups

  PPC:

   - Support for second data watchpoint on POWER10

   - Remove some complex workarounds for buggy early versions of POWER9

   - Guest entry/exit fixes

  ARM64:

   - Make the nVHE EL2 object relocatable

   - Cleanups for concurrent translation faults hitting the same page

   - Support for the standard TRNG hypervisor call

   - A bunch of small PMU/Debug fixes

   - Simplification of the early init hypercall handling

  Non-KVM changes (with acks):

   - Detection of contended rwlocks (implemented only for qrwlocks,
     because KVM only needs it for x86)

   - Allow __DISABLE_EXPORTS from assembly code

   - Provide a saner follow_pfn replacements for modules"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
  KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
  KVM: selftests: Don't bother mapping GVA for Xen shinfo test
  KVM: selftests: Fix hex vs. decimal snafu in Xen test
  KVM: selftests: Fix size of memslots created by Xen tests
  KVM: selftests: Ignore recently added Xen tests' build output
  KVM: selftests: Add missing header file needed by xAPIC IPI tests
  KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
  KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
  locking/arch: Move qrwlock.h include after qspinlock.h
  KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
  KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
  KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
  KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
  KVM: PPC: remove unneeded semicolon
  KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
  KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
  KVM: PPC: Book3S HV: Fix radix guest SLB side channel
  KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
  KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
  KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
  ...
2021-02-21 13:31:43 -08:00
Linus Torvalds
d310ec03a3 The performance event updates for v5.12 are:
- Add CPU-PMU support for Intel Sapphire Rapids CPUs
 
  - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer two-parameter
    sampling event feedback. Not used yet, but is intended for Golden Cove
    CPU-PMU, which can provide both the instruction latency and the cache
    latency information for memory profiling events.
 
  - Remove experimental, default-disabled perfmon-v4 counter_freezing support
    that could only be enabled via a boot option. The hardware is hopelessly
    broken, we'd like to make sure nobody starts relying on this, as it would
    only end in tears.
 
  - Fix energy/power events on Intel SPR platforms
 
  - Simplify the uprobes resume_execution() logic
 
  - Misc smaller fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtf7kRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iJ2xAAvygKF8hm/UAGyT2R3iEruO49wRrmUfgt
 13iBBA1DotKw2b8F5UN5MqjfwS8UgGKuAd8agvQ6XXANpnJ5mpy0nrzgjEXUx4j+
 sQUqL7vxSdZ5J3kKblSZ4QoMzLVYSUkEDmw818vsa4eFWN8z58FJsv+ySegIFbXx
 +I3hF1O9a8MERZBUz4T5xHlgcbSDGEX6EvYRcO+zZ0rXfARfo9StfHYv1V53j6iY
 EOotFEKEn/5naczAd/sQo1SE1IgHtX2cbjOaKF7LulgEwZQWHpdKq0gww6nFK5yz
 XMSE9oXAFXRkRCJbrSqC0Dvrrf8hdlxWbKYbj9L7XILoxw199AdOBDbliJm6P/UH
 6+JSEu/N4R0TFYc7TX6yef7ncw12e+64USjKOlWWwww97rVWWH1/tFTdlXhS6s+d
 jVI3yEECKyZlddrDdsetRdUj+QKyZQfDqbMXPXiDTv9P6AFqBvNLZYT0UPU3akk5
 jXueHJQYSSgqnN+eRaIwvm4ZYWa031YHJXxiq2E89RnzL4JJArBYaddpukgxTYka
 c6Tn8L7f4zP5Bghu7hHv5Vy69i1N/3YvzUoYc6ljjmapgAJzxzq/yoEKrBlKnjtA
 MrstHhnwnPJl+PKjlbLpjl74rtcCiKJxjVhm+a5UbEcYoVuzJ86lmQK2WrLaoCTU
 B/zFplUF8C4=
 =BCcg
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance event updates from Ingo Molnar:

 - Add CPU-PMU support for Intel Sapphire Rapids CPUs

 - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer
   two-parameter sampling event feedback. Not used yet, but is intended
   for Golden Cove CPU-PMU, which can provide both the instruction
   latency and the cache latency information for memory profiling
   events.

 - Remove experimental, default-disabled perfmon-v4 counter_freezing
   support that could only be enabled via a boot option. The hardware is
   hopelessly broken, we'd like to make sure nobody starts relying on
   this, as it would only end in tears.

 - Fix energy/power events on Intel SPR platforms

 - Simplify the uprobes resume_execution() logic

 - Misc smaller fixes.

* tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/rapl: Fix psys-energy event on Intel SPR platform
  perf/x86/rapl: Only check lower 32bits for RAPL energy counters
  perf/x86/rapl: Add msr mask support
  perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[]
  perf/x86/intel: Support CPUID 10.ECX to disable fixed counters
  perf/x86/intel: Add perf core PMU support for Sapphire Rapids
  perf/x86/intel: Filter unsupported Topdown metrics event
  perf/x86/intel: Factor out intel_update_topdown_event()
  perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT
  perf/intel: Remove Perfmon-v4 counter_freezing support
  x86/perf: Use static_call for x86_pmu.guest_get_msrs
  perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info
  perf/x86/intel/uncore: Store the logical die id instead of the physical die id.
  x86/kprobes: Do not decode opcode in resume_execution()
2021-02-21 12:49:32 -08:00
Linus Torvalds
657bd90c93 Scheduler updates for v5.12:
[ NOTE: unfortunately this tree had to be freshly rebased today,
         it's a same-content tree of 82891be90f3c (-next published)
         merged with v5.11.
 
         The main reason for the rebase was an authorship misattribution
         problem with a new commit, which we noticed in the last minute,
         and which we didn't want to be merged upstream. The offending
         commit was deep in the tree, and dependent commits had to be
         rebased as well. ]
 
 - Core scheduler updates:
 
   - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
     preempt=none/voluntary/full boot options (default: full),
     to allow distros to build a PREEMPT kernel but fall back to
     close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling
     behavior via a boot time selection.
 
     There's also the /debug/sched_debug switch to do this runtime.
 
     This feature is implemented via runtime patching (a new variant of static calls).
 
     The scope of the runtime patching can be best reviewed by looking
     at the sched_dynamic_update() function in kernel/sched/core.c.
 
     ( Note that the dynamic none/voluntary mode isn't 100% identical,
       for example preempt-RCU is available in all cases, plus the
       preempt count is maintained in all models, which has runtime
       overhead even with the code patching. )
 
     The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority
     of distributions, are supposed to be unaffected.
 
   - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
     was found via rcutorture triggering a hang. The bug is that
     rcu_idle_enter() may wake up a NOCB kthread, but this happens after
     the last generic need_resched() check. Some cpuidle drivers fix it
     by chance but many others don't.
 
     In true 2020 fashion the original bug fix has grown into a 5-patch
     scheduler/RCU fix series plus another 16 RCU patches to address
     the underlying issue of missed preemption events. These are the
     initial fixes that should fix current incarnations of the bug.
 
   - Clean up rbtree usage in the scheduler, by providing & using the following
     consistent set of rbtree APIs:
 
      partial-order; less() based:
        - rb_add(): add a new entry to the rbtree
        - rb_add_cached(): like rb_add(), but for a rb_root_cached
 
      total-order; cmp() based:
        - rb_find(): find an entry in an rbtree
        - rb_find_add(): find an entry, and add if not found
 
        - rb_find_first(): find the first (leftmost) matching entry
        - rb_next_match(): continue from rb_find_first()
        - rb_for_each(): iterate a sub-tree using the previous two
 
   - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass.
     This is a 4-commit series where each commit improves one aspect of the idle
     sibling scan logic.
 
   - Improve the cpufreq cooling driver by getting the effective CPU utilization
     metrics from the scheduler
 
   - Improve the fair scheduler's active load-balancing logic by reducing the number
     of active LB attempts & lengthen the load-balancing interval. This improves
     stress-ng mmapfork performance.
 
   - Fix CFS's estimated utilization (util_est) calculation bug that can result in
     too high utilization values
 
 - Misc updates & fixes:
 
    - Fix the HRTICK reprogramming & optimization feature
    - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
    - Reduce dl_add_task_root_domain() overhead
    - Fix uprobes refcount bug
    - Process pending softirqs in flush_smp_call_function_from_idle()
    - Clean up task priority related defines, remove *USER_*PRIO and
      USER_PRIO()
    - Simplify the sched_init_numa() deduplication sort
    - Documentation updates
    - Fix EAS bug in update_misfit_status(), which degraded the quality
      of energy-balancing
    - Smaller cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtHBsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1itgg/+NGed12pgPjYBzesdou60Lvx7LZLGjfOt
 M1F1EnmQGn/hEH2fCY6ZoqIZQTVltm7GIcBNabzYTzlaHZsdtyuDUJBZyj19vTlk
 zekcj7WVt+qvfjChaNwEJhQ9nnOM/eohMgEOHMAAJd9zlnQvve7NOLQ56UDM+kn/
 9taFJ5ZPvb4avP6C5p3KivvKex6Bjof/Tl0m3utpNyPpI/qK3FyGxwdgCxU0yepT
 ABWQX5ZQCufFvo1bgnBPfqyzab4MqhoM3bNKBsLQfuAlssG1xRv4KQOev4dRwrt9
 pXJikV5C9yez5d2lGe5p0ltH5IZS/l9x2yI/ZQj3OUDTFyV1ic6WfFAqJgDzVF8E
 i/vvA4NPQiI241Bkps+ErcCw4aVOgiY6TWli74cHjLUIX0+As6aHrFWXGSxUmiHB
 WR+B8KmdfzRTTlhOxMA+cvlpZcKCfxWkJJmXzr/lDZzIuKPqM3QCE2wD9sixkfVo
 JNICT0IvZghWOdbMEfZba8Psh/e2LVI9RzdpEiuYJz1ZrVlt1hO0M6jBxY0hMz9n
 k54z81xODw0a8P2FHMtpmB1vhAeqCmvwA6DO8z0Oxs0DFi+KM2bLf2efHsCKafI+
 Bm5v9YFaOk/55R76hJVh+aYLlyFgFkKd+P/niJTPDnxOk3SqJuXvTrql1HeGHkNr
 kYgQa23dsZk=
 =pyaG
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler updates:

   - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
     preempt=none/voluntary/full boot options (default: full), to allow
     distros to build a PREEMPT kernel but fall back to close to
     PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
     a boot time selection.

     There's also the /debug/sched_debug switch to do this runtime.

     This feature is implemented via runtime patching (a new variant of
     static calls).

     The scope of the runtime patching can be best reviewed by looking
     at the sched_dynamic_update() function in kernel/sched/core.c.

     ( Note that the dynamic none/voluntary mode isn't 100% identical,
       for example preempt-RCU is available in all cases, plus the
       preempt count is maintained in all models, which has runtime
       overhead even with the code patching. )

     The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
     majority of distributions, are supposed to be unaffected.

   - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
     was found via rcutorture triggering a hang. The bug is that
     rcu_idle_enter() may wake up a NOCB kthread, but this happens after
     the last generic need_resched() check. Some cpuidle drivers fix it
     by chance but many others don't.

     In true 2020 fashion the original bug fix has grown into a 5-patch
     scheduler/RCU fix series plus another 16 RCU patches to address the
     underlying issue of missed preemption events. These are the initial
     fixes that should fix current incarnations of the bug.

   - Clean up rbtree usage in the scheduler, by providing & using the
     following consistent set of rbtree APIs:

       partial-order; less() based:
         - rb_add(): add a new entry to the rbtree
         - rb_add_cached(): like rb_add(), but for a rb_root_cached

       total-order; cmp() based:
         - rb_find(): find an entry in an rbtree
         - rb_find_add(): find an entry, and add if not found

         - rb_find_first(): find the first (leftmost) matching entry
         - rb_next_match(): continue from rb_find_first()
         - rb_for_each(): iterate a sub-tree using the previous two

   - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
     single pass. This is a 4-commit series where each commit improves
     one aspect of the idle sibling scan logic.

   - Improve the cpufreq cooling driver by getting the effective CPU
     utilization metrics from the scheduler

   - Improve the fair scheduler's active load-balancing logic by
     reducing the number of active LB attempts & lengthen the
     load-balancing interval. This improves stress-ng mmapfork
     performance.

   - Fix CFS's estimated utilization (util_est) calculation bug that can
     result in too high utilization values

  Misc updates & fixes:

   - Fix the HRTICK reprogramming & optimization feature

   - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code

   - Reduce dl_add_task_root_domain() overhead

   - Fix uprobes refcount bug

   - Process pending softirqs in flush_smp_call_function_from_idle()

   - Clean up task priority related defines, remove *USER_*PRIO and
     USER_PRIO()

   - Simplify the sched_init_numa() deduplication sort

   - Documentation updates

   - Fix EAS bug in update_misfit_status(), which degraded the quality
     of energy-balancing

   - Smaller cleanups"

* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  sched,x86: Allow !PREEMPT_DYNAMIC
  entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
  entry: Explicitly flush pending rcuog wakeup before last rescheduling point
  rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
  rcu/nocb: Perform deferred wake up before last idle's need_resched() check
  rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
  sched/features: Distinguish between NORMAL and DEADLINE hrtick
  sched/features: Fix hrtick reprogramming
  sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
  uprobes: (Re)add missing get_uprobe() in __find_uprobe()
  smp: Process pending softirqs in flush_smp_call_function_from_idle()
  sched: Harden PREEMPT_DYNAMIC
  static_call: Allow module use without exposing static_call_key
  sched: Add /debug/sched_preempt
  preempt/dynamic: Support dynamic preempt with preempt= boot option
  preempt/dynamic: Provide irqentry_exit_cond_resched() static call
  preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
  preempt/dynamic: Provide cond_resched() and might_resched() static calls
  preempt: Introduce CONFIG_PREEMPT_DYNAMIC
  static_call: Provide DEFINE_STATIC_CALL_RET0()
  ...
2021-02-21 12:35:04 -08:00
Linus Torvalds
24880bef41 Remove oprofile and dcookies support
The "oprofile" user-space tools don't use the kernel OPROFILE support any more,
 and haven't in a long time. User-space has been converted to the perf
 interfaces.
 
 The dcookies stuff is only used by the oprofile code. Now that oprofile's
 support is getting removed from the kernel, there is no need for dcookies as
 well.
 
 Remove kernel's old oprofile and dcookies support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJgJMEVAAoJENK5HDyugRIcL8YP/jkmXH5CZT80ntcqrJGWKcG7
 lWbach7uNeQteht7B1ZPKvojxizTkmfrN2sClX0B2hbGkc5TiWUQ2ZSnvnfWDZ8+
 z2qQcEB11G/ReL2vvRk1fJlWdAOyUfrPee/44AkemnLRv+Niw/8PqnGd87yDQGsK
 qy5E1XXfbjUq6Y/uMiLOX3+21I6w6o2Q6I3NNXC93s0wS3awqnft8n0XBC7iAPBj
 eowRJxpdRU2Vcuj8UOzzOI7gQlwdjwYImyLPbRy/V8NawC8a+FHrPrf5/GCYlVzl
 7TGFBsDQSmzvrBChUfoGz1Rq/VZ1a357p5rhRqemfUrdkjW+vyzelnD8I1W/hb2o
 SmBXoPoyl3+UkFHNyJI0mI7obaV+2PzyXMV0JIQUj+IiX/mfeFv0nF4XfZD2IkRt
 6xhaYj775Zrx32iBdGZIvvLg5Gh9ZkZmR5vJ7Fi/EIZFe6Z+bZnPKUROnAgS/o0z
 +UkSygOhgo/1XbqrzZVk1iweWeu+EUMbY4YQv2qVnFhpvsq4ieThcUGQpWcxGjjH
 WP8O0n1yq1slsnpUtxhiTsm46ENajx9zZp6Iv6Ws+NM0RUqjND8BdF1co9WGD3LS
 cnZMFBs4Bg/V1HICL/D4s6L7t1ofrEXIgJH1y3iF0HeECq03mU4CgA/qly9Aebqg
 UxPF3oNlVOPlds9FzsU2
 =I2Ac
 -----END PGP SIGNATURE-----

Merge tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux

Pull oprofile and dcookies removal from Viresh Kumar:
 "Remove oprofile and dcookies support

  The 'oprofile' user-space tools don't use the kernel OPROFILE support
  any more, and haven't in a long time. User-space has been converted to
  the perf interfaces.

  The dcookies stuff is only used by the oprofile code. Now that
  oprofile's support is getting removed from the kernel, there is no
  need for dcookies as well.

  Remove kernel's old oprofile and dcookies support"

* tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
  fs: Remove dcookies support
  drivers: Remove CONFIG_OPROFILE support
  arch: xtensa: Remove CONFIG_OPROFILE support
  arch: x86: Remove CONFIG_OPROFILE support
  arch: sparc: Remove CONFIG_OPROFILE support
  arch: sh: Remove CONFIG_OPROFILE support
  arch: s390: Remove CONFIG_OPROFILE support
  arch: powerpc: Remove oprofile
  arch: powerpc: Stop building and using oprofile
  arch: parisc: Remove CONFIG_OPROFILE support
  arch: mips: Remove CONFIG_OPROFILE support
  arch: microblaze: Remove CONFIG_OPROFILE support
  arch: ia64: Remove rest of perfmon support
  arch: ia64: Remove CONFIG_OPROFILE support
  arch: hexagon: Don't select HAVE_OPROFILE
  arch: arc: Remove CONFIG_OPROFILE support
  arch: arm: Remove CONFIG_OPROFILE support
  arch: alpha: Remove CONFIG_OPROFILE support
2021-02-21 10:40:34 -08:00
Linus Torvalds
591fd30eee Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ELF compat updates from Al Viro:
 "Sanitizing ELF compat support, especially for triarch architectures:

   - X32 handling cleaned up

   - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now

   - Kconfig side of things regularized

  Eventually I hope to have compat_binfmt_elf.c killed, with both native
  and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed
  by kbuild, but that's a separate story - not included here"

* 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of COMPAT_ELF_EXEC_PAGESIZE
  compat_binfmt_elf: don't bother with undef of ELF_ARCH
  Kconfig: regularize selection of CONFIG_BINFMT_ELF
  mips compat: switch to compat_binfmt_elf.c
  mips: don't bother with ELF_CORE_EFLAGS
  mips compat: don't bother with ELF_ET_DYN_BASE
  mips: KVM_GUEST makes no sense for 64bit builds...
  mips: kill unused definitions in binfmt_elf[on]32.c
  mips binfmt_elf*32.c: use elfcore-compat.h
  x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
  [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
  elf_prstatus: collect the common part (everything before pr_reg) into a struct
  binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
2021-02-21 09:29:23 -08:00
Linus Torvalds
51e6d17809 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Here is what we have this merge window:

   1) Support SW steering for mlx5 Connect-X6Dx, from Yevgeny Kliteynik.

   2) Add RSS multi group support to octeontx2-pf driver, from Geetha
      Sowjanya.

   3) Add support for KS8851 PHY. From Marek Vasut.

   4) Add support for GarfieldPeak bluetooth controller from Kiran K.

   5) Add support for half-duplex tcan4x5x can controllers.

   6) Add batch skb rx processing to bcrm63xx_enet, from Sieng Piaw
      Liew.

   7) Rework RX port offload infrastructure, particularly wrt, UDP
      tunneling, from Jakub Kicinski.

   8) Add BCM72116 PHY support, from Florian Fainelli.

   9) Remove Dsa specific notifiers, they are unnecessary. From Vladimir
      Oltean.

  10) Add support for picosecond rx delay in dwmac-meson8b chips. From
      Martin Blumenstingl.

  11) Support TSO on xfrm interfaces from Eyal Birger.

  12) Add support for MP_PRIO to mptcp stack, from Geliang Tang.

  13) Support BCM4908 integrated switch, from Rafał Miłecki.

  14) Support for directly accessing kernel module variables via module
      BTF info, from Andrii Naryiko.

  15) Add DASH (esktop and mobile Architecture for System Hardware)
      support to r8169 driver, from Heiner Kallweit.

  16) Add rx vlan filtering to dpaa2-eth, from Ionut-robert Aron.

  17) Add support for 100 base0x SFP devices, from Bjarni Jonasson.

  18) Support link aggregation in DSA, from Tobias Waldekranz.

  19) Support for bitwidse atomics in bpf, from Brendan Jackman.

  20) SmartEEE support in at803x driver, from Russell King.

  21) Add support for flow based tunneling to GTP, from Pravin B Shelar.

  22) Allow arbitrary number of interconnrcts in ipa, from Alex Elder.

  23) TLS RX offload for bonding, from Tariq Toukan.

  24) RX decap offklload support in mac80211, from Felix Fietkou.

  25) devlink health saupport in octeontx2-af, from George Cherian.

  26) Add TTL attr to SCM_TIMESTAMP_OPT_STATS, from Yousuk Seung

  27) Delegated actionss support in mptcp, from Paolo Abeni.

  28) Support receive timestamping when doin zerocopy tcp receive. From
      Arjun Ray.

  29) HTB offload support for mlx5, from Maxim Mikityanskiy.

  30) UDP GRO forwarding, from Maxim Mikityanskiy.

  31) TAPRIO offloading in dsa hellcreek driver, from Kurt Kanzenbach.

  32) Weighted random twos choice algorithm for ipvs, from Darby Payne.

  33) Fix netdev registration deadlock, from Johannes Berg.

  34) Various conversions to new tasklet api, from EmilRenner Berthing.

  35) Bulk skb allocations in veth, from Lorenzo Bianconi.

  36) New ethtool interface for lane setting, from Danielle Ratson.

  37) Offload failiure notifications for routes, from Amit Cohen.

  38) BCM4908 support, from Rafał Miłecki.

  39) Support several new iwlwifi chips, from Ihab Zhaika.

  40) Flow drector support for ipv6 in i40e, from Przemyslaw Patynowski.

  41) Support for mhi prrotocols, from Loic Poulain.

  42) Optimize bpf program stats.

  43) Implement RFC6056, for better port randomization, from Eric
      Dumazet.

  44) hsr tag offloading support from George McCollister.

  45) Netpoll support in qede, from Bhaskar Upadhaya.

  46) 2005/400g speed support in bonding 3ad mode, from Nikolay
      Aleksandrov.

  47) Netlink event support in mptcp, from Florian Westphal.

  48) Better skbuff caching, from Alexander Lobakin.

  49) MRP (Media Redundancy Protocol) offloading in DSA and a few
      drivers, from Horatiu Vultur.

  50) mqprio saupport in mvneta, from Maxime Chevallier.

  51) Remove of_phy_attach, no longer needed, from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1766 commits)
  octeontx2-pf: Fix otx2_get_fecparam()
  cteontx2-pf: cn10k: Prevent harmless double shift bugs
  net: stmmac: Add PCI bus info to ethtool driver query output
  ptp: ptp_clockmatrix: clean-up - parenthesis around a == b are unnecessary
  ptp: ptp_clockmatrix: Simplify code - remove unnecessary `err` variable.
  ptp: ptp_clockmatrix: Coding style - tighten vertical spacing.
  ptp: ptp_clockmatrix: Clean-up dev_*() messages.
  ptp: ptp_clockmatrix: Remove unused header declarations.
  ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable.
  ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.
  net: stmmac: dwmac-sun8i: Add a shutdown callback
  net: stmmac: dwmac-sun8i: Minor probe function cleanup
  net: stmmac: dwmac-sun8i: Use reset_control_reset
  net: stmmac: dwmac-sun8i: Remove unnecessary PHY power check
  net: stmmac: dwmac-sun8i: Return void from PHY unpower
  r8169: use macro pm_ptr
  net: mdio: Remove of_phy_attach()
  net: mscc: ocelot: select PACKING in the Kconfig
  net: re-solve some conflicts after net -> net-next merge
  net: dsa: tag_rtl4_a: Support also egress tags
  ...
2021-02-20 17:45:32 -08:00
Dietmar Eggemann
9d061ba6bc sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO
The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.

Commit fe443ef2ac ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.

Commit a4ec24b48d ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.

Commit 3ee237dddc ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.

With:

  MAX_USER_PRIO = USER_PRIO(MAX_PRIO)

                = MAX_PRIO - MAX_RT_PRIO

       MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH

  MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO

  MAX_USER_PRIO = NICE_WIDTH

MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggemann@arm.com
2021-02-17 14:08:17 +01:00
Rob Herring
83c4a4eec0 of: Remove of_dev_{get,put}()
of_dev_get() and of_dev_put are just wrappers for get_device()/put_device()
on a platform_device. There's also already platform_device_{get,put}()
wrappers for this purpose. Let's update the few users and remove
of_dev_{get,put}().

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Cc: Gilles Muller <Gilles.Muller@inria.fr>
Cc: Nicolas Palix <nicolas.palix@imag.fr>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-usb@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210211232745.1498137-2-robh@kernel.org
2021-02-12 19:23:39 -06:00
Paolo Bonzini
8c6e67bec3 KVM/arm64 updates for Linux 5.12
- Make the nVHE EL2 object relocatable, resulting in much more
   maintainable code
 - Handle concurrent translation faults hitting the same page
   in a more elegant way
 - Support for the standard TRNG hypervisor call
 - A bunch of small PMU/Debug fixes
 - Allow the disabling of symbol export from assembly code
 - Simplification of the early init hypercall handling
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmAmjqEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDoUEQAIrJ7YF4v4gz06a0HG9+b6fbmykHyxlG7jfm
 trvctfaiKzOybKoY5odPpNFzhbYOOdXXqYipyTHGwBYtGSy9G/9SjMKSUrfln2Ni
 lr1wBqapr9TE+SVKoR8pWWuZxGGbHVa7brNuMbMsMi1wwAsM2/n70H9PXrdq3QiK
 Ge1DWLso2oEfhtTwqNKa4dwB2MHjBhBFhhq+Nq5pslm6mmxJaYqz7pyBmw/C+2cc
 oU/6kpAa1yPAauptWXtYXJYOMHihxgEa1IdK3Gl0hUyFyu96xVkwH/KFsj+bRs23
 QGGCSdy4313hzaoGaSOTK22R98Aeg0wI9a6tcCBvVVjTAztnlu1FPtUZr8e/F7uc
 +r8xVJUJFiywt3Zktf/D7YDK9LuMMqFnj0BkI4U9nIBY59XZRNhENsBCmjru5lnL
 iXa5cuta03H4emfssIChLpgn0XHFas6t5dFXBPGbXyw0qsQchTw98iQX9LVxefUK
 rOUGPIN4nE9ESRIZe0SPlAVeCtNP8cLH7+0YG9MJ1QeDVYaUsnvy9Ln/ox+514mR
 5y2KJ6y7xnLB136SKCzPDDloYtz7BDiJq6a/RPiXKGheKoxy+N+BSe58yWCqFZYE
 Fx/cGUr7oSg39U7gCboog6BDp5e2CXBfbRllg6P47bZFfdPNwzNEzHvk49VltMxx
 Rl2W05bk
 =6EwV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.12

- Make the nVHE EL2 object relocatable, resulting in much more
  maintainable code
- Handle concurrent translation faults hitting the same page
  in a more elegant way
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Allow the disabling of symbol export from assembly code
- Simplification of the early init hypercall handling
2021-02-12 11:23:44 -05:00
Ingo Molnar
a3251c1a36 Merge branch 'x86/paravirt' into x86/entry
Merge in the recent paravirt changes to resolve conflicts caused
by objtool annotations.

Conflicts:
	arch/x86/xen/xen-asm.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-02-12 13:36:43 +01:00
Linus Torvalds
dcc0b49040 powerpc fixes for 5.11 #8
One fix for a regression seen in io_uring, introduced by our support for KUAP
 (Kernel User Access Prevention) with the Hash MMU.
 
 Thanks to: Aneesh Kumar K.V, Zorro Lang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAlIXsTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBemD/4pl6wsnmPnoxAKSL1jeJ6XXDOeBRJF
 gCk6/NgnH7YUOrQcHhme6Rbs9CMaitidw8tqRwexyjs2AmcAF8lelqsRJHcFPgWt
 AYGR0sb00ThI5iMo4heZKMQ/swmHikhZRnH+bx9+fHFgIVoLLCU9bsATSzAM3RXM
 +s+bVn8S4dYFx8imC4tYmki2FpZAUzcWhEMP6gTfHoUeP7+GIybDcERBbVoQfBGj
 Gqq0nnJxjgXCs9yY/11QevQhqqbeKIyXLvRqm09wGWQacJGti2dd2X5hBWGUeNa1
 SMx7ZpnfpGqLudyJBA6XUUBtfbuLZANTJe7CAtYtaYZJ3u75Wd+I5UaHypI+LibC
 WvaUC89C/y9DDlhyZqntqiiaYPxvNQxauhbjX4MBJdCXaAZB3GpzqaDCkXB3WUSN
 yXlx9hJpFmmumWelWqiuLTdtfMuehj9okLvQ7OSBI5ueQdoxrYw2jeGJPsj3+Zrq
 okeMgrDpx3yq4AmcLkTppcYkqT9OJ59A52vk1rCMev0kz44o0sjF0ebiHNpoXZe+
 mVRF9RbxYYYCd2YojOqpLm3nQmx0TbB2DSSmE+tkzMS7BZ/ERBtvqzJ0EGmc6af/
 zc+K1JoWAMLrAqYX5BgKptVRh9NPcEL08cS6ZrLLXZDYkpcUQAeiVN5aIvc8wamH
 iCwIf5hSyKeQ9g==
 =iXnC
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "One fix for a regression seen in io_uring, introduced by our support
  for KUAP (Kernel User Access Prevention) with the Hash MMU.

  Thanks to Aneesh Kumar K.V, and Zorro Lang"

* tag 'powerpc-5.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
2021-02-11 15:41:07 -08:00
Masahiro Yamada
052c805a18 kbuild: LD_VERSION redenomination
Commit ccbef1674a ("Kbuild, lto: add ld-version and ld-ifversion
macros") introduced scripts/ld-version.sh for GCC LTO.

At that time, this script handled 5 version fields because GCC LTO
needed the downstream binutils. (https://lkml.org/lkml/2014/4/8/272)

The code snippet from the submitted patch was as follows:

    # We need HJ Lu's Linux binutils because mainline binutils does not
    # support mixing assembler and LTO code in the same ld -r object.
    # XXX check if the gcc plugin ld is the expected one too
    # XXX some Fedora binutils should also support it. How to check for that?
    ifeq ($(call ld-ifversion,-ge,22710001,y),y)
        ...

However, GCC LTO was not merged into the mainline after all.
(https://lkml.org/lkml/2014/4/8/272)

So, the 4th and 5th fields were never used, and finally removed by
commit 0d61ed17dd ("ld-version: Drop the 4th and 5th version
components").

Since then, the last 4-digits returned by this script is always zeros.

Remove the meaningless last 4-digits. This makes the version format
consistent with GCC_VERSION, CLANG_VERSION, LLD_VERSION.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-12 05:12:31 +09:00
Kajol Jain
82d2c16b35 powerpc/perf: Adds support for programming of Thresholding in P10
Thresholding, a performance monitoring unit feature, can be
used to identify marked instructions which take more than
expected cycles between start event and end event.
Threshold compare (thresh_cmp) bits are programmed in MMCRA
register. In Power9, thresh_cmp bits were part of the
event code. But in case of P10, thresh_cmp are not part of
event code due to inclusion of MMCR3 bits.

Patch here adds an option to use attr.config1 variable
to be used to pass thresh_cmp value to be programmed in
MMCRA register. A new ppmu flag called PPMU_HAS_ATTR_CONFIG1
has been added and this flag is used to notify the use of
attr.config1 variable.

Patch has extended the parameter list of 'compute_mmcr',
to include power_pmu's 'flags' element and parameter list of
get_constraint to include attr.config1 value. It also extend
parameter list of power_check_constraints inorder to pass
perf_event list.

As stated by commit ef0e3b650f ("powerpc/perf: Fix Threshold
Event Counter Multiplier width for P10"), constraint bits for
thresh_cmp is also needed to be increased to 11 bits, which is
handled as part of this patch. We added bit number 53 as part
of constraint bits of thresh_cmp for power10 to make it an
11 bit field.

Updated layout for p10:

/*
 * Layout of constraint bits:
 *
 *        60        56        52        48        44        40        36        32
 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
 *   [   fab_match   ]         [       thresh_cmp      ] [   thresh_ctl    ] [   ]
 *                                          |                                  |
 *                           [  thresh_cmp bits for p10]           thresh_sel -*
 *
 *        28        24        20        16        12         8         4         0
 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
 *               [ ] |   [ ] |  [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
 *                |  |    |  |                  |
 *      BHRB IFM -*  |    |  |*radix_scope      |      Count of events for each PMC.
 *              EBB -*    |                     |        p1, p2, p3, p4, p5, p6.
 *      L1 I/D qualifier -*                     |
 *                     nc - number of counters -*
 *
 * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints
 * we want the low bit of each field to be added to any existing value.
 *
 * Everything else is a value field.
 */

Result:
command#: cat /sys/devices/cpu/format/thresh_cmp
config1:0-17

ex. usage:

command#: perf record -I --weight -d  -e
	 cpu/event=0x67340101EC,thresh_cmp=500/ ./ebizzy -S 2 -t 1 -s 4096
1826636 records/s
real  2.00 s
user  2.00 s
sys   0.00 s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data (61 samples) ]

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210209095234.837356-1-kjain@linux.ibm.com
2021-02-11 23:35:36 +11:00
Oliver O'Halloran
b3abe590c8 powerpc/pci: Remove unimplemented prototypes
The corresponding definitions were deleted in commit 3d5134ee83
("[POWERPC] Rewrite IO allocation & mapping on powerpc64") which
was merged a mere 13 years ago.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902035138.1762531-1-oohall@gmail.com
2021-02-11 23:35:36 +11:00
Christophe Leroy
052f9d206f powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
Since commit 17bc43367f ("powerpc/uaccess: Implement
unsafe_copy_to_user() as a simple loop"), raw_copy_to_user_allowed()
is only used by raw_copy_to_user().

Merge raw_copy_to_user_allowed() into raw_copy_to_user().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3ae114740317187e12edbd5ffa9157cb8c396dea.1612879284.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:35 +11:00
Christophe Leroy
95d019e0f9 powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
__put_user_size_allowed() is only called from __put_user_size() now.

Merge them together.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b3baeaec1ee2fbdc653bb6fb27b0be5b846163ef.1612879284.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:35 +11:00
Christophe Leroy
6b385d1d7c powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
Copied from commit 4b842e4e25 ("x86: get rid of small
constant size cases in raw_copy_{to,from}_user()")

Very few call sites where that would be triggered remain, and none
of those is anywhere near hot enough to bother.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/99d4ccb58a20d8408d0e19874393655ad5b40822.1612879284.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:35 +11:00
Michael Ellerman
e3de1e291f powerpc/64: Fix stack trace not displaying final frame
In commit bf13718bc5 ("powerpc: show registers when unwinding
interrupt frames") we changed our stack dumping logic to show the full
registers whenever we find an interrupt frame on the stack.

However we didn't notice that on 64-bit this doesn't show the final
frame, ie. the interrupt that brought us in from userspace, whereas on
32-bit it does.

That is due to confusion about the size of that last frame. The code
in show_stack() calls validate_sp(), passing it STACK_INT_FRAME_SIZE
to check the sp is at least that far below the top of the stack.

However on 64-bit that size is too large for the final frame, because
it includes the red zone, but we don't allocate a red zone for the
first frame.

So add a new define that encodes the correct size for 32-bit and
64-bit, and use it in show_stack().

This results in the full trace being shown on 64-bit, eg:

  sysrq: Trigger a crash
  Kernel panic - not syncing: sysrq triggered crash
  CPU: 0 PID: 83 Comm: sh Not tainted 5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty #649
  Call Trace:
  [c00000000a1c3ac0] [c000000000897b70] dump_stack+0xc4/0x114 (unreliable)
  [c00000000a1c3b00] [c00000000014334c] panic+0x178/0x41c
  [c00000000a1c3ba0] [c00000000094e600] sysrq_handle_crash+0x40/0x50
  [c00000000a1c3c00] [c00000000094ef98] __handle_sysrq+0xd8/0x210
  [c00000000a1c3ca0] [c00000000094f820] write_sysrq_trigger+0x100/0x188
  [c00000000a1c3ce0] [c0000000005559dc] proc_reg_write+0x10c/0x1b0
  [c00000000a1c3d10] [c000000000479950] vfs_write+0xf0/0x360
  [c00000000a1c3d60] [c000000000479d9c] ksys_write+0x7c/0x140
  [c00000000a1c3db0] [c00000000002bf5c] system_call_exception+0x19c/0x2c0
  [c00000000a1c3e10] [c00000000000d35c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff9fbab428
  NIP:  00007fff9fbab428 LR: 000000001000b724 CTR: 0000000000000000
  REGS: c00000000a1c3e80 TRAP: 0c00   Not tainted  (5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty)
  MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 22002884  XER: 00000000
  IRQMASK: 0
  GPR00: 0000000000000004 00007fffc3cb8960 00007fff9fc59900 0000000000000001
  GPR04: 000000002a4b32d0 0000000000000002 0000000000000063 0000000000000063
  GPR08: 000000002a4b32d0 0000000000000000 0000000000000000 0000000000000000
  GPR12: 0000000000000000 00007fff9fcca9a0 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0
  GPR20: 000000002a4b3485 00000000100b8f90 0000000000000000 0000000000000000
  GPR24: 000000002a4b0440 00000000100e77b8 0000000000000020 000000002a4b32d0
  GPR28: 0000000000000001 0000000000000002 000000002a4b32d0 0000000000000001
  NIP [00007fff9fbab428] 0x7fff9fbab428
  LR [000000001000b724] 0x1000b724
  --- interrupt: c00

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210209141627.2898485-1-mpe@ellerman.id.au
2021-02-11 23:35:14 +11:00
Christophe Leroy
132f94f133 powerpc/time: Remove get_tbl()
There are no more users of get_tbl(). Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dd0368bfd497ffe06b31ee1b5f2ebf7760e30900.1612866360.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:13 +11:00
Christophe Leroy
55d68df623 powerpc/time: Avoid using get_tbl()
get_tbl() is confusing as it returns the content TBL register
on PPC32 but the concatenation of TBL and TBU on PPC64.

Use mftb() instead.

This will allow the removal of get_tbl() in a following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/decefb47c8a2070bf55d20b096b813908c7b3110.1612866360.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:13 +11:00
Christophe Leroy
5b90b9661a powerpc/syscall: Avoid storing 'current' in another pointer
By saving the pointer pointing to thread_info.flags, gcc copies r2
in a non-volatile register.

We know 'current' doesn't change, so avoid that intermediaite pointer.

Reduces null_syscall benchmark by 2 cycles (322 => 320 cycles)

On PPC64, gcc seems to know that 'current' is not changing, and it keeps
it in a non volatile register to avoid multiple read of 'current' in paca.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ad0363ff0ff8c125f40e1cdc589a85bbd7e31693.1612946484.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:13 +11:00
Christophe Leroy
d524dda719 powerpc/32: Handle bookE debugging in C in syscall entry/exit
The handling of SPRN_DBCR0 and other registers can easily
be done in C instead of ASM.

For that, create booke_load_dbcr0() and booke_restore_dbcr0().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1a7515f9258b27a9177de88491a8bb79b255ceb7.1612898425.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:12 +11:00
Christophe Leroy
b966f22790 powerpc/syscall: Do not check unsupported scv vector on PPC32
Only book3s/64 has scv. No need to check the 0x7ff0 trap on 32 or 64e.
For that, add a helper trap_is_unsupported_scv() similar to
trap_is_scv().

And ignore the scv parameter in syscall_exit_prepare (Save 14 cycles
346 => 332 cycles)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fb87b205ae8eb8c623f33bb316801acf95a831e6.1612898425.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:12 +11:00
Christophe Leroy
eb595eca74 powerpc/32: Remove the counter in global_dbcr0
global_dbcr0 has two parts, 4 bytes to save/restore the
value of SPRN_DBCR0, and 4 bytes that are incremented/decremented
everytime something is saving/loading the above value.

This counter is only incremented/decremented, its value is never
used and never read.

Remove the counter and devide the size of global_dbcr0 by 2.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7e381dc58b3f583556cfab37ba5d813bfd5cce1e.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:12 +11:00
Christophe Leroy
4d67facbcb powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
system_call_exception() checks MSR_PR and BUGs if a syscall
is issued from kernel mode.

No need to handle it anymore from the ASM entry code.

null_syscall reduction 2 cycles (348 => 346 cycles)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1eddb42cb12092b1e3d72608d182c365db3da41d.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:11 +11:00
Christophe Leroy
6f76a01173 powerpc/syscall: implement system call entry/exit logic in C for PPC32
That's port of PPC64 syscall entry/exit logic in C to PPC32.

Performancewise on 8xx:
Before : 304 cycles on null_syscall
After  : 348 cycles on null_syscall

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a93b08e1275e9d1f0b1c39043d1b827586b2b401.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:11 +11:00
Christophe Leroy
fbcee2ebe8 powerpc/32: Always save non volatile GPRs at syscall entry
In preparation for porting syscall entry/exit to C, inconditionally
save non volatile general purpose registers.

Commit 965dd3ad30 ("powerpc/64/syscall: Remove non-volatile GPR save
optimisation") provides detailed explanation.

This increases the number of cycles by 24 cycles on 8xx with
null_syscall benchmark (280 => 304 cycles)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/21c08162b83655195fe9ead78ff2cfd28508d023.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:11 +11:00
Christophe Leroy
c01b916658 powerpc/syscall: Change condition to check MSR_RI
In system_call_exception(), MSR_RI also needs to be checked on 8xx.
Only booke and 40x doesn't have MSR_RI.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/67820fada8dd6a8fe9d7b666f175d4cc9d8de87e.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:11 +11:00
Christophe Leroy
8875f47b76 powerpc/syscall: Save r3 in regs->orig_r3
Save r3 in regs->orig_r3 in system_call_exception()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a90805ab6b9101b46daf56470f457a57acd86fc.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:10 +11:00
Christophe Leroy
72b7a9e56b powerpc/syscall: Use is_compat_task()
Instead of hard comparing task flags with _TIF_32BIT, use
is_compat_task(). The advantage is that it returns 0 on PPC32
allthough _TIF_32BIT is always set.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c8094662199337a7200fea9f6e1d1f8b1b6d5f69.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:10 +11:00
Christophe Leroy
344bb20b15 powerpc/syscall: Make interrupt.c buildable on PPC32
To allow building interrupt.c on PPC32, ifdef out specific PPC64
code or use helpers which are available on both PP32 and PPC64

Modify Makefile to always build interrupt.o

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba073ad67bd971a88ce331b65d6655523b54c794.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:10 +11:00
Christophe Leroy
ab1a517d55 powerpc/syscall: Rename syscall_64.c into interrupt.c
syscall_64.c will be reused almost as is for PPC32.

As this file also contains functions to handle other types
of interrupts rename it interrupt.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cddc2deaa8f049d3ec419738e69804934919b935.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:10 +11:00
Christophe Leroy
6650c4782d powerpc/irq: Add stub irq_soft_mask_return() for PPC32
To allow building syscall_64.c smoothly on PPC32, add stub version
of irq_soft_mask_return().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9b9f62c5e2e63cc121fd749a923aaaee92ee0da4.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:09 +11:00
Christophe Leroy
08353779f2 powerpc/irq: Rework helpers that manipulate MSR[EE/RI]
In preparation of porting PPC32 to C syscall entry/exit,
rewrite the following helpers as static inline functions and
add support for PPC32 in them:
	__hard_irq_enable()
	__hard_irq_disable()
	__hard_EE_RI_disable()
	__hard_RI_enable()

Then use them in PPC32 version of arch_local_irq_disable()
and arch_local_irq_enable() to avoid code duplication.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0e290372a0e7dc2ae657b4a01aec85f8de7fdf77.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:09 +11:00
Christophe Leroy
fb5608fd11 powerpc/irq: Add helper to set regs->softe
regs->softe doesn't exist on PPC32.

Add irq_soft_mask_regs_set_state() helper to set regs->softe.
This helper will void on PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5f37d1177a751fdbca79df461d283850ca3a34a2.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:09 +11:00
Christophe Leroy
2c59e51048 powerpc/32: Reorder instructions to avoid using CTR in syscall entry
Now that we are using rfi instead of mtmsr to reactivate MMU, it is
possible to reorder instructions and avoid the need to use CTR for
stashing SRR0.

null_syscall on 8xx is reduced by 3 cycles (283 => 280 cycles).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8fa13a59f73647e058c95fc7e1c7a98f316bd20a.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:08 +11:00
Christophe Leroy
76249ddc27 powerpc/32: On syscall entry, enable instruction translation at the same time as data
On 40x and 8xx, kernel text is pinned.
On book3s/32, kernel text is mapped by BATs.

Enable instruction translation at the same time as data translation, it
makes things simpler.

MSR_RI can also be set at the same time because srr0/srr1 are already
saved and r1 is set properly.

On booke, translation is always on, so at the end all PPC32
have translation on early.

This reduces null_syscall benchmark by 13 cycles on 8xx
(296 ==> 283 cycles).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3fe8891c814103a3549efc1d4e7ffc828bba5993.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:08 +11:00
Christophe Leroy
eca2411040 powerpc/32: Always enable data translation on syscall entry
If the code can use a stack in vm area, it can also use a
stack in linear space.

Simplify code by removing old non VMAP stack code on PPC32 in syscall.

That means the data translation is now re-enabled early in
syscall entry in all cases, not only when using VMAP stacks.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/412c6c1786922d991bbb89c2ad2e82cffe8ab112.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:08 +11:00
Christophe Leroy
57fdfbce89 powerpc/32s: Add missing call to kuep_lock on syscall entry
Userspace Execution protection and fast syscall entry were implemented
independently from each other and were both merged in kernel 5.2,
leading to syscall entry missing userspace execution protection.

On syscall entry, execution of user space memory must be
locked in the same way as on exception entry.

Fixes: b86fb88855 ("powerpc/32: implement fast entry for syscalls on non BOOKE")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c65e105b63aaf74f91a14f845bc77192350b84a6.1612796617.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:08 +11:00
Will Springer
57f48b4b74 powerpc/compat_sys: swap hi/lo parts of 64-bit syscall args in LE mode
Swap upper/lower 32 bits for 64-bit compat syscalls, conditioned on
endianness. This is modeled after the same functionality in
arch/mips/kernel/linux32.c.

This fixes compat_sys on ppc64le, when called by 32-bit little-endian
processes.

Tested with `file /bin/bash` (pread64) and `truncate -s 5G test`
(ftruncate64).

Signed-off-by: Will Springer <skirmisher@protonmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2765111.e9J7NaK4W3@sheen
2021-02-11 23:35:07 +11:00
Joseph J Allen
caccf2ac5c powerpc: use kernel endianness in MSR in 32-bit signal handler
This mirrors the behavior in handle_rt_signal32, to obey kernel endianness
rather than assume a 32-bit process is big-endian. Without this change,
any 32-bit little-endian process will SIGILL immediately upon handling a
signal.

Signed-off-by: Joseph J Allen <eerykitty@gmail.com>
Signed-off-by: Will Springer <skirmisher@protonmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2058876.irdbgypaU6@sheen
2021-02-11 23:35:07 +11:00
Hari Bathini
2377c92e37 powerpc/kexec_file: fix FDT size estimation for kdump kernel
On systems with large amount of memory, loading kdump kernel through
kexec_file_load syscall may fail with the below error:

    "Failed to update fdt with linux,drconf-usable-memory property"

This happens because the size estimation for kdump kernel's FDT does
not account for the additional space needed to setup usable memory
properties. Fix it by accounting for the space needed to include
linux,usable-memory & linux,drconf-usable-memory properties while
estimating kdump kernel's FDT size.

Fixes: 6ecd0163d3 ("powerpc/kexec_file: Add appropriate regions for memory reserve map")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/161243826811.119001.14083048209224609814.stgit@hbathini
2021-02-11 23:35:07 +11:00
Aneesh Kumar K.V
2ac02e5ece powerpc/mm: Remove dcache flush from memory remove.
We added dcache flush on memory add/remove in commit
fb5924fddf ("powerpc/mm: Flush cache on memory hot(un)plug") to
handle crashes on GPU hotplug. Instead of adding dcache flush in
generic memory add/remove routine which is used even for regular
memory, we should handle these devices specific flush in the device
driver code.

memtrace did handle this in the driver and that was removed by commit
7fd6641de2 ("powerpc/powernv/memtrace: Let the arch hotunplug code
flush cache"). This patch reverts that commit.

The dcache flush in memory add was removed by commit
ea458effa8 ("powerpc: Don't flush caches when adding memory") which
I don't think is correct. The reason why we require dcache flush in
memtrace is to make sure we don't have a dirty cache when we remap a
pfn to cache inhibited. We should do that when the memtrace module
removes the memory and make the pfn available for HTM traces to map it
as cache inhibited.

The other device mentioned in commit fb5924fddf ("powerpc/mm: Flush
cache on memory hot(un)plug") is nvlink device with coherent memory.
The support for that was removed in commit
7eb3cf7619 ("powerpc/powernv: remove unused NPU DMA code") and
commit 25b2995a35 ("mm: remove MEMORY_DEVICE_PUBLIC support")

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210203045812.234439-3-aneesh.kumar@linux.ibm.com
2021-02-11 23:35:07 +11:00
Aneesh Kumar K.V
ec94b9b23d powerpc/mm: Add PG_dcache_clean to indicate dcache clean state
This just add a better name for PG_arch_1. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210203045812.234439-2-aneesh.kumar@linux.ibm.com
2021-02-11 23:35:06 +11:00
Aneesh Kumar K.V
c7ba2d6363 powerpc/mm: Enable compound page check for both THP and HugeTLB
THP config results in compound pages. Make sure the kernel enables
the PageCompound() check with CONFIG_HUGETLB_PAGE disabled and
CONFIG_TRANSPARENT_HUGEPAGE enabled.

This makes sure we correctly flush the icache with THP pages.
flush_dcache_icache_page only matter for platforms that don't support
COHERENT_ICACHE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210203045812.234439-1-aneesh.kumar@linux.ibm.com
2021-02-11 23:35:06 +11:00
Jiapeng Chong
c9df3f809c powerpc/xive: Assign boolean values to a bool variable
Fix the following coccicheck warnings:

./arch/powerpc/kvm/book3s_xive.c:1856:2-17: WARNING: Assignment of 0/1
to bool variable.

./arch/powerpc/kvm/book3s_xive.c:1854:2-17: WARNING: Assignment of 0/1
to bool variable.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612680192-43116-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-02-11 23:35:06 +11:00
Christophe Leroy
3642eb2125 powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
THREAD_ALIGN_SHIFT = THREAD_SHIFT + 1 = PAGE_SHIFT + 1
Maximum PAGE_SHIFT is 18 for 256k pages so
THREAD_ALIGN_SHIFT is 19 at the maximum.

No need to clobber cr1, it can be preserved when moving r1
into CR when we check stack overflow.

This reduces the number of instructions in Machine Check Exception
prolog and fixes a build failure reported by the kernel test robot
on v5.10 stable when building with RTAS + VMAP_STACK + KVM. That
build failure is due to too many instructions in the prolog hence
not fitting between 0x200 and 0x300. Allthough the problem doesn't
show up in mainline, it is still worth the change.

Fixes: 98bf2d3f49 ("powerpc/32s: Fix RTAS machine check with VMAP stack")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5ae4d545e3ac58e133d2599e0deb88843cb494fc.1612768623.git.christophe.leroy@csgroup.eu
2021-02-11 23:35:06 +11:00
Nicholas Piggin
ac7c5e9b08 powerpc/64s: Remove EXSLB interrupt save area
SLB faults should not be taken while the PACA save areas are live, all
memory accesses should be fetches from the kernel text, and access to
PACA and the current stack, before C code is called or any other
accesses are made.

All of these have pinned SLBs so will not take a SLB fault. Therefore
EXSLB is not be required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208063406.331655-1-npiggin@gmail.com
2021-02-11 23:35:05 +11:00
Nicholas Piggin
14ad0e7d04 powerpc/64s: syscall real mode entry use mtmsrd rather than rfid
Have the real mode system call entry handler branch to the kernel
0xc000... address and then use mtmsrd to enable the MMU, rather than use
SRRs and rfid.

Commit 8729c26e67 ("powerpc/64s/exception: Move real to virt switch
into the common handler") implemented this style of real mode entry for
other interrupt handlers, so this brings system calls into line with
them, which is the main motivcation for the change.

This tends to be slightly faster due to avoiding the mtsprs, and it also
does not clobber the SRR registers, which becomes important in a
subsequent change. The real mode entry points don't tend to be too
important for performance these days, but it is possible for a
hypervisor to run guests in AIL=0 mode for certian reasons.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208063326.331502-1-npiggin@gmail.com
2021-02-11 23:35:05 +11:00
Alexey Kardashevskiy
60a707d0c9 powerpc/kuap: Restore AMR after replaying soft interrupts
Since de78a9c42a ("powerpc: Add a framework for Kernel Userspace
Access Protection"), user access helpers call user_{read|write}_access_{begin|end}
when user space access is allowed.

Commit 890274c2dc ("powerpc/64s: Implement KUAP for Radix MMU") made
the mentioned helpers program a AMR special register to allow such
access for a short period of time, most of the time AMR is expected to
block user memory access by the kernel.

Since the code accesses the user space memory, unsafe_get_user() calls
might_fault() which calls arch_local_irq_restore() if either
CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
arch_local_irq_restore() then attempts to replay pending soft
interrupts as KUAP regions have hardware interrupts enabled.

If a pending interrupt happens to do user access (performance
interrupts do that), it enables access for a short period of time so
after returning from the replay, the user access state remains blocked
and if a user page fault happens - "Bug: Read fault blocked by AMR!"
appears and SIGSEGV is sent.

An example trace:
  Bug: Read fault blocked by AMR!
  WARNING: CPU: 0 PID: 1603 at /home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145
  CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24
  NIP:  c00000000009ece8 LR: c00000000009ece4 CTR: 0000000000000000
  REGS: c00000000dc63560 TRAP: 0700   Not tainted  (5.10.0-rc6_v5.10-rc6_a+fstn1)
  MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28002888  XER: 20040000
  CFAR: c0000000001fa928 IRQMASK: 1
  GPR00: c00000000009ece4 c00000000dc637f0 c000000002397600 000000000000001f
  GPR04: c0000000020eb318 0000000000000000 c00000000dc63494 0000000000000027
  GPR08: c00000007fe4de68 c00000000dfe9180 0000000000000000 0000000000000001
  GPR12: 0000000000002000 c0000000030a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 bfffffffffffffff
  GPR20: 0000000000000000 c0000000134a4020 c0000000019c2218 0000000000000fe0
  GPR24: 0000000000000000 0000000000000000 c00000000d106200 0000000040000000
  GPR28: 0000000000000000 0000000000000300 c00000000dc63910 c000000001946730
  NIP __do_page_fault+0xb38/0xde0
  LR  __do_page_fault+0xb34/0xde0
  Call Trace:
    __do_page_fault+0xb34/0xde0 (unreliable)
    handle_page_fault+0x10/0x2c
  --- interrupt: 300 at strncpy_from_user+0x290/0x440
      LR = strncpy_from_user+0x284/0x440
    strncpy_from_user+0x2f0/0x440 (unreliable)
    getname_flags+0x88/0x2c0
    do_sys_openat2+0x2d4/0x5f0
    do_sys_open+0xcc/0x140
    system_call_exception+0x160/0x240
    system_call_common+0xf0/0x27c

To fix it save/restore the AMR when replaying interrupts, and also
add a check if AMR was not blocked prior to replaying interrupts.

Originally found by syzkaller.

Fixes: 890274c2dc ("powerpc/64s: Implement KUAP for Radix MMU")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use normal commit citation format and add full oops log to
      change log, move kuap_check_amr() into the restore routine to
      avoid warnings about unreconciled IRQ state]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210202091541.36499-1-aik@ozlabs.ru
2021-02-11 23:35:05 +11:00
Alexey Kardashevskiy
7d506ca97b powerpc/uaccess: Avoid might_fault() when user access is enabled
The amount of code executed with enabled user space access (unlocked
KUAP) should be minimal. However with CONFIG_PROVE_LOCKING or
CONFIG_DEBUG_ATOMIC_SLEEP enabled, might_fault() calls into various
parts of the kernel, and may even end up replaying interrupts which in
turn may access user space and forget to restore the KUAP state.

The problem places are:
  1. strncpy_from_user (and similar) which unlock KUAP and call
     unsafe_get_user -> __get_user_allowed -> __get_user_nocheck()
     with do_allow=false to skip KUAP as the caller took care of it.
  2. __unsafe_put_user_goto() which is called with unlocked KUAP.

eg:
  WARNING: CPU: 30 PID: 1 at arch/powerpc/include/asm/book3s/64/kup.h:324 arch_local_irq_restore+0x160/0x190
  NIP arch_local_irq_restore+0x160/0x190
  LR  lock_is_held_type+0x140/0x200
  Call Trace:
    0xc00000007f392ff8 (unreliable)
    ___might_sleep+0x180/0x320
    __might_fault+0x50/0xe0
    filldir64+0x2d0/0x5d0
    call_filldir+0xc8/0x180
    ext4_readdir+0x948/0xb40
    iterate_dir+0x1ec/0x240
    sys_getdents64+0x80/0x290
    system_call_exception+0x160/0x280
    system_call_common+0xf0/0x27c

Change __get_user_nocheck() to look at `do_allow` to decide whether to
skip might_fault(). Since strncpy_from_user/etc call might_fault()
anyway before unlocking KUAP, there should be no visible change.

Drop might_fault() in __unsafe_put_user_goto() as it is only called
from unsafe_put_user(), which already has KUAP unlocked.

Since keeping might_fault() is still desirable for debugging, add
calls to it in user_[read|write]_access_begin(). That also allows us
to drop the is_kernel_addr() test, because there should be no code
using user_[read|write]_access_begin() in order to access a kernel
address.

Fixes: de78a9c42a ("powerpc: Add a framework for Kernel Userspace Access Protection")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Combine with related patch from myself, merge change logs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210204121612.32721-1-aik@ozlabs.ru
2021-02-11 23:35:04 +11:00
Michael Ellerman
de4ffc653f powerpc/uaccess: Simplify unsafe_put_user() implementation
Currently unsafe_put_user() expands to __put_user_goto(), which
expands to __put_user_nocheck_goto().

There are no other uses of __put_user_nocheck_goto(), and although
there are some other uses of __put_user_goto() those could just use
unsafe_put_user().

Every layer of indirection introduces the possibility that some code
is calling that layer, and makes keeping track of the required
semantics at each point more complicated.

So drop __put_user_goto(), and rename __put_user_nocheck_goto() to
__unsafe_put_user_goto(). The "nocheck" is implied by "unsafe".

Replace the few uses of __put_user_goto() with unsafe_put_user().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208135717.2618798-1-mpe@ellerman.id.au
2021-02-11 23:33:16 +11:00
Michael Ellerman
f30520c64f powerpc/amigaone: Make amigaone_discover_phbs() static
It's only used in setup.c, so make it static.

Fixes: 053d58c870 ("powerpc/amigaone: Move PHB discovery")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210210130804.3190952-3-mpe@ellerman.id.au
2021-02-11 23:28:51 +11:00
Michael Ellerman
2bb421a3d9 powerpc/mm/64s: Fix no previous prototype warning
As reported by lkp:

  arch/powerpc/mm/book3s64/radix_tlb.c:646:6: warning: no previous
  prototype for function 'exit_lazy_flush_tlb'

Fix it by moving the prototype into the existing header.

Fixes: 032b7f0893 ("powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210210130804.3190952-2-mpe@ellerman.id.au
2021-02-11 23:28:51 +11:00
Michael Ellerman
5c47c44f15 powerpc/83xx: Fix build error when CONFIG_PCI=n
As reported by lkp:

  arch/powerpc/platforms/83xx/km83xx.c:183:19: error: 'mpc83xx_setup_pci' undeclared here (not in a function)
     183 |  .discover_phbs = mpc83xx_setup_pci,
	 |                   ^~~~~~~~~~~~~~~~~
	 |                   mpc83xx_setup_arch

There is a stub defined for the CONFIG_PCI=n case, but now that
mpc83xx_setup_pci() is being assigned to discover_phbs the correct
empty value is NULL.

Fixes: 83f84041ff ("powerpc/83xx: Move PHB discovery")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210210130804.3190952-1-mpe@ellerman.id.au
2021-02-11 23:28:51 +11:00
Nicholas Piggin
e4bb64c7a4 powerpc: remove interrupt handler functions from the noinstr section
The allyesconfig ppc64 kernel fails to link with relocations unable to
fit after commit 3a96570ffc ("powerpc: convert interrupt handlers to
use wrappers"), which is due to the interrupt handler functions being
put into the .noinstr.text section, which the linker script places on
the opposite side of the main .text section from the interrupt entry
asm code which calls the handlers.

This results in a lot of linker stubs that overwhelm the 252-byte sized
space we allow for them, or in the case of BE a .opd relocation link
error for some reason.

It's not required to put interrupt handlers in the .noinstr section,
previously they used NOKPROBE_SYMBOL, so take them out and replace
with a NOKPROBE_SYMBOL in the wrapper macro. Remove the explicit
NOKPROBE_SYMBOL macros in the interrupt handler functions. This makes
a number of interrupt handlers nokprobe that were not prior to the
interrupt wrappers commit, but since that commit they were made
nokprobe due to being in .noinstr.text, so this fix does not change
that.

The fixes tag is different to the commit that first exposes the problem
because it is where the wrapper macros were introduced.

Fixes: 8d41fc618a ("powerpc: interrupt handler wrapper functions")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Slightly fix up comment wording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210211063636.236420-1-npiggin@gmail.com
2021-02-11 23:28:34 +11:00
Michael Ellerman
dea6f4c696 powerpc/powernv/pci: Use kzalloc() for phb related allocations
As part of commit fbbefb3202 ("powerpc/pci: Move PHB discovery for
PCI_DN using platforms"), I switched some allocations from
memblock_alloc() to kmalloc(), otherwise memblock would warn that it
was being called after slab init.

However I missed that the code relied on the allocations being zeroed,
without which we could end up crashing:

  pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to ff
  BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6af7
  Faulting instruction address: 0xc0000000000dbc90
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  ...
  NIP  pnv_ioda_get_pe_state+0xe0/0x1d0
  LR   pnv_ioda_get_pe_state+0xb4/0x1d0
  Call Trace:
    pnv_ioda_get_pe_state+0xb4/0x1d0 (unreliable)
    pnv_pci_config_check_eeh.isra.9+0x78/0x270
    pnv_pci_read_config+0xf8/0x160
    pci_bus_read_config_dword+0xa4/0x120
    pci_bus_generic_read_dev_vendor_id+0x54/0x270
    pci_scan_single_device+0xb8/0x140
    pci_scan_slot+0x80/0x1b0
    pci_scan_child_bus_extend+0x94/0x490
    pcibios_scan_phb+0x1f8/0x3c0
    pcibios_init+0x8c/0x12c
    do_one_initcall+0x94/0x510
    kernel_init_freeable+0x35c/0x3fc
    kernel_init+0x2c/0x168
    ret_from_kernel_thread+0x5c/0x70

Switch them to kzalloc().

Fixes: fbbefb3202 ("powerpc/pci: Move PHB discovery for PCI_DN using platforms")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210211112749.3410771-1-mpe@ellerman.id.au
2021-02-11 23:28:34 +11:00
Nicholas Piggin
72476aaa46 KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
Commit 68ad28a4cd ("KVM: PPC: Book3S HV: Fix radix guest SLB side
channel") incorrectly removed the radix host instruction patch to skip
re-loading the host SLB entries when exiting from a hash
guest. Restore it.

Fixes: 68ad28a4cd ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-11 17:28:15 +11:00