25969 Commits

Author SHA1 Message Date
Nicholas Piggin
13959373e9 powerpc/qspinlock: Fix 32-bit build
Some 32-bit configurations don't pull in the spin_begin/end/relax
definitions. Fix is to restore a lost include.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/oe-kbuild-all/202212050224.i7uh9fOh-lkp@intel.com
Link: https://lore.kernel.org/r/20221208123225.1566113-1-npiggin@gmail.com
2022-12-12 12:34:52 +11:00
Jakub Kicinski
837e8ac871 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 18:19:59 -08:00
Michael Ellerman
f24f21c412 Merge branch 'topic/objtool' into next
Merge the powerpc objtool support, which we were keeping in a topic
branch in case of any merge conflicts.
2022-12-08 23:57:47 +11:00
Jiri Slaby (SUSE)
74d58cd48a USB: sisusbvga: remove console support
It was marked as BROKEN since commit 862ee699fefe (USB: sisusbvga: Make
console support depend on BROKEN) 2 years ago. Since no one stepped up to
fix it, remove it completely.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Winischhofer <thomas@winischhofer.net>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221208090749.28056-1-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08 10:44:24 +01:00
Michael Ellerman
64fdcbcc06 powerpc/prom: Fix 32-bit build
Add an IS_ENABLED() check to fix the build error:

arch/powerpc/kernel/prom.o: in function `early_init_dt_scan_cpus':
  prom.c:(.init.text+0x2ea): undefined reference to `boot_cpu_node_count'

Fixes: e13d23a404f2 ("powerpc: export the CPU node count")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2022-12-08 09:43:15 +11:00
Nathan Lynch
98c738c8ce powerpc/rtas: mandate RTAS syscall filtering
CONFIG_PPC_RTAS_FILTER has been optional but default-enabled since its
introduction. It's been enabled in enterprise distro kernels for a
while without causing ABI breakage that wasn't easily fixed, and it
prevents harmful abuses of the rtas syscall.

Let's make it unconditional.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-10-nathanl@linux.ibm.com
2022-12-07 22:40:43 +11:00
Nathan Lynch
f975b6559b powerpc/rtas: define pr_fmt and convert printk call sites
Set pr_fmt to "rtas: " and convert the handful of printk() uses in
rtas.c, adjusting the messages to remove now-redundant "RTAS"
strings.

Note that rtas_restart(), rtas_power_off(), and rtas_halt() all
currently use printk() without specifying a log level. These have been
changed to use pr_emerg(), which matches the behavior of
rtas_os_term().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-9-nathanl@linux.ibm.com
2022-12-07 22:40:43 +11:00
Nathan Lynch
9581f8a007 powerpc/rtas: clean up includes
rtas.c used to host complex code related to pseries-specific guest
migration and suspend, which used atomics, completions, hcalls, and
CPU hotplug APIs. That's all been deleted or moved, so remove the
include directives that have been rendered unnecessary. Sort the
remainder (with linux/ before asm/) to impose some order on where
future additions go.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-8-nathanl@linux.ibm.com
2022-12-07 22:40:42 +11:00
Nathan Lynch
c67a0e411d powerpc/rtas: clean up rtas_error_log_max initialization
The code in rtas_get_error_log_max() doesn't cause problems in
practice, but there are no measures to ensure that the lazy
initialization of the static rtas_error_log_max variable is atomic,
and it's not worth adding them.

Initialize the static rtas_error_log_max variable at boot when we're
single-threaded instead of lazily on first use. Use the more
appropriate of_property_read_u32() API instead of rtas_token() to
consult the "rtas-error-log-max" property, which is not the name of an
RTAS function. Convert use of printk() to pr_warn() and distinguish
the possible error cases.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-7-nathanl@linux.ibm.com
2022-12-07 22:40:42 +11:00
Nathan Lynch
9aafbfa5f5 powerpc/pseries/eeh: use correct API for error log size
rtas-error-log-max is not the name of an RTAS function, so rtas_token()
is not the appropriate API for retrieving its value. We already have
rtas_get_error_log_max() which returns a sensible value if the property
is absent for any reason, so use that instead.

Fixes: 8d633291b4fc ("powerpc/eeh: pseries platform EEH error log retrieval")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: Drop no-longer possible error handling as noticed by ajd]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-6-nathanl@linux.ibm.com
2022-12-07 22:39:50 +11:00
Nathan Lynch
6c606e57ee powerpc/rtas: avoid scheduling in rtas_os_term()
It's unsafe to use rtas_busy_delay() to handle a busy status from
the ibm,os-term RTAS function in rtas_os_term():

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Tainted: G      D            6.0.0-rc5-02182-gf8553a572277-dirty #9
Call Trace:
[c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable)
[c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0
[c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0
[c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4
[c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68
[c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50
[c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0
[c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0
[c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0
[c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420
[c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200

Use rtas_busy_delay_time() instead, which signals without side effects
whether to attempt the ibm,os-term RTAS call again.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com
2022-12-07 22:23:04 +11:00
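The retry pattern described in the rtas_os_term() commit above can be sketched in plain userspace C. This is only a model under stated assumptions: busy_delay_time_ms() imitates the side-effect-free contract of rtas_busy_delay_time() (it reports a delay and never sleeps), the status values follow the usual RTAS busy and extended-delay codes, and fake_os_term() with its struct fake_fw is a hypothetical stand-in for the ibm,os-term firmware call.

```c
/* Status codes assumed for this sketch (busy and extended-delay range). */
#define RTAS_BUSY               (-2)
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905

/*
 * Model of a side-effect-free busy check: returns a suggested delay in
 * milliseconds for a busy status, or 0 if the call need not be retried.
 * It never sleeps or schedules, so it is usable in atomic context.
 */
unsigned int busy_delay_time_ms(int status)
{
    unsigned int ms = 0;

    if (status == RTAS_BUSY) {
        ms = 1;
    } else if (status >= RTAS_EXTENDED_DELAY_MIN &&
               status <= RTAS_EXTENDED_DELAY_MAX) {
        int order = status - RTAS_EXTENDED_DELAY_MIN;

        /* Extended delay codes suggest 10^order milliseconds. */
        for (ms = 1; order > 0; order--)
            ms *= 10;
    }
    return ms;
}

/* Hypothetical firmware stand-in: busy a few times, then success (0). */
struct fake_fw {
    int busy_left;
    int calls;
};

int fake_os_term(struct fake_fw *fw)
{
    fw->calls++;
    if (fw->busy_left > 0) {
        fw->busy_left--;
        return RTAS_BUSY;
    }
    return 0;
}

/* Retry while the status is a busy status, without ever sleeping. */
int os_term_retry(struct fake_fw *fw)
{
    int status;

    do {
        status = fake_os_term(fw);
    } while (busy_delay_time_ms(status) != 0);

    return status;
}
```

The loop only inspects the status to decide whether to call again, which is why it is safe in the panic path where sleeping is not allowed.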
Nathan Lynch
ed2213bfb1 powerpc/rtas: avoid device tree lookups in rtas_os_term()
rtas_os_term() is called during panic. Its behavior depends on a couple
of conditions in the /rtas node of the device tree, the traversal of
which entails locking and local IRQ state changes. If the kernel panics
while devtree_lock is held, rtas_os_term() as currently written could
hang.

Instead of discovering the relevant characteristics at panic time,
cache them in file-static variables at boot. Note the lookup for
"ibm,extended-os-term" is converted to of_property_read_bool() since it
is a boolean property, not an RTAS function token.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Incorporate suggested change from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com
2022-12-07 22:22:22 +11:00
Nathan Lynch
b10af504a2 powerpc/rtasd: use correct OF API for event scan rate
rtas_token() should be used only for properties that are RTAS function
tokens. "rtas-event-scan-rate" does not contain a function token, but it
has the same size/format as token properties so reading it with
rtas_token() happens to work.

Convert to of_property_read_u32().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-3-nathanl@linux.ibm.com
2022-12-07 22:20:33 +11:00
Nathan Lynch
336e2554ec powerpc/rtas: document rtas_call()
rtas_call() has a complex calling convention, non-standard return
values, and many users. Add kernel-doc for it and remove the less
structured commentary from rtas.h.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-2-nathanl@linux.ibm.com
2022-12-07 22:20:33 +11:00
Laurent Dufour
f6aa37c51e powerpc/pseries: unregister VPA when hot unplugging a CPU
The VPA should unregister when offlining a CPU. Otherwise there could be
a short window where 2 CPUs could share the same VPA.

This happens because the hypervisor is still keeping the VPA attached to
the vCPU even if it became offline.

Here is a potential situation:
 1. remove proc A,
 2. add proc B. If proc B gets proc A's place in cpu_present_mask, then
    it registers proc A's VPAs.
 3. If proc B is then re-added to the LP, its threads are sharing VPAs
    with proc A briefly as they come online.

As the hypervisor may check for the VPA's yield_count field oddity, it
may detect an unexpected value and kill the LPAR.

Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: s/cpu_present_map/cpu_present_mask/ in change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114160150.13554-1-ldufour@linux.ibm.com
2022-12-07 20:30:23 +11:00
Laurent Dufour
9b574cfab7 powerpc/pseries: reset the RCU watchdogs after a LPM
The RCU watchdog timer should be reset when restarting the CPU after a
Live Partition Mobility operation.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Combine comments into a single comment block]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221125173204.15329-1-ldufour@linux.ibm.com
2022-12-07 20:30:09 +11:00
Laurent Dufour
340a4a9f87 powerpc: Take in account addition CPU node when building kexec FDT
On a system with a large number of CPUs, the creation of the FDT for a
kexec kernel may fail because the allocated FDT is not large enough.

When this happens, a message like the following is displayed on the console:

  Unable to add ibm,processor-vadd-size property: FDT_ERR_NOSPACE

The property's name may change depending on when the buffer overwrite is
detected.

The resulting FDT is missing information, so the system dump or kexec
kernel is expected to fail to run properly.

When the FDT is allocated, the size of the FDT the kernel received at
boot time is used and an extra size can be applied. Currently, only
memory added after boot time is taken into account, not the CPU nodes.

The extra size should take these additional CPU nodes into account and
compute the required extra space. To achieve that, the size of a CPU
node, including its subnodes, is computed once and multiplied by the
number of additional CPU nodes.

The assumption is that the size of the CPU node is the _same_ for all
nodes; the only variable part should be the name "PowerPC,POWERxx@##",
where "##" may vary a little.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[mpe: Don't shadow function name w/variable, minor coding style changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221110180619.15796-3-ldufour@linux.ibm.com
2022-12-07 20:19:04 +11:00
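The sizing rule described in the kexec FDT commit above amounts to a small piece of arithmetic, sketched here in standalone C. The function name and parameters are hypothetical and chosen for illustration; the kernel derives the per-node size from the device tree, whereas this sketch simply takes it as an input.

```c
#include <stddef.h>

/*
 * Extra FDT space to reserve for CPU nodes added since boot, assuming
 * every CPU node (including its subnodes) has the same size.
 * All names here are illustrative, not the kernel's.
 */
size_t extra_fdt_size_for_cpus(unsigned int boot_cpu_nodes,
                               unsigned int current_cpu_nodes,
                               size_t cpu_node_size)
{
    /* Nothing to add if no CPU node was hot-added since boot. */
    if (current_cpu_nodes <= boot_cpu_nodes)
        return 0;

    return (size_t)(current_cpu_nodes - boot_cpu_nodes) * cpu_node_size;
}
```

For example, with 8 CPU nodes at boot, 12 now, and an assumed 1024 bytes per node, the sketch reserves 4 * 1024 = 4096 extra bytes.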
Laurent Dufour
e13d23a404 powerpc: export the CPU node count
At boot time, the FDT is parsed to compute the number of CPUs.
In addition count the number of CPU nodes and export it.

This is useful when building the FDT for a kexeced kernel, since we need to
take into account the CPU nodes added since boot time during CPU hotplug
operations.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221110180619.15796-2-ldufour@linux.ibm.com
2022-12-07 20:14:49 +11:00
Geert Uytterhoeven
3ae7c96dd5 powerpc/dts/fsl: Fix pca954x i2c-mux node names
"make dtbs_check":

    arch/powerpc/boot/dts/fsl/t1040rdb-rev-a.dtb: pca9546@77: $nodename:0: 'pca9546@77' does not match '^(i2c-?)?mux'
           From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
    arch/powerpc/boot/dts/fsl/t1024qds.dtb: pca9547@77: Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'i2c@0', 'i2c@2', 'i2c@3' were unexpected)
           From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
    ...

Fix this by renaming pca954x nodes to "i2c-mux", to match the I2C bus
multiplexer/switch DT bindings and the Generic Names Recommendation in
the Devicetree Specification.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6c5d86c49ac170e9d56ab121ea0602f3873849ca.1669999298.git.geert+renesas@glider.be
2022-12-06 23:15:53 +11:00
Christophe Leroy
6f3a81b600 powerpc/code-patching: Remove protection against patching init addresses after init
Once the init section is freed, attempting to patch init code
ends up in the weeds.

Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections")
protected patch_instruction() against that, but it is the responsibility
of the caller to ensure that the patched memory is valid.

All callers have now been verified and fixed so the check
can be removed.

This improves ftrace activation by about 2% on 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/504310828f473d424e2ed229eff57bf075f52796.1669969781.git.christophe.leroy@csgroup.eu
2022-12-02 21:59:57 +11:00
Christophe Leroy
b988e7797d powerpc/feature-fixups: Do not patch init section after init
Once the init section is freed, attempting to patch init code
ends up in the weeds.

Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections")
protected patch_instruction() against that, but it is the responsibility
of the caller to ensure that the patched memory is valid.

In the same spirit as jump_label with its jump_label_can_update()
function, add is_fixup_addr_valid() function to skip patching on
freed init section.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8e9311fc1b057e4e6a2a3a0701ebcc74b787affe.1669969781.git.christophe.leroy@csgroup.eu
2022-12-02 21:59:57 +11:00
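The idea behind the is_fixup_addr_valid() check mentioned in the commit above can be modelled in userspace C. This is a sketch under assumptions, not the kernel function: struct init_region and its freed flag are hypothetical stand-ins for the init section bounds and the boot state, and the overlap test is a simplification chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the init section: [start, end). */
struct init_region {
    uintptr_t start;
    uintptr_t end;
    bool freed;      /* true once init memory has been released */
};

/*
 * Returns true if patching [dest, dest + size) is acceptable.
 * Before init memory is freed, every address is fine. Afterwards,
 * any range touching the freed init section is skipped.
 */
bool fixup_addr_valid(const struct init_region *init,
                      uintptr_t dest, size_t size)
{
    if (!init->freed)
        return true;

    bool overlaps = dest < init->end && dest + size > init->start;

    return !overlaps;
}
```

A caller looping over fixup entries would test each destination with such a check and skip the entries that fail it.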
Christophe Leroy
3d1dbbca33 powerpc/feature-fixups: Refactor other fixups patching
Several functions have the same loop for patching instructions.

Introduce function do_patch_fixups() to refactor those loops.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/58ab36949c18f94d466fc98d6c085783b0cd474f.1669969781.git.christophe.leroy@csgroup.eu
2022-12-02 21:59:56 +11:00
Christophe Leroy
6076dc349b powerpc/feature-fixups: Refactor entry fixups patching
Several functions have the same loop for patching instructions.

Introduce function do_patch_entry_fixups() to refactor those loops.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/79eeff7b20a98f7136da5f79b1f7c436928f27f3.1669969781.git.christophe.leroy@csgroup.eu
2022-12-02 21:59:56 +11:00
Christophe Leroy
84ecfe6f38 powerpc/code-patching: Remove #ifdef CONFIG_STRICT_KERNEL_RWX
No need to have one implementation of patch_instruction() for
CONFIG_STRICT_KERNEL_RWX and one for !CONFIG_STRICT_KERNEL_RWX.

In patch_instruction(), call raw_patch_instruction() when
!CONFIG_STRICT_KERNEL_RWX.

In poking_init(), bail out immediately; this is equivalent
to the weak default implementation.

Everything else is declared static and will be discarded by
GCC when !CONFIG_STRICT_KERNEL_RWX.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f67d2a109404d03e8fdf1ea15388c8778337a76b.1669969781.git.christophe.leroy@csgroup.eu
2022-12-02 21:59:56 +11:00
Michael Jeanson
ad050d2390 powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1
In v5.7 the powerpc syscall entry/exit logic was rewritten in C. On
PPC64_ELF_ABI_V1 this resulted in the symbols in the syscall table
changing from their dot-prefixed variants to the non-prefixed ones.

Since ftrace prefixes a dot to the syscall names when matching them to
build its syscall event list, this resulted in no syscall events being
available.

Remove the PPC64_ELF_ABI_V1 specific version of
arch_syscall_match_sym_name to have the same behavior across all powerpc
variants.

Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201161442.2127231-1-mjeanson@efficios.com
2022-12-02 20:57:09 +11:00
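The mismatch described in the ftrace commit above can be illustrated with two small string matchers. Both functions are hypothetical models written for this example: match_skip_dot() assumes the symbol carries a leading dot, as the ELFv1-specific matcher did, while match_plain() compares the names directly. Neither reproduces the kernel's exact matching rules.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Model of a matcher that expects a dot-prefixed symbol such as
 * ".sys_read" and skips the first character before comparing.
 * The symbol must be a non-empty string.
 */
bool match_skip_dot(const char *sym, const char *name)
{
    return strcmp(sym + 1, name) == 0;
}

/* Model of a matcher that expects no prefix on the symbol. */
bool match_plain(const char *sym, const char *name)
{
    return strcmp(sym, name) == 0;
}
```

Once the syscall table holds non-prefixed symbols, the dot-skipping model compares "ys_read" with "sys_read" and never matches, which corresponds to the empty syscall event list the commit describes.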
Rohan McLure
7cd882df94 powerpc/64: Sanitise user registers on interrupt in pseries, POWERNV
Cause pseries and POWERNV platforms to default to zeroising all potentially
user-defined registers when entering the kernel by means of any interrupt
source, reducing user influence on the kernel and the likelihood of
producing speculation gadgets.

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-7-rmclure@linux.ibm.com
2022-12-02 20:46:09 +11:00
Rohan McLure
efe1691ac8 powerpc/64e: Clear gprs on interrupt routine entry on Book3E
Zero GPRS r14-r31 on entry into the kernel for interrupt sources to
limit influence of user-space values in potential speculation gadgets.
Prior to this commit, all other GPRS are reassigned during the common
prologue to interrupt handlers and so need not be zeroised explicitly.

This may be done safely, without loss of register state prior to the
interrupt, as the common prologue saves the initial values of
non-volatiles, which are unconditionally restored in interrupt_64.S.
Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-6-rmclure@linux.ibm.com
2022-12-02 20:46:08 +11:00
Rohan McLure
1df45d78b8 powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S
Zeroise user state in gprs (assign to zero) to reduce the influence of user
registers on speculation within kernel syscall handlers. Clears occur
at the very beginning of the sc and scv 0 interrupt handlers, with
restores occurring following the execution of the syscall handler.

Zeroise GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
other interrupt sources. The remaining gprs are overwritten by
entry macros to interrupt handlers, irrespective of whether or not a
given handler consumes these register values. If an interrupt does not
select the IMSR_R12 IOption, zeroise r12.

Prior to this commit, r14-r31 are restored on a per-interrupt basis at
exit, but now they are always restored on 64-bit Book3S. Remove explicit
REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear
user registers on interrupt, and continue to depend on the return value
of interrupt_exit_user_prepare to determine whether or not to restore
non-volatiles.

The mmap_bench benchmark in selftests is expected to invoke page faults
rapidly. It shows a ~0.8% performance regression with this mitigation,
which indicates the worst-case cost due to heavier-weight interrupt
handlers. This mitigation can be enabled or disabled through
CONFIG_INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-5-rmclure@linux.ibm.com
2022-12-02 20:46:05 +11:00
Rohan McLure
2487fd2e6d powerpc/64s: IOption for MSR stored in r12
Interrupt handlers in asm/exceptions-64s.S contain a great deal of common
code produced by the GEN_COMMON macros. Currently, at the exit point of
the macro, r12 will contain the contents of the MSR. A future patch will
cause these macros to zeroise architected registers to avoid potential
speculation influence of user data.

Provide an IOption that signals that r12 must be retained, as the
interrupt handler assumes it to hold the contents of the MSR.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-4-rmclure@linux.ibm.com
2022-12-02 20:46:01 +11:00
Rohan McLure
75c5d6b1e1 powerpc/64: Sanitise common exit code for interrupts
Interrupt code is shared between Book3E/S 64-bit systems for interrupt
handlers. Ensure that exit code correctly restores non-volatile gprs on
each system when CONFIG_INTERRUPT_SANITIZE_REGISTERS is enabled.

Also introduce macros for clearing/restoring registers on interrupt
entry for when this configuration option is either disabled or enabled.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-3-rmclure@linux.ibm.com
2022-12-02 20:46:01 +11:00
Rohan McLure
cbf892ba56 powerpc/64: Add interrupt register sanitisation macros
Include in asm/ppc_asm.h macros to be used in multiple successive
patches to implement zeroising architected registers in interrupt
handlers. Registers will be sanitised in this fashion in future patches
to reduce the speculation influence of user-controlled register values.
These mitigations will be configurable through the
CONFIG_INTERRUPT_SANITIZE_REGISTERS Kconfig option.

Included are macros for conditionally zeroising registers and restoring
as required with the mitigation enabled. With the mitigation disabled,
non-volatiles must be restored on demand at separate locations to
those required by the mitigation.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-2-rmclure@linux.ibm.com
2022-12-02 20:45:57 +11:00
Rohan McLure
0e23347f1e powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig
Add Kconfig option for enabling clearing of registers on arrival in an
interrupt handler. This reduces the speculation influence of registers
on kernel internals. The option will be consumed by 64-bit systems that
feature speculation and wish to implement this mitigation.

This patch only introduces the Kconfig option, no actual mitigations.

The primary overhead of this mitigation lies in an increased number of
registers that must be saved and restored by interrupt handlers on
Book3S systems. Enable by default on Book3E systems, which prior to
this patch eagerly save and restore register state, meaning that the
mitigation when implemented will have minimal overhead.

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-1-rmclure@linux.ibm.com
2022-12-02 20:45:57 +11:00
Kajol Jain
03f7c1d2a4 powerpc/hv-gpci: Fix hv_gpci event list
Based on getPerfCountInfo v1.018 documentation, some of the
hv_gpci events were deprecated for platform firmware that
supports counter_info_version 0x8 or above.

Fix the hv_gpci event list by adding a new attribute group
called "hv_gpci_event_attrs_v6" and an "ENABLE_EVENTS_COUNTERINFO_V6"
macro to enable these events for platform firmware
that supports counter_info_version 0x6 or below. Also assign
the hv_gpci event list based on the counter info version output
by the underlying platform.

Fixes: 97bf2640184f ("powerpc/perf/hv-gpci: add the remaining gpci requests")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221130174513.87501-1-kjain@linux.ibm.com
2022-12-02 20:39:26 +11:00
Yang Yingliang
4d0eea4152 powerpc/83xx/mpc832x_rdb: call platform_device_put() in error case in of_fsl_spi_probe()
If platform_device_add() has not been called or has failed,
platform_device_del() cannot be used to clean up memory;
platform_device_put() should be called in the error case instead.

Fixes: 26f6cb999366 ("[POWERPC] fsl_soc: add support for fsl_spi")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221029111626.429971-1-yangyingliang@huawei.com
2022-12-02 20:09:48 +11:00
Michael Ellerman
22db71bcba Merge branch 'topic/qspinlock' into next
Merge Nick's powerpc qspinlock implementation. From his cover letter:

This replaces the generic queued spinlock code (like s390 does) with our
own implementation.

Generic PV qspinlock code is causing latency / starvation regressions on
large systems that are resulting in hard lockups reported (mostly in
pathological cases). The generic qspinlock code has a number of issues
important for powerpc hardware and hypervisors that aren't easily solved
without changing code that would impact other architectures. Follow
s390's lead and implement our own for now.

Issues for powerpc using generic qspinlocks:
  - The previous lock value should not be loaded with simple loads, and
    need not be passed around from previous loads or cmpxchg results,
    because powerpc uses ll/sc-style atomics which can perform more
    complex operations that do not require this. powerpc implementations
    tend to prefer loads use larx for improved coherency performance.
  - The queueing process should absolutely minimise the number of stores
    to the lock word to reduce exclusive coherency probes, important for
    large system scalability. The pending logic is counterproductive
    here.
  - Non-atomic unlock for paravirt locks is important (atomic
    instructions tend to still be more expensive than on x86 CPUs).
  - Yielding to the lock owner is important in the oversubscribed
    paravirt case, which requires storing the owner CPU in the lock
    word.
  - More control of lock stealing for the paravirt case is important to
    keep latency down on large systems.
  - The lock acquisition operation should always be made with a special
    variant of atomic instructions with the lock hint bit set,
    including (especially) in the queueing paths. This is more a matter
    of adding more arch lock helpers so not an insurmountable problem
    for generic code.
2022-12-02 18:04:56 +11:00
Nicholas Piggin
6b34a099fa powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults
This option increases the number of hash misses by limiting the number
of kernel HPT entries, by keeping a per-CPU record of the last kernel
HPTEs installed, and removing that from the hash table on the next hash
insertion. A timer round-robins CPUs removing remaining kernel HPTEs and
clearing the TLB (in the case of bare metal) to increase and slightly
randomise kernel fault activity.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add comment about NR_CPUS usage, fixup whitespace]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221024030150.852517-1-npiggin@gmail.com
2022-12-02 18:04:25 +11:00
Nicholas Piggin
dfecd06bc5 powerpc: remove STACK_FRAME_OVERHEAD
This is equal to STACK_FRAME_MIN_SIZE on 32-bit and 64-bit ELFv1, and no
longer used in 64-bit ELFv2, so replace STACK_FRAME_OVERHEAD occurrences
with STACK_FRAME_MIN_SIZE.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-18-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
cd52414d5a powerpc/64: ELFv2 use minimal stack frames in int and switch frame sizes
Adjust the ELFv2 interrupt and switch frames to the minimum C ABI size,
plus pt_regs, plus 16 bytes for the aligned regs marker for the int
frame (and the switch frame needs to match that because it uses the same
regs offset as the int frame).

This saves 80 bytes of kernel stack per interrupt. It's the principle of
getting our accounting right that's more important than the practical
saving.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-17-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
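The 80-byte saving quoted in the commit above follows from the frame layout it describes: minimum C ABI frame, plus pt_regs, plus 16 bytes for the regs marker. The sketch below works through that arithmetic. The 112 and 32 byte minimums are the figures quoted for 64-bit ELFv2 in this series; the macro names, the pt_regs size parameter, and the assumption that the old layout used the same formula with the larger minimum are illustrative only.

```c
#include <stddef.h>

#define OLD_MIN_FRAME    112  /* previous minimum frame size, per this series */
#define ELFV2_MIN_FRAME   32  /* minimum C ABI frame size on 64-bit ELFv2 */
#define REGS_MARKER_PAD   16  /* space for the aligned regs marker */

/*
 * Interrupt frame size as described in the commit message:
 * minimum frame + pt_regs + 16 bytes. pt_regs_size is a parameter
 * because its real value depends on the kernel configuration.
 */
size_t int_frame_size(size_t min_frame, size_t pt_regs_size)
{
    return min_frame + pt_regs_size + REGS_MARKER_PAD;
}

/* Bytes saved per interrupt frame when moving to the smaller minimum. */
size_t int_frame_saving(size_t pt_regs_size)
{
    return int_frame_size(OLD_MIN_FRAME, pt_regs_size) -
           int_frame_size(ELFV2_MIN_FRAME, pt_regs_size);
}
```

Because pt_regs and the marker padding appear in both layouts, the saving is 112 - 32 = 80 bytes whatever size is passed for pt_regs.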
Nicholas Piggin
90f1b43196 powerpc: allow minimum sized kernel stack frames
This affects only 64-bit ELFv2 kernels, and reduces the minimum
asm-created stack frame size from 112 to 32 bytes on those kernels.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-16-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
4cefb0f6c5 powerpc: split validate_sp into two functions
Most callers just want to validate an arbitrary kernel stack pointer,
some need a particular size. Make the size case the exceptional one
with an extra function.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-15-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
edbd0387f3 powerpc: copy_thread add a back chain to the switch stack frame
Stack unwinders need LR and the back chain as a minimum. The switch
stack uses regs->nip for its return pointer rather than lrsave, so
that was not set in the fork frame, and neither was the back chain.
This change sets those fields in the stack.

With this and the previous change, a stack trace in the switch or
interrupt stack goes from looking like this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 3 PID: 90 Comm: systemd Not tainted
  NIP:  c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff
  [ ... regs ... ]
  NIP [c000000000011060] _switch+0x160/0x17c
  LR [c000000000010f68] _switch+0x68/0x17c
  Call Trace:

To this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 0 PID: 93 Comm: systemd Not tainted
  NIP:  c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff
  [ ... regs ... ]
  NIP [c000000000011060] _switch+0x160/0x17c
  LR [c000000000010f68] _switch+0x68/0x17c
  Call Trace:
  [c000000005a93e10] [c00000000000cdbc] ret_from_fork_scv+0x0/0x54
  --- interrupt: 3000 at 0x7fffa72f56d8
  NIP:  00007fffa72f56d8 LR: 0000000000000000 CTR: 0000000000000000
  [ ... regs ... ]
  NIP [00007fffa72f56d8] 0x7fffa72f56d8
  LR [0000000000000000] 0x0
  --- interrupt: 3000

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-14-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
6895dfc047 powerpc: copy_thread fill in interrupt frame marker and back chain
Backtraces will not recognise the fork system call interrupt without
the regs marker. And regular interrupt entry from userspace creates
the back chain to the user stack, so do this for the initial fork
frame too, to be consistent.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-13-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
6f291a0381 powerpc: add a define for the switch frame size and regs offset
This is open-coded in process.c, ppc32 uses a different define with the
same value, and the C definition is named differently, which makes it an
extra indirection to grep for.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-12-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
1223e5a20f powerpc: add a define for the user interrupt frame size
The user interrupt frame is a different size from the kernel frame, so
give it its own name.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-11-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
e856e33692 powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset
This is a count of longs from the stack pointer to the regs marker.
Rename it to make it more distinct from the other byte offsets. It
can be derived from the byte offset definitions just added.
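
As a rough sketch of the derivation, a count of longs can be computed
from a byte offset as shown below. The EX_-prefixed names, the 96-byte
offset and the marker value are placeholders chosen for illustration,
not the definitions in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical byte offset of the regs marker from the stack pointer. */
#define EX_FRAME_MARKER_OFFSET 96UL

/* The same position expressed as a count of longs, for indexing a long array. */
#define EX_FRAME_MARKER_LONGS  (EX_FRAME_MARKER_OFFSET / sizeof(long))

/* The derivation only makes sense if the byte offset is long-aligned. */
static_assert(EX_FRAME_MARKER_OFFSET % sizeof(long) == 0,
	      "marker offset must be a multiple of sizeof(long)");

/* Placeholder marker value ("regs" in ASCII), assumed for this example. */
#define EX_REGS_MARKER 0x72656773UL

/* Check whether the frame at sp carries the marker. */
static bool frame_has_regs_marker(const unsigned long *sp)
{
	return sp[EX_FRAME_MARKER_LONGS] == EX_REGS_MARKER;
}
```

Giving the long count a visibly different name from the byte offsets
makes it harder to pass one where the other is expected.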

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-10-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
d2e8ff9f14 powerpc: add a definition for the marker offset within the interrupt frame
Define a constant rather than open-code the offset for the
"regs" marker.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-9-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
c03be0a3f3 powerpc: add definition for pt_regs offset within an interrupt frame
This is a common offset that currently uses the overloaded
STACK_FRAME_OVERHEAD constant. It's easier to read and more
flexible to use a specific regs offset for this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-8-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
37195b820d powerpc: simplify ppc_save_regs
Adjust the pt_regs pointer so the interrupt frame offsets can be used
to save registers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-7-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
baa49d81a9 powerpc/pseries: hvcall stack frame overhead
This call may use the min size stack frame. The scratch space used is
in the caller's parameter area frame, not this function's frame.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-6-npiggin@gmail.com
2022-12-02 17:54:07 +11:00
Nicholas Piggin
bc0677363d powerpc: Rearrange copy_thread child stack creation
This makes it a bit clearer where the stack frame is created, and will
allow easier use of some of the stack offset constants in a later
change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-5-npiggin@gmail.com
2022-12-02 17:54:07 +11:00