1089773 Commits

Author SHA1 Message Date
Guilherme G. Piccoli
e2aa34ce80 powerpc/setup: Refactor/untangle panic notifiers
The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.

As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.

Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:

- First one is responsible for hard-disable IRQs and fadump,
should run early;

- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;

- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.

The patch better documents the notifiers and clears the code too,
also removing a useless header.

Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220427224924.592546-9-gpiccoli@igalia.com
2022-05-19 23:11:26 +10:00
Michael Ellerman
b104e41cda Merge branch 'topic/ppc-kvm' into next
Merge our KVM topic branch.
2022-05-19 23:10:42 +10:00
Fabiano Rosas
ad55bae7dc KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint
We removed most of the vcore logic from the P9 path but there's still
a tracepoint that tried to dereference vc->runner.

Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220328215831.320409-1-farosas@linux.ibm.com
2022-05-19 00:44:50 +10:00
Alexey Kardashevskiy
b22af90419 KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers
Currently we have 2 sets of interrupt controller hypercalls handlers
for real and virtual modes, this is from POWER8 times when switching
MMU on was considered an expensive operation.

POWER9 however does not have dependent threads and MMU is enabled for
handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers
never execute on real P9 and later CPUs.

This untemplate the handlers and only keeps the real mode handlers for
XICS native (up to POWER8) and remove the rest of dead code. Changes
in functions are mechanical except few missing empty lines to make
checkpatch.pl happy.

The default implemented hcalls list already contains XICS hcalls so
no change there.

This should not cause any behavioral change.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru
2022-05-19 00:44:28 +10:00
Alexey Kardashevskiy
29592181c5 KVM: PPC: Book3s: PR: Enable default TCE hypercalls
When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE
were already implemented and enabled by default; however H_GET_TCE
was missed out on PR KVM (probably because the handler was in
the real mode code at the time).

This enables H_GET_TCE by default. While at this, this wraps
the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506073737.3823347-1-aik@ozlabs.ru
2022-05-19 00:44:15 +10:00
Alexey Kardashevskiy
cad32d9d42 KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines guest visible IOMMU with hypercalls to use it -
H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap
in the KVM in the real mode (with MMU off). The problem with the real mode
is some memory is not available and some API usage crashed the host but
enabling MMU was an expensive operation.

The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE) which virtual mode handler is even closer
to its real mode version.

On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.

So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
The 32bit passed through devices may slow down but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demostrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and related code from
the powernv platform.

This updates the list of implemented hcalls in KVM-HV as the realmode
handlers are removed.

This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2022-05-19 00:44:01 +10:00
Michael Ellerman
750137ec6c Merge branch 'fixes' into topic/ppc-kvm
Merge our fixes branch. In parciular this brings in the KVM TCE handling
fix, which is a prerequisite for a subsequent patch.
2022-05-19 00:43:04 +10:00
Fabiano Rosas
1d1cd0f12a KVM: PPC: Book3S HV: Initialize AMOR in nested entry
The hypervisor always sets AMOR to ~0, but let's ensure we're not
passing stale values around.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220425142151.1495142-1-farosas@linux.ibm.com
2022-05-19 00:28:49 +10:00
Michael Ellerman
a5fc286f69 Merge branch 'fixes' into next
Merge our fixes branch from this cycle. In particular this brings in a
papr_scm.c change which a subsequent patch has a dependency on.
2022-05-19 00:11:51 +10:00
Bo Liu
15eb1b6afc KVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()
The return value type defined in the function kvm_age_rmapp() is
"bool", but the return value type defined in the implementation of the
function kvm_age_rmapp() is "int".

Change the return value type to "bool".

Signed-off-by: Bo Liu <liubo03@inspur.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220401065252.36472-1-liubo03@inspur.com
2022-05-18 23:34:39 +10:00
Xiaomeng Tong
300981abdd KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator
The bug is here:
	if (!p)
                return ret;

The list iterator value 'p' will *always* be set and non-NULL by
list_for_each_entry(), so it is incorrect to assume that the iterator
value will be NULL if the list is empty or no element is found.

To fix the bug, Use a new value 'iter' as the list iterator, while use
the old value 'p' as a dedicated variable to point to the found element.

Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com
2022-05-18 23:31:35 +10:00
Bagas Sanjaya
d53c36e6c8 KVM: PPC: Book3S HV: remove extraneous asterisk from rm_host_ipi_action() comment
kernel test robot reported kernel-doc warning for rm_host_ipi_action():

   arch/powerpc/kvm/book3s_hv_rm_xics.c:887: warning: This comment starts with '/**', but isn't a kernel-doc comment.
    * Host Operations poked by RM KVM

Since the function is static, remove the extraneous (second) asterisk at
the head of function comment.

Fixes: 0c2a66062470cd ("KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linux-doc/202204252334.Cd2IsiII-lkp@intel.com/
Link: https://lore.kernel.org/r/20220506070747.16309-1-bagasdotme@gmail.com
2022-05-18 23:27:24 +10:00
Nicholas Piggin
2852ebfa10 KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
The L1 should not be able to adjust LPES mode for the L2. Setting LPES
if the L0 needs it clear would cause external interrupts to be sent to
L2 and missed by the L0.

Clearing LPES when it may be set, as typically happens with XIVE enabled
could cause a performance issue despite having no native XIVE support in
the guest, because it will cause mediated interrupts for the L2 to be
taken in HV mode, which then have to be injected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com
2022-05-13 21:34:33 +10:00
Nicholas Piggin
11681b79b1 KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context
The PowerNV L0 currently pushes the OS xive context when running a vCPU,
regardless of whether it is running a nested guest. The problem is that
xive OS ring interrupts will be delivered while the L2 is running.

At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which
actually makes external interrupts go to the L0. That causes the L2 to
exit and the interrupt taken or injected into the L1, so in some
respects this behaves like an escalation. It's not clear if this was
deliberate or not, there's no comment about it and the L1 is actually
allowed to clear LPES in the L2, so it's confusing at best.

When the L2 is running, the L1 is essentially in a ceded state with
respect to external interrupts (it can't respond to them directly and
won't get scheduled again absent some additional event). So the natural
way to solve this is when the L0 handles a H_ENTER_NESTED hypercall to
run the L2, have it arm the escalation interrupt and don't push the L1
context while running the L2.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-6-npiggin@gmail.com
2022-05-13 21:34:33 +10:00
Nicholas Piggin
42b4a2b347 KVM: PPC: Book3S HV P9: Split !nested case out from guest entry
The differences between nested and !nested will become larger in
later changes so split them out for readability.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-5-npiggin@gmail.com
2022-05-13 21:34:33 +10:00
Nicholas Piggin
ad5ace91c5 KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
Move the cede abort logic out of xive escalation rearming and into
the caller to prepare for handling a similar case with nested guest
entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-4-npiggin@gmail.com
2022-05-13 21:34:33 +10:00
Nicholas Piggin
026728dc5d KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry
If there is a pending xive interrupt, inject it at guest entry (if
MSR[EE] is enabled) rather than take another interrupt when the guest
is entered. If xive is enabled then LPCR[LPES] is set so this behaviour
should be expected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-3-npiggin@gmail.com
2022-05-13 21:34:33 +10:00
Nicholas Piggin
f104df7d51 KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS
KVMPPC_NR_LPIDS no longer represents any size restriction on the
LPID space and can be removed. A CPU with more than 12 LPID bits
implemented will now be able to create more than 4095 guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-7-npiggin@gmail.com
2022-05-13 21:33:34 +10:00
Nicholas Piggin
03a2e65f54 KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum
Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic,
fix it to 4096 (12-bits) explicitly for now.

kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS
because the L1 partition table registration hcall already did that, and
it checks against the partition table size.

This patch also puts all the partition table size calculations into the
same form, using 12 for the architected size field shift and 4 for the
shift corresponding to the partition table entry size.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-of-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-6-npiggin@gmail.com
2022-05-13 21:33:34 +10:00
Nicholas Piggin
c0f00a18e2 KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr
This removes the fixed sized kvm->arch.nested_guests array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-5-npiggin@gmail.com
2022-05-13 21:33:34 +10:00
Nicholas Piggin
6ba2a2924d KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator
This removes the fixed-size lpid_inuse array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-4-npiggin@gmail.com
2022-05-13 21:33:33 +10:00
Nicholas Piggin
5d506f159b KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested
The LPID allocator init is changed to:
- use mmu_lpid_bits rather than hard-coding;
- use KVM_MAX_NESTED_GUESTS for nested hypervisors;
- not reserve the top LPID on POWER9 and newer CPUs.

The reserved LPID is made a POWER7/8-specific detail.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-3-npiggin@gmail.com
2022-05-13 21:33:33 +10:00
Nicholas Piggin
18827eeef0 KVM: PPC: Remove kvmppc_claim_lpid
Removing kvmppc_claim_lpid makes the lpid allocator API a bit simpler to
change the underlying implementation in a future patch.

The host LPID is always 0, so that can be a detail of the allocator. If
the allocator range is restricted, that can reserve LPIDs at the top of
the range. This allows kvmppc_claim_lpid to be removed.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-2-npiggin@gmail.com
2022-05-13 21:33:33 +10:00
Nicholas Piggin
361234d7a1 KVM: PPC: Book3S HV P9: Optimise loads around context switch
It is better to get all loads for the register values in flight
before starting to switch LPID, PID, and LPCR because those
mtSPRs are expensive and serialising.

This also just tidies up the code for a potential future change
to the context switching sequence.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123114725.3549202-1-npiggin@gmail.com
2022-05-13 21:33:26 +10:00
Nicholas Piggin
861604614a KVM: PPC: Book3S HV: HFSCR[PREFIX] does not exist
This facility is controlled by FSCR only. Reserved bits should not be
set in the HFSCR register (although it's likely harmless as this
position would not be re-used, and the L0 is forgiving here too).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220122105639.3477407-1-npiggin@gmail.com
2022-05-13 21:33:19 +10:00
Laurent Dufour
b6b1c3ce06 powerpc/rtas: Keep MSR[RI] set when calling RTAS
RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big
endian mode (MSR[SF,LE] unset).

The change in MSR is done in enter_rtas() in a relatively complex way,
since the MSR value could be hardcoded.

Furthermore, a panic has been reported when hitting the watchdog interrupt
while running in RTAS, this leads to the following stack trace:

  watchdog: CPU 24 Hard LOCKUP
  watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago)
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G            E  X    5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP:  000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100   Tainted: G            E  X     (5.14.21-150400.71.1.bz196362_2-default)
  MSR:  8000000002981000 <SF,VEC,VSX,ME>  CR: 48800002  XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  Oops: Unrecoverable System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G            E  X    5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP:  000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100   Tainted: G            E  X     (5.14.21-150400.71.1.bz196362_2-default)
  MSR:  8000000002981000 <SF,VEC,VSX,ME>  CR: 48800002  XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 3ddec07f638c34a2 ]---

This happens because MSR[RI] is unset when entering RTAS but there is no
valid reason to not set it here.

RTAS is expected to be called with MSR[RI] as specified in PAPR+ section
"7.2.1 Machine State":

  R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect
  its own critical regions from recursion by setting the MSR[RI] bit to
  0 when in the critical regions.

Fixing this by reviewing the way MSR is compute before calling RTAS. Now a
hardcoded value meaning real mode, 32 bits big endian mode and Recoverable
Interrupt is loaded. In the case MSR[S] is set, it will remain set while
entering RTAS as only urfid can unset it (thanks Fabiano).

In addition a check is added in do_enter_rtas() to detect calls made with
MSR[RI] unset, as we are forcing it on later.

This patch has been tested on the following machines:
Power KVM Guest
  P8 S822L (host Ubuntu kernel 5.11.0-49-generic)
PowerVM LPAR
  P8 9119-MME (FW860.A1)
  p9 9008-22L (FW950.00)
  P10 9080-HEX (FW1010.00)

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com
2022-05-11 23:06:40 +10:00
Christophe Leroy
5ad1aa007d powerpc/8xx: Use kmalloced data structure instead of global static
Use a kmalloced data structure to store interrupt controller internal
data instead of static global variables.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c8f0866ee013113d5e28948943cf0586e49f5353.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:40 +10:00
Christophe Leroy
e3ba31b780 powerpc/8xx: Remove mpc8xx_pics_init()
mpc8xx_pics_init() is now only a trampoline to
mpc8xx_pic_init().

Remove mpc8xx_pics_init() and use mpc8xx_pic_init()
directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9c55a698adb5ba3b7b77023170fcaf0acb5d2d81.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:40 +10:00
Christophe Leroy
14d893fc68 powerpc/8xx: Convert CPM1 interrupt controller to platform_device
In the same logic as commit be7ecbd240b2 ("soc: fsl: qe: convert QE
interrupt controller to platform_device"), convert CPM1 interrupt
controller to platform_device.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fb80d0b2077312079c49da0296e25591578771cd.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:40 +10:00
Christophe Leroy
22add2a20e powerpc/8xx: Convert CPM1 error interrupt handler to platform driver
Add CPM error interrupt as a standalone platform driver,
to simplify the init of CPM interrupt handler.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/375a72df6e4a26c5959cc81a6c6d46152efa2306.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:40 +10:00
Christophe Leroy
acf9e575d8 powerpc/8xx: Move CPM interrupt controller into a dedicated file
CPM interrupt controller is quite standalone. Move it into a
dedicated file. It will help for next step which will change
it to a platform driver.

This is pure code move, checkpatch report is ignored at this point,
except one parenthesis alignment which would remain at the end of
the series. All other points fly away with following patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d3a7dc832d905bed14b35d83410cdb69a7ba20e8.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:39 +10:00
Christophe Leroy
d8d2af70b9 cxl/ocxl: Prepare cleanup of powerpc's asm/prom.h
powerpc's asm/prom.h brings some headers that it doesn't
need itself.

In order to clean it up, first add missing headers in
users of asm/prom.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a2bae89b280e7a7cb87889635d9911d6a245e780.1648833388.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:39 +10:00
Christophe Leroy
a486e512d1 macintosh: Prepare cleanup of powerpc's asm/prom.h
powerpc's asm/prom.h brings some headers that it doesn't
need itself.

In order to clean it up, first add missing headers in
users of asm/prom.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/04961364547fe4556e30cb302b0e20a939b83426.1648833027.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:39 +10:00
Christophe Leroy
1751289268 powerpc/code-patching: Use jump_label to check if poking_init() is done
It's only during early startup that poking_init() is not done yet,
for instance when calling ftrace_init().

Once poking_init() has been called there must be a poking area, no
need to check it everytime patch_instruction() is called.

ftrace activation time is reduced by 7% with the change on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8d6088aca7b63247377b6d9e4897d08d935fbe93.1647962456.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:39 +10:00
Christophe Leroy
b033767848 powerpc/code-patching: Use jump_label for testing freed initmem
Once init is done, initmem is freed forever so no need to
test system_state at every call to patch_instruction().

Use jump_label.

This reduces by 2% the time needed to activate ftrace on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0aee964721cab7316cffde21a2ca223cee14d373.1647962456.git.christophe.leroy@csgroup.eu
2022-05-11 23:06:39 +10:00
Alexander Graf
ee8348496c KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()
Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
moved the switch_mmu_context() to C. While in principle a good idea, it
meant that the function now uses the stack. The stack is not accessible
from real mode though.

So to keep calling the function, let's turn on MSR_DR while we call it.
That way, all pointer references to the stack are handled virtually.

In addition, make sure to save/restore r12 on the stack, as it may get
clobbered by the C function.

Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220510123717.24508-1-graf@amazon.com
2022-05-11 23:03:16 +10:00
Christophe Leroy
cb3ac45214 powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES
If CONFIG_MODULES is not set, there is no point in checking
whether text is in module area.

This reduced the time needed to activate/deactivate ftrace
by more than 10% on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f3c701cce00a38620788c0fc43ff0b611a268c54.1647962456.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:41 +10:00
Christophe Leroy
65883b78bc powerpc: align address to page boundary in change_page_attr()
Aligning address to page boundary allows flush_tlb_kernel_range()
to know it's a single page flush and use tlbie instead of tlbia.

On 603 we now have the following code in first leg of
change_page_attr():

	  2c:	55 29 00 3c 	rlwinm  r9,r9,0,0,30
	  30:	91 23 00 00 	stw     r9,0(r3)
	  34:	7c 00 22 64 	tlbie   r4,r0
	  38:	7c 00 04 ac 	hwsync
	  3c:	38 60 00 00 	li      r3,0
	  40:	4e 80 00 20 	blr

Before we had:

	  28:	55 29 00 3c 	rlwinm  r9,r9,0,0,30
	  2c:	91 23 00 00 	stw     r9,0(r3)
	  30:	54 89 00 26 	rlwinm  r9,r4,0,0,19
	  34:	38 84 10 00 	addi    r4,r4,4096
	  38:	7c 89 20 50 	subf    r4,r9,r4
	  3c:	28 04 10 00 	cmplwi  r4,4096
	  40:	41 81 00 30 	bgt     70 <change_page_attr+0x70>
	  44:	7c 00 4a 64 	tlbie   r9,r0
	  48:	7c 00 04 ac 	hwsync
	  4c:	38 60 00 00 	li      r3,0
	  50:	4e 80 00 20 	blr
	...
	  70:	94 21 ff f0 	stwu    r1,-16(r1)
	  74:	7c 08 02 a6 	mflr    r0
	  78:	90 01 00 14 	stw     r0,20(r1)
	  7c:	48 00 00 01 	bl      7c <change_page_attr+0x7c>
				7c: R_PPC_REL24	_tlbia
	  80:	80 01 00 14 	lwz     r0,20(r1)
	  84:	38 60 00 00 	li      r3,0
	  88:	7c 08 03 a6 	mtlr    r0
	  8c:	38 21 00 10 	addi    r1,r1,16
	  90:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6bb118fb2ee89fa3c1f9cf90ed19f88220002cb0.1647877467.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:41 +10:00
Christophe Leroy
9290c379d1 powerpc/8xx: Simplify flush_tlb_kernel_range()
In the same spirit as commit 63f501e07a85 ("powerpc/8xx: Simplify TLB
handling"), simplify flush_tlb_kernel_range() for 8xx.

8xx cannot be SMP, and has 'tlbie' and 'tlbia' instructions, so
an inline version of flush_tlb_kernel_range() for 8xx is worth it.

With this page, first leg of change_page_attr() is:

	  2c:	55 29 00 3c 	rlwinm  r9,r9,0,0,30
	  30:	91 23 00 00 	stw     r9,0(r3)
	  34:	7c 00 22 64 	tlbie   r4,r0
	  38:	7c 00 04 ac 	hwsync
	  3c:	38 60 00 00 	li      r3,0
	  40:	4e 80 00 20 	blr

Before the patch it was:

	  30:	55 29 00 3c 	rlwinm  r9,r9,0,0,30
	  34:	91 2a 00 00 	stw     r9,0(r10)
	  38:	94 21 ff f0 	stwu    r1,-16(r1)
	  3c:	7c 08 02 a6 	mflr    r0
	  40:	38 83 10 00 	addi    r4,r3,4096
	  44:	90 01 00 14 	stw     r0,20(r1)
	  48:	48 00 00 01 	bl      48 <change_page_attr+0x48>
				48: R_PPC_REL24	flush_tlb_kernel_range
	  4c:	80 01 00 14 	lwz     r0,20(r1)
	  50:	38 60 00 00 	li      r3,0
	  54:	7c 08 03 a6 	mtlr    r0
	  58:	38 21 00 10 	addi    r1,r1,16
	  5c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2610043419ce3e0e53a85386baf2c3625af5cfb.1647877442.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:40 +10:00
Christophe Leroy
e59596a2d6 powerpc: Use static call for get_irq()
__do_irq() inconditionnaly calls ppc_md.get_irq()

That's definitely a hot path.

At the time being ppc_md.get_irq address is read every time
from ppc_md structure.

Replace that call by a static call, and initialise that
call after ppc_md.init_IRQ() has set ppc_md.get_irq.

Emit a warning and don't set the static call if ppc_md.init_IRQ()
is still NULL, that way the kernel won't blow up if for some
reason ppc_md.get_irq() doesn't get properly set.

With the patch:

	00000000 <__SCT__ppc_get_irq>:
	   0:	48 00 00 20 	b       20 <__static_call_return0>	<== Replaced by 'b <ppc_md.get_irq>' at runtime
...
	00000020 <__static_call_return0>:
	  20:	38 60 00 00 	li      r3,0
	  24:	4e 80 00 20 	blr
...
	00000058 <__do_irq>:
...
	  64:	48 00 00 01 	bl      64 <__do_irq+0xc>
				64: R_PPC_REL24	__SCT__ppc_get_irq
	  68:	2c 03 00 00 	cmpwi   r3,0
...

Before the patch:

	00000038 <__do_irq>:
...
	  3c:	3d 20 00 00 	lis     r9,0
				3e: R_PPC_ADDR16_HA	ppc_md+0x1c
...
	  44:	81 29 00 00 	lwz     r9,0(r9)
				46: R_PPC_ADDR16_LO	ppc_md+0x1c
...
	  4c:	7d 29 03 a6 	mtctr   r9
	  50:	4e 80 04 21 	bctrl
	  54:	2c 03 00 00 	cmpwi   r3,0
...

On PPC64 which doesn't implement static calls yet we get:

00000000000000d0 <__do_irq>:
...
      dc:	00 00 22 3d 	addis   r9,r2,0
			dc: R_PPC64_TOC16_HA	.data+0x8
...
      e4:	00 00 89 e9 	ld      r12,0(r9)
			e4: R_PPC64_TOC16_LO_DS	.data+0x8
...
      f0:	a6 03 89 7d 	mtctr   r12
      f4:	18 00 41 f8 	std     r2,24(r1)
      f8:	21 04 80 4e 	bctrl
      fc:	18 00 41 e8 	ld      r2,24(r1)
...

So on PPC64 that's similar to what we get without static calls.
But at least until ppc_md.get_irq() is set the call is to
__static_call_return0.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/afb92085f930651d8b1063e4d4bf0396c80ebc7d.1647002274.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:40 +10:00
Christophe Leroy
a1ae431705 powerpc: Use rol32() instead of opencoding in csum_fold()
rol32(x, 16) will do the rotate using rlwinm.

No need to open code using inline assembly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/794337eff7bb803d2c4e67d9eee635390c4c48fe.1646812553.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:40 +10:00
Christophe Leroy
e6f6390ab7 powerpc: Add missing headers
Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h,
asm/pci.h etc...

Include the needed headers, and remove asm/prom.h when it was
needed exclusively for pulling necessary headers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:40 +10:00
Christophe Leroy
86c38fec69 powerpc: Remove asm/prom.h from all files that don't need it
Several files include asm/prom.h for no reason.

Clean it up.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop change to prom_parse.c as reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7c9b8fda63dcf63e1b28f43e7ebdb95182cbc286.1646767214.git.christophe.leroy@csgroup.eu
2022-05-08 22:15:04 +10:00
Kajol Jain
348c713441 powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks for string size which can panic the kernel, like incase
of overflow detection.

In papr_scm, papr_scm_pmu_check_events function uses stat->stat_id with
string operations, to populate the nvdimm_events_map array. Since
stat_id variable is not NULL terminated, the kernel panics with
CONFIG_FORTIFY_SOURCE enabled at boot time.

Below are the logs of kernel panic:

  detected buffer overflow in __fortify_strlen
  ------------[ cut here ]------------
  kernel BUG at lib/string_helpers.c:980!
  Oops: Exception in kernel mode, sig: 5 [#1]
  NIP [c00000000077dad0] fortify_panic+0x28/0x38
  LR [c00000000077dacc] fortify_panic+0x24/0x38
  Call Trace:
  [c0000022d77836e0] [c00000000077dacc] fortify_panic+0x24/0x38 (unreliable)
  [c00800000deb2660] papr_scm_pmu_check_events.constprop.0+0x118/0x220 [papr_scm]
  [c00800000deb2cb0] papr_scm_probe+0x288/0x62c [papr_scm]
  [c0000000009b46a8] platform_probe+0x98/0x150

Fix this issue by using kmemdup_nul() to copy the content of
stat->stat_id directly to the nvdimm_events_map array.

mpe: stat->stat_id comes from the hypervisor, not userspace, so there is
no security exposure.

Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220505153451.35503-1-kjain@linux.ibm.com
2022-05-06 12:44:03 +10:00
Christophe Leroy
669df99c95 powerpc: Add missing declaration in asm/drmem.h
Don't rely on random inclusion of linux/of.h by users
of asm/drmem.h

Add a forward declaration of struct property and
struct device_node.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5643ec410e51b749db0636471cb7979524f9ed0e.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06 00:00:21 +10:00
Christophe Leroy
eb4713c40a powerpc: Include asm/reg.h in asm/svm.h
is_secure_guest() uses mfmsr().

Don't rely on users to include asm/reg.h, include
it in asm/svm.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/482c82c8a29d5fb3ea279b34f107e0e775001344.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06 00:00:20 +10:00
Christophe Leroy
07071346bb powerpc: Don't include asm/prom.h in asm/parport.h
parport.h needs only of_irq.h, no need to go via asm/prom.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec796ee56cf61f16ba24e62a9d3525d11931538c.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06 00:00:20 +10:00
Christophe Leroy
0aa297e73b powerpc/64: Move pci_device_from_OF_node() out of asm/pci-bridge.h
Move pci_device_from_OF_node() in pci64.c because it needs definition
of struct device_node and is not worth inlining.

ppc32.c already has it in pci32.c.

That way pci-bridge.h doesn't need linux/of.h (Brought by asm/prom.h
via asm/pci.h)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3c88286b55413730d7784133993a46ef4a3607ce.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06 00:00:20 +10:00
Christophe Leroy
f206fdd9d4 powerpc: Reduce csum_add() complexity for PPC64
PPC64 does everything in C, gcc is able to skip calculation
when one of the operands in zero.

Move the constant folding in PPC32 part.

This helps GCC and reduces ppc64_defconfig by 170 bytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a4ca63dd4c4b09e1906d08fb814af5a41d0f3fcb.1644651363.git.christophe.leroy@csgroup.eu
2022-05-06 00:00:20 +10:00
Nicholas Piggin
a553476c44 powerpc/64: remove system call instruction emulation
emulate_step() instruction emulation including sc instruction emulation
initially appeared in xmon. It was then moved into sstep.c where kprobes
could use it too, and later hw_breakpoint and uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point, there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there were any in-tree users. So system call emulation is not required
by the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.

These things may be individually fixable with various complication, but
it is a big complexity for unclear real benefit.

Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.

This patch removes system call emulation and disables stepping system
calls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a412e3b3791ed83de18704c8d90f492e7a0049c0.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2022-05-06 00:00:20 +10:00