linux/arch/powerpc
Srikar Dronamraju b8b9280303 powerpc/smp: Update cpu_core_map on all PowerPc systems
lscpu() uses core_siblings to list the number of sockets in the
system. core_siblings is set using topology_core_cpumask.

While optimizing the powerpc bootup path, Commit 4ca234a9cb
("powerpc/smp: Stop updating cpu_core_mask").  it was found that
updating cpu_core_mask() ended up taking a lot of time. It was thought
that on Powerpc, cpu_core_mask() would always be same as
cpu_cpu_mask() i.e number of sockets will always be equal to number of
nodes. As an optimization, cpu_core_mask() was made a snapshot of
cpu_cpu_mask().

However that was found to be false with PowerPc KVM guests, where each
node could have more than one socket. So with Commit c47f892d7a
("powerpc/smp: Reintroduce cpu_core_mask"), cpu_core_mask was updated
based on chip_id but in an optimized way using some mask manipulations
and chip_id caching.

However on non-PowerNV and non-pseries KVM guests (i.e not
implementing cpu_to_chip_id(), continued to use a copy of
cpu_cpu_mask().

There are two issues that were noticed on such systems
1. lscpu would report one extra socket.
On a IBM,9009-42A (aka zz system) which has only 2 chips/ sockets/
nodes, lscpu would report
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              160
On-line CPU(s) list: 0-159
Thread(s) per core:  8
Core(s) per socket:  6
Socket(s):           3                <--------------
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-79
NUMA node1 CPU(s):   80-159

2. Currently cpu_cpu_mask is updated when a core is
added/removed. However its not updated when smt mode switching or on
CPUs are explicitly offlined. However all other percpu masks are
updated to ensure only active/online CPUs are in the masks.
This results in build_sched_domain traces since there will be CPUs in
cpu_cpu_mask() but those CPUs are not present in SMT / CACHE / MC /
NUMA domains. A loop of threads running smt mode switching and core
add/remove will soon show this trace.
Hence cpu_cpu_mask has to be update at smt mode switch.

This will have impact on cpu_core_mask(). cpu_core_mask() is a
snapshot of cpu_cpu_mask. Different CPUs within the same socket will
end up having different cpu_core_masks since they are snapshots at
different points of time. This means when lscpu will start reporting
many more sockets than the actual number of sockets/ nodes / chips.

Different ways to handle this problem:
A. Update the snapshot aka cpu_core_mask for all CPUs whenever
   cpu_cpu_mask is updated. This would a non-optimal solution.
B. Instead of a cpumask_var_t, make cpu_core_map a cpumask pointer
   pointing to cpu_cpu_mask. However percpu cpumask pointer is frowned
   upon and we need a clean way to handle PowerPc KVM guest which is
   not a snapshot.
C. Update cpu_core_masks all PowerPc systems like in PowerPc KVM
guests using mask manipulations. This approach is relatively simple
and unifies with the existing code.
D. On top of 3, we could also resurrect get_physical_package_id which
   could return a nid for the said CPU. However this is not needed at this
   time.

Option C is the preferred approach for now.

While this is somewhat a revert of Commit 4ca234a9cb ("powerpc/smp:
Stop updating cpu_core_mask").

1. Plain revert has some conflicts
2. For chip_id == -1, the cpu_core_mask is made identical to
cpu_cpu_mask, unlike previously where cpu_core_mask was set to a core
if chip_id doesn't exist.

This goes by the principle that if chip_id is not exposed, then
sockets / chip / node share the same set of CPUs.

With the fix, lscpu o/p would be
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              160
On-line CPU(s) list: 0-159
Thread(s) per core:  8
Core(s) per socket:  6
Socket(s):           2                     <--------------
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-79
NUMA node1 CPU(s):   80-159

Fixes: 4ca234a9cb ("powerpc/smp: Stop updating cpu_core_mask")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100401.412519-3-srikar@linux.vnet.ibm.com
2021-08-27 00:56:53 +10:00
..
boot powerpc/microwatt: Add Ethernet to device tree 2021-08-27 00:56:53 +10:00
configs powerpc/configs/microwatt: Enable options for systemd 2021-08-27 00:56:53 +10:00
crypto crypto: powepc/sha1 - remove unneeded semicolon 2021-03-07 15:13:14 +11:00
include powerpc: Redefine HMT_xxx macros as empty on PPC32 2021-08-27 00:56:52 +10:00
kernel powerpc/smp: Update cpu_core_map on all PowerPc systems 2021-08-27 00:56:53 +10:00
kexec powerpc: Avoid link stack corruption in misc asm functions 2021-08-25 13:35:47 +10:00
kvm Merge branch 'topic/ppc-kvm' into next 2021-08-26 21:21:11 +10:00
lib powerpc: Only build restart_table.c for 64s 2021-07-01 22:50:54 +10:00
math-emu powerpc/64s: avoid reloading (H)SRR registers if they are still valid 2021-06-25 00:06:55 +10:00
mm powerpc: Refactor verification of MSR_RI 2021-08-26 21:21:07 +10:00
net powerpc/bpf: Reject atomic ops in ppc32 JIT 2021-07-05 22:23:25 +10:00
perf powerpc/perf: Fix the check for SIAR value 2021-08-25 22:38:19 +10:00
platforms Merge changes from Paul Gortmaker 2021-08-27 00:49:06 +10:00
purgatory powerpc/kexec: Don't use .machine ppc64 in trampoline_64.S 2021-04-08 21:17:43 +10:00
sysdev powerpc: Refactor verification of MSR_RI 2021-08-26 21:21:07 +10:00
tools powerpc/head_check: Fix shellcheck errors 2021-08-17 22:52:02 +10:00
xmon powerpc: Refactor verification of MSR_RI 2021-08-26 21:21:07 +10:00
Kbuild
Kconfig powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP 2021-08-25 13:35:48 +10:00
Kconfig.debug powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP 2021-08-25 13:35:48 +10:00
Makefile powerpc: Add "-z notext" flag to disable diagnostic 2021-08-15 13:49:39 +10:00
Makefile.postlink