Go to file
Michael Roth 801980f649 powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
For a power9 KVM guest with XIVE enabled, running a test loop
where we hotplug 384 vcpus and then unplug them, the following traces
can be seen (generally within a few loops) either from the unplugged
vcpu:

  cpu 65 (hwid 65) Ready to die...
  Querying DEAD? cpu 66 (66) shows 2
  list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
  ------------[ cut here ]------------
  kernel BUG at lib/list_debug.c:56!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
  CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
  NIP:  c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
  REGS: c0000009e5a17840 TRAP: 0700   Not tainted  (4.18.0-221.el8.ppc64le)
  MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000842  XER: 20040000
  ...
  NIP __list_del_entry_valid+0xac/0x100
  LR  __list_del_entry_valid+0xa8/0x100
  Call Trace:
    __list_del_entry_valid+0xa8/0x100 (unreliable)
    free_pcppages_bulk+0x1f8/0x940
    free_unref_page+0xd0/0x100
    xive_spapr_cleanup_queue+0x148/0x1b0
    xive_teardown_cpu+0x1bc/0x240
    pseries_mach_cpu_die+0x78/0x2f0
    cpu_die+0x48/0x70
    arch_cpu_idle_dead+0x20/0x40
    do_idle+0x2f4/0x4c0
    cpu_startup_entry+0x38/0x40
    start_secondary+0x7bc/0x8f0
    start_secondary_prolog+0x10/0x14

or on the worker thread handling the unplug:

  pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
  Querying DEAD? cpu 314 (314) shows 2
  BUG: Bad page state in process kworker/u768:3  pfn:95de1
  cpu 314 (hwid 314) Ready to die...
  page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
  flags: 0x5ffffc00000000()
  raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
  page dumped because: nonzero mapcount
  Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
  CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
  Workqueue: pseries hotplug workque pseries_hp_work_fn
  Call Trace:
    dump_stack+0xb0/0xf4 (unreliable)
    bad_page+0x12c/0x1b0
    free_pcppages_bulk+0x5bc/0x940
    page_alloc_cpu_dead+0x118/0x120
    cpuhp_invoke_callback.constprop.5+0xb8/0x760
    _cpu_down+0x188/0x340
    cpu_down+0x5c/0xa0
    cpu_subsys_offline+0x24/0x40
    device_offline+0xf0/0x130
    dlpar_offline_cpu+0x1c4/0x2a0
    dlpar_cpu_remove+0xb8/0x190
    dlpar_cpu_remove_by_index+0x12c/0x150
    dlpar_cpu+0x94/0x800
    pseries_hp_work_fn+0x128/0x1e0
    process_one_work+0x304/0x5d0
    worker_thread+0xcc/0x7a0
    kthread+0x1ac/0x1c0
    ret_from_kernel_thread+0x5c/0x80

The latter trace is due to the following sequence:

  page_alloc_cpu_dead
    drain_pages
      drain_pages_zone
        free_pcppages_bulk

where drain_pages() in this case is called under the assumption that
the unplugged cpu is no longer executing. To ensure that is the case,
and early call is made to __cpu_die()->pseries_cpu_die(), which runs a
loop that waits for the cpu to reach a halted state by polling its
status via query-cpu-stopped-state RTAS calls. It only polls for 25
iterations before giving up, however, and in the trace above this
results in the following being printed only .1 seconds after the
hotplug worker thread begins processing the unplug request:

  pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
  Querying DEAD? cpu 314 (314) shows 2

At that point the worker thread assumes the unplugged CPU is in some
unknown/dead state and procedes with the cleanup, causing the race
with the XIVE cleanup code executed by the unplugged CPU.

Fix this by waiting indefinitely, but also making an effort to avoid
spurious lockup messages by allowing for rescheduling after polling
the CPU status and printing a warning if we wait for longer than 120s.

Fixes: eac1e731b5 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[mpe: Trim oopses in change log slightly for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
2020-08-18 17:09:52 +10:00
arch powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death 2020-08-18 17:09:52 +10:00
block block-5.9-2020-08-14 2020-08-15 20:36:42 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-08-14 13:09:15 -07:00
Documentation Misc fixes and small updates all around the place: 2020-08-15 10:38:03 -07:00
drivers block-5.9-2020-08-14 2020-08-15 20:36:42 -07:00
fs io_uring-5.9-2020-08-15 2020-08-16 10:55:12 -07:00
include io_uring-5.9-2020-08-15 2020-08-16 10:55:12 -07:00
init OpenRISC updates for 5.9 2020-08-14 14:04:53 -07:00
ipc ipc/shm.c: remove the superfluous break 2020-08-12 10:58:02 -07:00
kernel io_uring-5.9-2020-08-15 2020-08-16 10:55:12 -07:00
lib iomap: constify ioreadX() iomem argument (as in generic implementation) 2020-08-14 19:56:57 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm Merge branch 'akpm' (patches from Andrew) 2020-08-15 08:02:03 -07:00
net 9p pull request for inclusion in 5.9 2020-08-15 08:34:36 -07:00
samples Kbuild updates for v5.9 2020-08-09 14:10:26 -07:00
scripts Kconfig updates for v5.9 2020-08-14 11:04:45 -07:00
security Merge branch 'akpm' (patches from Andrew) 2020-08-12 11:24:12 -07:00
sound sound fixes for 5.9-rc1 2020-08-14 15:58:57 -07:00
tools Cleanup, SECCOMP_FILTER support, message printing fixes, and other 2020-08-15 18:50:32 -07:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt Merge branch 'akpm' (patches from Andrew) 2020-08-12 11:24:12 -07:00
.clang-format block: add bio_for_each_bvec_all() 2020-05-25 11:25:24 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: Add ZSTD-compressed files 2020-07-31 11:50:49 +02:00
.mailmap mailmap: add entry for Greg Kurz 2020-08-14 19:56:56 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS CREDITS: Replace HTTP links with HTTPS ones 2020-07-23 14:53:58 -06:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS perf tools changes for v5.9: 2nd batch 2020-08-15 11:17:15 -07:00
Makefile Linux 5.9-rc1 2020-08-16 13:04:57 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.