qemu/migration
Peter Xu 4146b77ec7 migration/postcopy: Add postcopy-recover-setup phase
This patch adds a migration state on src called "postcopy-recover-setup".
The new state describes the intermediate step from when the src QEMU
receives a postcopy recovery request until the migration channels are
properly established, before the recovery process takes place.

The request came from Libvirt, which currently relies on migration state
events to detect migration state changes.  That works for most of the
migration process, except for postcopy recovery failures at the beginning.

Currently postcopy recovery only has two major states:

  - postcopy-paused: the state that both sides of QEMU stay in, possibly
    for a long time, while the migration channel is interrupted.

  - postcopy-recover: the state where both sides of QEMU handshake with
    each other, preparing to continue the postcopy that was interrupted.

The issue is that when the recovery port is invalid, the src QEMU takes
the URI/channels, notices the ports are not valid, and silently stays in
the postcopy-paused state, with no event sent to Libvirt.  In this case,
the only thing Libvirt can do is poll the migration status at a suitable
interval, which is less than optimal.

Since this is the only case where Libvirt gets no notification from QEMU
about such events, let's add a postcopy-recover-setup state, mimicking the
"setup" state of a newly initialized migration, to describe the phase of
connection establishment.

With that, postcopy recovery now has two possible paths, and either path
is guaranteed to generate an event.  The events during a recovery process
on src QEMU now look like this:

  - When recovery is initiated on src, QEMU first moves from
    "postcopy-paused" to "postcopy-recover-setup".  Old QEMUs don't have
    this event.

  - Depending on whether the channel re-establishment succeeds:

    - On success, src QEMU moves from "postcopy-recover-setup" to
      "postcopy-recover".  Old QEMUs also have this event.

    - On failure, src QEMU moves from "postcopy-recover-setup" back to
      "postcopy-paused".  Old QEMUs don't have this event.

This guarantees that Libvirt always receives a notification for the
recovery process.
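The src-side transitions above can be modeled as a small state machine in which every state change emits exactly one event. The sketch below is a simplified illustration only: the `SrcState` enum, `set_state()`, `src_recover_attempt()` and the `events_sent` counter are hypothetical names for this example, not QEMU's actual MigrationStatus values or state-setting code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of the src-side recovery states; not
 * QEMU's real MigrationStatus enum. */
typedef enum {
    ST_POSTCOPY_PAUSED,
    ST_POSTCOPY_RECOVER_SETUP,
    ST_POSTCOPY_RECOVER,
} SrcState;

static const char *state_name(SrcState s)
{
    switch (s) {
    case ST_POSTCOPY_PAUSED:        return "postcopy-paused";
    case ST_POSTCOPY_RECOVER_SETUP: return "postcopy-recover-setup";
    case ST_POSTCOPY_RECOVER:       return "postcopy-recover";
    }
    return "unknown";
}

/* Counts emitted events so a caller can check that none are lost. */
static int events_sent;

/* Every state change emits one event, standing in for a state-change
 * notification delivered to the management layer. */
static void set_state(SrcState *cur, SrcState next)
{
    printf("event: %s -> %s\n", state_name(*cur), state_name(next));
    events_sent++;
    *cur = next;
}

/* One recovery attempt on src; channel_ok stands in for whether the
 * (hypothetical) channel re-establishment succeeded. */
static void src_recover_attempt(SrcState *cur, bool channel_ok)
{
    set_state(cur, ST_POSTCOPY_RECOVER_SETUP);  /* the new phase */
    if (channel_ok) {
        set_state(cur, ST_POSTCOPY_RECOVER);
    } else {
        set_state(cur, ST_POSTCOPY_PAUSED);     /* failure is now visible */
    }
}
```

With this model, a failed attempt (for instance, an invalid recovery port) produces two events, paused to recover-setup and back to paused, instead of none, which is what removes the need for Libvirt to poll.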

Note that the new state is only needed on src QEMU, not on both sides.
On dest QEMU the state machine doesn't change, hence neither do the
events.  This is because dest QEMU may not have an explicit point where
setup starts.  For example, dest QEMU may not use the migrate-recover
command to provide a new URI/channel; instead, the old URI/channels can be
reused in recovery, in which case the old ports simply work again once the
network routes are fixed up.

Add a new helper, postcopy_is_paused(), that detects whether postcopy is
still paused, taking RECOVER_SETUP into account too.  While using it on
both src and dst, also make a slight change to always wait for the
semaphore before checking the status, because on both sides a sem_post()
is required for a recovery.
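The helper and the wait-first loop can be sketched with POSIX semaphores. This is a hedged illustration rather than the patch's code: `MigState`, `postcopy_is_paused_sketch()` and `wait_while_paused()` are hypothetical names, and the sketch assumes the recovery path updates the state before each `sem_post()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical subset of migration states, for this sketch only. */
typedef enum {
    ST_POSTCOPY_ACTIVE,
    ST_POSTCOPY_PAUSED,
    ST_POSTCOPY_RECOVER_SETUP,
    ST_POSTCOPY_RECOVER,
} MigState;

/* Modeled on the helper described above: the recover-setup phase still
 * counts as paused, since the channels are not yet known to be usable. */
static bool postcopy_is_paused_sketch(MigState st)
{
    return st == ST_POSTCOPY_PAUSED || st == ST_POSTCOPY_RECOVER_SETUP;
}

/* Paused-side loop: always sem_wait() first, then check the state, so
 * every recovery attempt must be signalled by a sem_post().  Assumes the
 * poster writes *st before posting.  Returns the number of wakeups. */
static int wait_while_paused(sem_t *pause_sem, const MigState *st)
{
    int wakeups = 0;

    do {
        while (sem_wait(pause_sem) != 0 && errno == EINTR) {
            /* interrupted by a signal: wait again */
        }
        wakeups++;
    } while (postcopy_is_paused_sketch(*st));

    return wakeups;
}
```

Because the loop waits before it checks, the waiter never skips a pending sem_post(): each recovery kick is consumed by one wakeup, and if the state is still paused (for example after a failed recover-setup) the loop simply waits for the next kick.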

Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Prasad Pandit <ppandit@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Buglink: https://issues.redhat.com/browse/RHEL-38485
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21 09:47:59 -03:00
block-dirty-bitmap.c migration: Add Error** argument to add_bitmaps_to_list() 2024-04-23 18:36:01 -04:00
channel-block.c io: follow coroutine AioContext in qio_channel_yield() 2023-09-07 20:32:11 -05:00
channel-block.h
channel.c migration: Fix migration_channel_read_peek() error path 2024-01-04 09:52:42 +08:00
channel.h
colo-failover.c
colo-stubs.c migration/colo: make colo_incoming_co() return void 2024-05-22 17:34:31 -03:00
colo.c migration: Rename thread debug names 2024-06-21 09:47:59 -03:00
dirtyrate.c migration: remove unnecessary zlib dependency 2024-05-25 13:28:02 +02:00
dirtyrate.h migration/calc-dirty-rate: millisecond-granularity period 2023-10-10 08:03:50 +08:00
exec.c migration: simplify exec migration functions 2024-03-04 07:12:40 +01:00
exec.h migration: convert exec backend to accept MigrateAddress. 2023-11-02 11:35:04 +01:00
fd.c migration: Deprecate fd: for file migration 2024-05-08 09:20:59 -03:00
fd.h migration: Revert mapped-ram multifd support to fd: URI 2024-03-22 12:12:08 -04:00
file.c migration/multifd: Add direct-io support 2024-06-21 09:47:22 -03:00
file.h migration/multifd: Add direct-io support 2024-06-21 09:47:22 -03:00
global_state.c migration 1st pull for 9.0 2024-01-05 13:35:25 +00:00
meson.build migration/multifd: add uadk compression framework 2024-06-14 14:01:29 -03:00
migration-hmp-cmds.c migration: Add direct-io parameter 2024-06-21 09:47:22 -03:00
migration-stats.c migration: migration_rate_limit_reset() don't need the QEMUFile 2023-10-31 08:44:33 +01:00
migration-stats.h migration: Remove transferred atomic counter 2023-10-31 08:44:33 +01:00
migration.c migration/postcopy: Add postcopy-recover-setup phase 2024-06-21 09:47:59 -03:00
migration.h migration: Use MigrationStatus instead of int 2024-06-21 09:47:59 -03:00
multifd-qpl.c migration/multifd: implement qpl compression and decompression 2024-06-14 14:01:29 -03:00
multifd-uadk.c migration/multifd: Switch to no compression when no hardware support 2024-06-14 14:01:30 -03:00
multifd-zero-page.c migration/multifd: solve zero page causing multiple page faults 2024-04-23 18:36:01 -04:00
multifd-zlib.c migration/multifd: put IOV initialization into compression method 2024-06-14 14:01:28 -03:00
multifd-zstd.c migration/multifd: put IOV initialization into compression method 2024-06-14 14:01:28 -03:00
multifd.c migration: Rename thread debug names 2024-06-21 09:47:59 -03:00
multifd.h migration/multifd: add uadk compression framework 2024-06-14 14:01:29 -03:00
options.c migration: Add direct-io parameter 2024-06-21 09:47:22 -03:00
options.h migration: Add direct-io parameter 2024-06-21 09:47:22 -03:00
page_cache.c
page_cache.h
postcopy-ram.c migration/postcopy: Add postcopy-recover-setup phase 2024-06-21 09:47:59 -03:00
postcopy-ram.h migration/postcopy: Add postcopy-recover-setup phase 2024-06-21 09:47:59 -03:00
qemu-file.c migration: remove unnecessary zlib dependency 2024-05-25 13:28:02 +02:00
qemu-file.h migration: Remove non-multifd compression 2024-05-08 09:20:59 -03:00
ram.c migration/multifd: Avoid the final FLUSH in complete() 2024-06-21 09:47:59 -03:00
ram.h migration/multifd: solve zero page causing multiple page faults 2024-04-23 18:36:01 -04:00
rdma.c migration/rdma: Fix a memory issue for migration 2024-03-11 14:41:40 -04:00
rdma.h migration: convert rdma backend to accept MigrateAddress 2023-11-02 11:35:03 +01:00
savevm.c migration/postcopy: Add postcopy-recover-setup phase 2024-06-21 09:47:59 -03:00
savevm.h migration: Add Error** argument to qemu_savevm_state_setup() 2024-04-23 18:36:01 -04:00
socket.c migration/multifd: Drop unnecessary helper to destroy IOC 2024-02-28 11:31:28 +08:00
socket.h migration/multifd: Drop unnecessary helper to destroy IOC 2024-02-28 11:31:28 +08:00
target.c migration: Add migration prefix to functions in target.c 2023-09-11 08:34:06 +02:00
threadinfo.c migration/multifd: Protect accesses to migration_threads 2023-07-26 10:55:56 +02:00
threadinfo.h migration/multifd: Protect accesses to migration_threads 2023-07-26 10:55:56 +02:00
tls.c
tls.h
trace-events migration: add "exists" info to load-state-field trace 2024-05-22 17:34:40 -03:00
trace.h
vmstate-types.c
vmstate.c migration: fix a typo 2024-05-22 17:34:40 -03:00
xbzrle.c migration/xbzrle: Use i386 host/cpuinfo.h 2023-05-23 16:51:18 -07:00
xbzrle.h migration/xbzrle: Use i386 host/cpuinfo.h 2023-05-23 16:51:18 -07:00
yank_functions.c migration/yank: Use channel features 2024-01-29 11:02:12 +08:00
yank_functions.h