In commit a35f8577a0 ("include/hw: add macros for deprecation &
removal of versioned machines"), a new machine version deprecation and
removal policy was introduced: a machine version is deprecated after
only 3 years and removed after 6 years.
The deprecation is a bit early considering that major PVE releases
happen approximately every 2 years. This means that a deprecation
warning can already appear for a machine version that was introduced
during the previous major release. This would scare users for no good
reason, so avoid deprecating machine versions in PVE too early and
define a baseline of machine versions that will be supported
throughout a single major PVE release.
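A minimal sketch of the idea, assuming the macro names introduced by
the referenced upstream commit; the concrete value and mechanism in
the actual PVE patch may differ:

    /* Upstream deprecates a versioned machine after 3 years. Raising the
     * threshold keeps every machine version of one major PVE release
     * (~2 years per release) clear of deprecation warnings. */
    #undef MACHINE_VER_DEPRECATION_MAJOR
    #define MACHINE_VER_DEPRECATION_MAJOR 6 /* illustrative value */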
Reported-by: Martin Maurer <martin@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The variable has been unused since commit 67af0fa ("rebased pve
patches") back in 2017. There is no comment as to why, but before
that, it was used to error out if there were no disks in the vma
archive. Such archives should be possible however, so it's safe to
assume the change was intentional.
This fixes compilation with clang19.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Pick up two stable fixes for virtio-net, one fixing multiqueue
initialization and one fixing potential out-of-bounds access (in the
work_around_broken_dhclient() hack that luckily seems to be
unreachable when 'vhost=on' is used for the device, which Proxmox VE
does except when running a non-native VM arch or if the vhost device
is not available).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Return values for qemu_savevm_state_setup() and blk_set_aio_context()
now get checked.
Move the qemu_coroutine_create() call to after the new early return
to avoid a potential memory leak.
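A hedged sketch of the resulting pattern, assuming the new
int-returning signature of qemu_savevm_state_setup() with an Error **
parameter; the surrounding code is illustrative, not the verbatim
patch:

    Error *local_err = NULL;
    Coroutine *co;
    int ret = qemu_savevm_state_setup(f, &local_err);
    if (ret < 0) {
        error_report_err(local_err);
        return; /* early return happens before any coroutine exists */
    }
    /* create the coroutine only after setup can no longer fail, so a
     * failure cannot leak the Coroutine object */
    co = qemu_coroutine_create(process_savevm_co, NULL);
    qemu_coroutine_enter(co);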
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Notable changes, most interestingly the two build system changes:
* avoid making 'migration' target depend on 'libproxmox_backup_qemu':
Having pbs-state.c be part of the 'migration_files' makes the
'migration' target depend on 'libproxmox_backup_qemu'. Adding the
dependency to 'migration' and 'libmigration' would not be enough
however, because pbs-state.c depends on savevm.c (for
register_savevm_live()), and savevm.c is not itself part of the
'migration_files' and would need to be moved too. Otherwise, linking
the 'test-xbzrle' unit test is broken. Instead, don't declare
pbs-state.c to be part of the 'migration_files'.
* meson: pbs-restore + vma: add qemuutil dependency explicitly
Both pbs-restore and vma use "qemu/osdep.h" so the dependency is
present. Being explicit is required after commit 414b180d42 ("meson:
Pass objects and dependencies to declare_dependency()").
* QAPI docs "Notes:" to ".. note::" conversion following commit
d461c27973 ("qapi: convert "Note" sections to plain rST").
* Removal of QERR_* macros following commit
a95921f171 ("qapi: Inline and remove QERR_DEVICE_HAS_NO_MEDIUM
definition") and friends.
* Signature change for .save_setup callbacks following commit
01c3ac681b ("migration: Add Error** argument to .save_setup()
handler"); see the sketch after this list.
* Removal of separate .bdrv_file_open callbacks following commit
44b424dc4a ("block: remove separate bdrv_file_open callback")
* Adapt dirty bitmap migration error handling following commit
dd03167725 ("migration: Add Error** argument to
add_bitmaps_to_list()")
* Adapt savevm async to removed block migration following commit
eef0bae3a7 ("migration: Remove block migration")
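As an example for the .save_setup change, here is a hedged sketch of
adapting a handler to the new signature from commit 01c3ac681b; the
function name and body are illustrative, not the actual pbs-state.c
code:

    static int pbs_state_save_setup(QEMUFile *f, void *opaque, Error **errp)
    {
        /* failures are now reported through errp in addition to the
         * negative return value */
        if (!opaque) {
            error_setg(errp, "setup failed: no PBS migration state");
            return -1;
        }
        return 0;
    }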
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
In the community forum, users reported issues about RCU stalls and
sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
was also experiencing similar issues from time to time and recently
obtained a GDB stacktrace. The stacktrace showed that, in his case,
the vCPU threads were waiting in cpu_throttle_thread(). It is a good
guess that the issues in the forum could also be because of that.
From searching in the source code, it seems that migration is the only
user of the vCPU throttling functions in QEMU relevant for Proxmox VE
(the only other place where it is used is the Cocoa UI). In
particular, RAM migration will begin throttling vCPUs for
auto-converge.
In migration_iteration_finish() there is an unconditional call to
cpu_throttle_stop(), so do the same in the async snapshot code
specific to Proxmox VE.
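A minimal sketch of that change; cpu_throttle_stop() is the existing
QEMU helper, while the surrounding function is only indicated
schematically:

    static void process_savevm_finalize(void *opaque)
    {
        /* ... finish writing the VM state ... */

        /* mirror migration_iteration_finish(): never leave vCPUs
         * throttled once the snapshot is done */
        cpu_throttle_stop();

        /* ... remaining cleanup ... */
    }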
It's not clear why the issue only began to surface more prominently
now, since the vCPU throttling has been there since commit 070afca258
("migration: Dynamic cpu throttling for auto-converge") in QEMU
v2.10.0. However, there were a lot of changes in the migration code
between v8.1.5 and v9.0.2, and a few of them might have affected the
likelihood of cpu_throttle_set() being called, for example, 4e1871c450
("migration: Don't serialize devices in qemu_savevm_state_iterate()").
[0]: https://forum.proxmox.com/threads/153483
Reported-by: Mario Loderer <m.loderer@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Mario Loderer <m.loderer@proxmox.com>
Includes fixes for VirtIO-net, ARM and x86(_64) emulation, CVE fixes
hardening the NBD server against malicious clients, as well as a few
others (VNC, physmem, Intel IOMMU, ...).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Commit f06b222 ("fixes for QEMU 9.0") included a revert for the QEMU
commit 2ce6cff94d ("virtio-pci: fix use of a released vector"). That
commit caused some regressions that sounded just as bad as the issue
it fixed.
Those regressions have now been addressed upstream, so pick up the fix
and drop the revert. Dropping the revert fixes the original issue that
commit 2ce6cff94d ("virtio-pci: fix use of a released vector")
addressed.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Fix the two issues reported in the community forum [0][1], i.e. a
regression in the LSI-53c895a controller and an ignored boot order for
USB storage (only possible via custom arguments in Proxmox VE), both
causing boot failures, and pick up fixes for VirtIO, ARM emulation,
char IO device and a graph lock fix for the block layer.
The block-copy patches that serve as a preparation for fleecing are
moved to the extra folder, because the graph lock fix requires them
to be present first. They have been applied upstream in the meantime
and should drop out with the rebase on 9.1.
[0]: https://forum.proxmox.com/threads/149772/post-679433
[1]: https://forum.proxmox.com/threads/149772/post-683459
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most relevant are some fixes for VirtIO and for ARM and i386
emulation. There also is a fix for VGA display to fix screen blanking,
which fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=4786
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
As reported in the community forum [0], cloning or importing images
to RBD storages (without the krbd setting) was broken. This is a
result of bdrv_open_child() not doing any filename parsing anymore
after commit b242e7f ("backport fix for CVE-2024-4467"), which the
zeroinit driver relied on for passing along the RBD filename and
key-value pairs.
There is a dedicated function, bdrv_open_file_child(), for opening the
file child, which still does filename parsing; use that instead. Role
and flags should still be the same as with the manual
bdrv_open_child() call, because the zeroinit driver is a filter, and
the assignment to bs->file is also done by bdrv_open_file_child().
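A hedged sketch of the zeroinit driver's open function after the
change; the bdref key "file" and the overall structure are
illustrative:

    static int zeroinit_open(BlockDriverState *bs, QDict *options, int flags,
                             Error **errp)
    {
        /* bdrv_open_file_child() still parses filenames (needed for RBD
         * "rbd:pool/image:key=value" strings) and assigns bs->file */
        int ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
        if (ret < 0) {
            return ret;
        }
        /* ... zeroinit-specific option handling ... */
        return 0;
    }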
Fixes: b242e7f ("backport fix for CVE-2024-4467")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[0]: https://forum.proxmox.com/threads/qemu-9-0-available-on-pve-no-subscription-as-of-now.149772/post-681620
FG: added missing link
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
This prevents malicious qcow2 images from already causing bad effects
when merely being queried via 'qemu-img info'.
For Proxmox VE, this is an additional safeguard, as it currently
creates and manages the qcow2 images used by VMs itself and does not
allow unprivileged users to import them.
Reference: https://access.redhat.com/security/cve/cve-2024-4467
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The 'status' pointer is dereferenced regardless of the NULL check,
i.e. 'status->closed' is accessed after the branch with the check.
Since all callers pass in the address of a struct on the stack, the
pointer can never be NULL. Remove the superfluous check and add an
assert instead.
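Schematically, the change looks like this (illustrative, not the
verbatim patch):

    /* before: the NULL check suggests 'status' may be NULL ... */
    if (status) {
        /* ... */
    }
    status->closed = true; /* ... yet it is dereferenced unconditionally */

    /* after: document the invariant that callers pass a stack address */
    assert(status);
    status->closed = true;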
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
As reported in the community forum [0], doing a snapshot without
saving the VM state for a VM with a VirtIO block device with iothread
would lead to an assertion failure [1] and thus crash.
The issue is that vm_start() is called from the coroutine
qmp_savevm_end() which violates assumptions about graph locking down
the line. Factor out the part of qmp_savevm_end() that actually needs
to be a coroutine into a separate helper and turn qmp_savevm_end()
into a non-coroutine, so that it can call vm_start() safely.
The issue is likely not new, but was exposed by the recent graph
locking rework introducing stricter checks.
The issue does not occur when saving the VM state, because then the
non-coroutine process_savevm_finalize() will already call vm_start()
before qmp_savevm_end().
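A hedged sketch of the refactoring; the helper name and bodies are
illustrative, only the overall shape follows the description above:

    static void coroutine_fn savevm_end_co(void *opaque)
    {
        /* ... the part that actually needs coroutine context ... */
    }

    void qmp_savevm_end(Error **errp)
    {
        Coroutine *co = qemu_coroutine_create(savevm_end_co, NULL);
        qemu_coroutine_enter(co);
        /* ... wait for the coroutine to finish ... */

        vm_start(); /* safe: no longer called from coroutine context */
    }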
[0]: https://forum.proxmox.com/threads/149883/
[1]:
> #0 0x00007353e6096e2c __pthread_kill_implementation (libc.so.6 + 0x8ae2c)
> #1 0x00007353e6047fb2 __GI_raise (libc.so.6 + 0x3bfb2)
> #2 0x00007353e6032472 __GI_abort (libc.so.6 + 0x26472)
> #3 0x00007353e6032395 __assert_fail_base (libc.so.6 + 0x26395)
> #4 0x00007353e6040eb2 __GI___assert_fail (libc.so.6 + 0x34eb2)
> #5 0x0000592002307bb3 bdrv_graph_rdlock_main_loop (qemu-system-x86_64 + 0x83abb3)
> #6 0x00005920022da455 bdrv_change_aio_context (qemu-system-x86_64 + 0x80d455)
> #7 0x00005920022da6cb bdrv_try_change_aio_context (qemu-system-x86_64 + 0x80d6cb)
> #8 0x00005920022fe122 blk_set_aio_context (qemu-system-x86_64 + 0x831122)
> #9 0x00005920021b7b90 virtio_blk_start_ioeventfd (qemu-system-x86_64 + 0x6eab90)
> #10 0x0000592002022927 virtio_bus_start_ioeventfd (qemu-system-x86_64 + 0x555927)
> #11 0x0000592002066cc4 vm_state_notify (qemu-system-x86_64 + 0x599cc4)
> #12 0x000059200205d517 vm_prepare_start (qemu-system-x86_64 + 0x590517)
> #13 0x000059200205d56b vm_start (qemu-system-x86_64 + 0x59056b)
> #14 0x00005920020a43fd qmp_savevm_end (qemu-system-x86_64 + 0x5d73fd)
> #15 0x00005920023f3749 qmp_marshal_savevm_end (qemu-system-x86_64 + 0x926749)
> #16 0x000059200242f1d8 qmp_dispatch (qemu-system-x86_64 + 0x9621d8)
> #17 0x000059200238fa98 monitor_qmp_dispatch (qemu-system-x86_64 + 0x8c2a98)
> #18 0x000059200239044e monitor_qmp_dispatcher_co (qemu-system-x86_64 + 0x8c344e)
> #19 0x000059200245359b coroutine_trampoline (qemu-system-x86_64 + 0x98659b)
> #20 0x00007353e605d9c0 n/a (libc.so.6 + 0x519c0)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
On ARM, gcc warns (-Werror=type-limits) that the condition of the if
statement is always false. This is because s->aid is defined as char,
while proxmox_restore_open_image() returns an int.
This is probably because plain char is unsigned on the ARM
architecture but signed on x86:
https://developer.arm.com/documentation/den0013/d/Porting/Miscellaneous-C-porting-issues/unsigned-char-and-signed-char
Make aid an explicit uint8_t, because that is the type for functions
taking the aid as a parameter, e.g. proxmox_restore_get_image_length().
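A simplified illustration of the pitfall and the fix; the call and the
error handling are schematic, not the actual vma code:

    /* before: the negative error value cannot survive the conversion to
     * plain char on ARM, where char is unsigned */
    char aid = proxmox_restore_open_image(name);
    if (aid < 0) {   /* always false on ARM -> -Werror=type-limits */
        return -1;
    }

    /* after: check the int return value, then store the id as uint8_t,
     * matching e.g. proxmox_restore_get_image_length(uint8_t aid, ...) */
    int ret = proxmox_restore_open_image(name);
    if (ret < 0) {
        return -1;
    }
    uint8_t aid = ret;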
Signed-off-by: Jing Luo <jing@jing.rocks>
[FE: slightly improve commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most important is the first one, "Revert "monitor: use
aio_co_reschedule_self()"", which fixes a crash when doing
hotplug+resize with a disk using io_uring.
Other fixes (likely not too important) for TCG emulation of x86(_64)
and ARM.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most importantly, fix forwards and backwards migration with VirtIO-GPU
display.
Other fixes are for a regression in pflash device (introduced in 8.2)
and some fixes for x86(_64) TCG emulation. One of the patches needed
to be adapted, because it removed a helper that is still in use in
9.0.0.
There also is a revert for a fix in VirtIO PCI devices that turned out
to cause some issues, see the revert itself for more details.
Lastly, there is a change to move compatibility flags for a new
VirtIO-net feature to the correct machine type. The feature was
introduced in QEMU 8.2, but the compatibility flags got added to
machine version 8.0 instead of 8.1. This breaks backwards migration
with machine version 8.1 from an 8.2/9.0 binary to an 8.1 binary, in
cases where the guest kernel enables the feature (e.g. Ubuntu 23.10).
While that breaks migration with machine version 8.1 from an unpatched
to a patched binary, Proxmox VE only ever had 8.2 on the test
repository and 9.0 not yet in any public repository. An upstream
developer suggested it is the proper fix [0]. Upstream submission [1].
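Schematically, the move looks like this; the property name is a
placeholder, not the actual VirtIO-net feature flag:

    GlobalProperty hw_compat_8_1[] = {
        /* ... existing entries ... */
        /* moved here from hw_compat_8_0, so that machine version 8.1
         * keeps the feature off and stays migratable to 8.1 binaries */
        { "virtio-net-device", "x-new-offload-feature", "off" },
    };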
[0]: https://lore.kernel.org/qemu-devel/CACGkMEtZrJuhof+hUGVRvLLQE+8nQE5XmSHpT0NAQ1EpnqfmsA@mail.gmail.com/T/#u
[1]: https://lore.kernel.org/qemu-devel/20240517075336.104091-1-f.ebner@proxmox.com/T/#u
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error associated with the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.
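A hedged sketch of the described propagation; cbw_get_snapshot_error()
is a hypothetical accessor standing in for however the filter's
snapshot error is actually retrieved:

    /* on job failure: job_ret holds the job's error, e.g. -EACCES */
    int snap_err = cbw_get_snapshot_error(cbw_filter); /* hypothetical */
    if (job_ret == -EACCES && snap_err < 0) {
        job_ret = snap_err; /* report the root cause, not the EACCES */
    }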
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The type for the copy-before-write timeout in nanoseconds was wrong.
Being just uint32_t, it allowed a maximum of slightly over 4 seconds.
Larger values would overflow, so the 45 seconds set by Proxmox's
backup with fleecing resulted in an effective timeout of about 2
seconds for copy-before-write operations.
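The arithmetic behind the roughly 2 seconds: a uint32_t wraps at 2^32
ns, which is about 4.295 s, and 45 * 10^9 mod 2^32 = 2,050,327,040 ns,
i.e. about 2.05 s. The fix is simply a 64-bit type:

    /* 45 s in nanoseconds does not fit into uint32_t */
    uint64_t cbw_timeout_ns = 45ULL * NANOSECONDS_PER_SECOND;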
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The biggest change is that AioContext locking got removed; no changes
were required other than dropping the calls to acquire and release it.
As a consequence, the single parameter of the bdrv_graph_wrlock() call
got removed, which also required adaptation.
QAPI docs became stricter, requiring all members to be documented.
Other minor changes (see the sketch after this list):
- The single parameter of migration_is_running() was dropped.
- qemu_mutex_(un)lock_iothread() got renamed to bql_(un)lock().
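The mechanical adaptations side by side (illustrative):

    bql_lock();            /* was: qemu_mutex_lock_iothread(); */
    bdrv_graph_wrlock();   /* was: bdrv_graph_wrlock(bs); parameter dropped */
    /* ... */
    bdrv_graph_wrunlock();
    bql_unlock();          /* was: qemu_mutex_unlock_iothread(); */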
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock like was done in QEMU 8.1, because there
are now changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.
During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.
The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks
in pve-backup, savevm-async, vma-reader.
Local variable shadowing is prohibited via a compiler flag now, which
required slight adaptation in vma.c.
Major changes only affect alloc-track:
* It is not possible to call a generated co-wrapper like
bdrv_get_info() while holding the block graph lock exclusively [0],
which does happen during initialization of alloc-track when the
backing hd is set and the refresh_limits driver callback is invoked.
The bdrv_get_info() call to get the cluster size is moved to
directly after opening the file child in track_open(), as sketched
after this list.
The important thing is that at least the request alignment for the
write target is used, because then the RMW cycle in bdrv_pwritev
will gather enough data from the backing file. Partial cluster
allocations in the target are not a fundamental issue, because the
driver returns its allocation status based on the bitmap, so any
other data that maps to the same cluster will still be copied later
by a stream job (or during writes to that cluster).
* Replacing the node cannot be done in the
track_co_change_backing_file() callback, because it is a coroutine
and cannot hold the block graph lock exclusively. So it is moved to
the stream job itself with the auto-remove option not having an
effect anymore (qemu-server would always set it anyways).
In the future, there could either be a special option for the stream
job, or maybe the upcoming blockdev-replace QMP command can be used.
Replacing the backing child is actually already done in the stream
job, so no need to do it in the track_co_change_backing_file()
callback. It also cannot be called from a coroutine. Looking at the
implementation in the qcow2 driver, it doesn't seem to be intended
to change the backing child itself, just update driver-internal
state.
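A hedged sketch of the relocated query in track_open(); this is
simplified from the description above, not the verbatim driver code:

    static int track_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
    {
        BlockDriverInfo bdi;
        int ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
        if (ret < 0) {
            return ret;
        }
        /* moved here from the refresh_limits callback, where the block
         * graph lock is held exclusively and co-wrappers must not run */
        ret = bdrv_get_info(bs->file->bs, &bdi);
        if (ret >= 0 && bdi.cluster_size > 0) {
            /* ... use bdi.cluster_size for the request alignment ... */
        }
        return 0;
    }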
Other changes:
* alloc-track: Error out early when used without auto-remove. Since
replacing the node now happens in the stream job, where the option
cannot be read from (it's internal to the driver), it will always be
treated as 'on'. This makes sure users besides qemu-server notice
the change (should they even exist). The option can be fully dropped
in the future while adding a version guard in qemu-server.
* alloc-track: Avoid seemingly superfluous child permission update.
Doesn't seem necessary nowadays (maybe after commit "alloc-track:
fix deadlock during drop" where the dropping is not rescheduled and
delayed anymore or some upstream change). Replacing the block node
will already update the permissions of the new node (which was the
file child before). Should there really be some issue, instead of
having a drop state, this could also be just based off the fact
whether there is still a backing child.
Dumping the cumulative (shared) permissions for the BDS with a debug
print yields the same values after this patch and with QEMU 8.1,
namely 3 and 5.
* PBS block driver: compile unconditionally. Proxmox VE always needs
it and something in the build process changed to make it not enabled
by default. Probably would need to move the build option to meson
otherwise.
* backup: Job unreferencing during cleanup needs to happen outside of
coroutine context, so it was moved to before invoking the cleanup.
* mirror: Cherry-pick stable fix to avoid potential deadlock.
* savevm-async: migrate_init now can fail, so propagate potential
error.
* savevm-async: compression counters are not accessible outside
migration/ram-compress now, so drop the code that prophylactically
set them to zero.
[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Same rationale as 6facdf3 ("also exclude hppa-firmware.img ROM from
build"): the ROM is not used by Proxmox VE and would cause a failure
during the build.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Namely, it's also necessary to remove .dts source files from the
meson.build file, because the .dtb file names are not directly listed
anymore since commit 6e0dc9d2a8 ("meson: compile bundled device
trees").
The same commit also introduced a "'.dtb'" occurrence in a line that
does not just list a file name, and removing that line would break the
script. Be more precise and require an alphanumeric character before
the suffix.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
From man dpkg-buildpackage:
> -j, --jobs[=jobs|auto]
> Specifies the number of jobs allowed to be run simultaneously (since
> dpkg 1.14.7, long option since dpkg 1.18.8). The number of jobs
> matching the number of online processors if auto is specified (since
> dpkg 1.17.10), or unlimited number if jobs is not specified. The
> default behavior is auto (since dpkg 1.18.11) in non-forced mode
> (since dpkg 1.21.10), and as such it is always safer to use with any
> package including those that are not parallel-build safe.
The option was added in the Makefile by commit 4ba321f ("build qemu
multithreaded") which states:
> same as in pve-kernel where we have --jobs=auto
But according to the man page, -j without an argument is not the same
and means unlimited. Using the number of online cores seems more
sensible and was the original intention. Again, according to the man
page, the default is auto since dpkg 1.18.11 (or Debian Stretch), so
just drop the option.
The motivation to look into this was that, after the recent upstream
commit d1ce2cc95b ("Makefile: preserve --jobserver-auth argument when
calling ninja"), having -j as the make flag would be broken, as it was
mistakenly passed to ninja (for which the argument for -j is not
optional). This should get fixed soon [0].
[0]: https://lore.kernel.org/qemu-devel/20240412100401.20047-2-pbonzini@redhat.com/T/#u
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Excerpt from Fiona's v3 cover-letter [0]:
When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished so guest IO to not-yet-backed-up sectors will be
limited by the speed of the backup target.
With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.
With this series it will be possible to enable backup-fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-lvm` with fleecing
images created on the storage `local-lvm`. The fleecing storage should
be a fast local storage which supports thin-provisioning and discard.
If the storage supports qcow2, that is used as the fleecing image
format. If the underlying file system does not support discard, with
qcow2 and preallocation=off, at least already allocated parts of the
image can be re-used later.
Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup finished or failed. The naming schema for fleecing images is
'vm-ID-fleece-N(.FORMAT)'. The allocated images are recorded in the
guest configuration, so that even after a hard failure, clean-up can
be re-attempted. While not too bad, it's a non-trivial amount of code
and I'm not 100% sure about the cost-benefit, so sending those as RFC.
The fleecing image needs to be the exact same size as the source, but
luckily, an explicit size can be specified when attaching a raw image
to QEMU so there are no size issues when using storages that have
coarser allocation/round up. For qcow2, it seems that virtual size can
be nearly arbitrary (i.e. modulo 512 byte granularity) during
allocation.
[0]: https://lists.proxmox.com/pipermail/pve-devel/2024-April/062815.html
Originally-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>