IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Below are the commands that generated the changes along with the
rationale:
command: spatch --in-place scripts/coccinelle/error_propagate_null.cocci pve-backup.c
rationale: error_propagate() already checks for NULL in its second
argument
command: spatch --in-place scripts/coccinelle/round.cocci vma-reader.c vma-writer.c
rationale: DIV_ROUND_UP() macro is more readable than the expanded
calculation
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
It is necessary to reset the error pointer after error_report_err(),
because that function frees the error. Not doing so can lead to a
use-after-free and in particular error_setg() with the same error
pointer will run into assertion failure, because it asserts that no
previous error is set:
> #5 0x00007c1723674eb2 in __GI___assert_fail (assertion=assertion@entry=0x59132c9fc540 "*errp == NULL",
> file=file@entry=0x59132c9fc530 "../util/error.c", line=line@entry=68,
> function=function@entry=0x59132c9fc5f8 <__PRETTY_FUNCTION__.2> "error_setv")
> #6 0x000059132c7d250f in error_setv (errp=0x7c15839fafb8, src=0x59132c9af224 "../block/dirty-bitmap.c", line=182,
> func=0x59132c9af9b0 <__func__.17> "bdrv_dirty_bitmap_check", err_class=err_class@entry=ERROR_CLASS_GENERIC_ERROR,
> fmt=fmt@entry=0x59132c9af380 "Bitmap '%s' is currently in use by another operation and cannot be used", ap=0x7c15839fad60,
> suffix=0x0)
> #7 0x000059132c7d265c in error_setg_internal (errp=errp@entry=0x7c15839fafb8,
> src=src@entry=0x59132c9af224 "../block/dirty-bitmap.c", line=line@entry=182,
> func=func@entry=0x59132c9af9b0 <__func__.17> "bdrv_dirty_bitmap_check",
> fmt=fmt@entry=0x59132c9af380 "Bitmap '%s' is currently in use by another operation and cannot be used")
> #8 0x000059132c68fbc1 in bdrv_dirty_bitmap_check (bitmap=bitmap@entry=0x5913542d6190, flags=flags@entry=7,
> errp=errp@entry=0x7c15839fafb8)
> #9 0x000059132c3b951d in add_bitmaps_to_list (s=s@entry=0x59132d87ee40 <dbm_state>, bs=bs@entry=0x591352d6b720,
> bs_name=bs_name@entry=0x591352d69900 "drive-scsi1", alias_map=alias_map@entry=0x0, errp=errp@entry=0x7c15839fafb8)
> #10 0x000059132c3ba23d in init_dirty_bitmap_migration (errp=<optimized out>, s=0x59132d87ee40 <dbm_state>)
> #11 dirty_bitmap_save_setup (f=0x591352ebdd30, opaque=0x59132d87ee40 <dbm_state>, errp=0x7c15839fafb8)
> #12 0x000059132c3d81f0 in qemu_savevm_state_setup (f=0x591352ebdd30, errp=errp@entry=0x7c15839fafb8)
Fix created using the appropriate in-tree coccinelle script:
spatch --in-place scripts/coccinelle/error-use-after-free.cocci migration/block-dirty-bitmap.c
The problematic change exposing the issue was part of 7882afe ("update
submodule and patches to QEMU 9.1.2") adapting to QEMU 9.1, commit
dd03167725 ("migration: Add Error** argument to
add_bitmaps_to_list()"), where the add_bitmaps_to_list() function
gained an error pointer argument, replacing the local error variable
that was used before.
Fixes: 7882afe ("update submodule and patches to QEMU 9.1.2")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most notabbly, there now is an upstream workaround for the "Windows
PCI Label bug" [0] and the revert of QEMU commit 44d975ef34 ("x86:
acpi: workaround Windows not handling name references in Package
properly") can be dropped.
Pick up some other fixes already merged in current master, for
emulation as well as x86(_64) KVM, some PCI/USB fixes and a pair of
regression fixes for the net subsystem.
[0]: https://gitlab.com/qemu-project/qemu/-/issues/774
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Notable changes:
* Commit 07bea2d35f ("block-backend: Remove deadcode") removed
blk_op_{,un}block_all() which was used by PVE async savevm code.
Fixed by switching to using bdrv_op_{,un}block_all().
* Drop patches that are already part of upstream.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Previously, the format would be auto-detected which can lead to
confusing raw images as a different format and being unable to detect
corrupted images of non-raw formats.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Proxmox VE always writes state files in raw format.
Suggested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
In commit a35f8577a0 ("include/hw: add macros for deprecation &
removal of versioned machines"), a new machine version deprecation and
removal policy was introduced. After only 3 years a machine version
will be deprecated while being removed after 6 years.
The deprecation is a bit early considering major PVE releases are
approximately every 2 years. This means that a deprecation warning can
already happen for a machine version that was introduced during the
previous major release. This would scare users for no good reason, so
avoid deprecating machine versions in PVE too early and define a
baseline of machine versions that will be supported throughout a
single major PVE release.
Reported-by: Martin Maurer <martin@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The variable has been unused since commit 67af0fa ("rebased pve
patches") back in 2017. There is no comment to why, but before that,
it was used to error out if there were no disks in the vma archive.
This should be possible however, so it's safe to assume this was an
intentional change.
This fixes compilation with clang19.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Pick up to stable fixes for virtio-net, one fixing multiqueue
initialization and one fixing potential out-of-bounds access (in the
work_around_broken_dhclient() hack that luckily seems to be
unreachable when 'vhost=on' is used for the device, which Proxmox VE
does except when running a non-native VM arch or if the vhost device
is not available).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Return values for qemu_savevm_state_setup() and blk_set_aio_context()
now get checked.
Move the qemu_coroutine_create() call to after the new early return
to avoid a potential memory leak.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Notable changes, most interestingly the two build system changes:
* avoid making 'migration' target depend on 'libproxmox_backup_qemu':
Having pbs-state.c be part of the 'migration_files' makes the
'migration' target depend on 'libproxmox_backup_qemu'. Adding the
dependency to 'migration' and 'libmigration' would not be enough
however, because pbs-state.c depends on savevm.c (for
register_savevm_live()), and savevm.c is not itself part of the
'migration_files' and would need to be moved too. Otherwise, linking
the 'test-xbzrle' unit test is broken. Instead, don't declare
pbs-state.c to be part of the 'migration_files'.
* meson: pbs-restore + vma: add qemuutil dependency explicitly
Both pbs-restore and vma use "qemu/osdep.h" so the dependency is
present. Being explicit is required after commit 414b180d42 ("meson:
Pass objects and dependencies to declare_dependency()").
* QAPI docs "Notes:" to ".. note::" conversion following commit
d461c27973 ("qapi: convert "Note" sections to plain rST").
* Removal of QERR_* macros following commit
a95921f171 ("qapi: Inline and remove QERR_DEVICE_HAS_NO_MEDIUM
definition") and friends.
* Signature change for .save_setup callbacks following commit
01c3ac681b ("migration: Add Error** argument to .save_setup()
handler").
* Removal of separate .bdrv_file_open callbacks following commit
44b424dc4a ("block: remove separate bdrv_file_open callback")
* Adapt dirty bitmap migration error handling following commit
dd03167725 ("migration: Add Error** argument to
add_bitmaps_to_list()")
* Adapt savevm async to removed block migration following commit
eef0bae3a7 ("migration: Remove block migration")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
In the community forum, users reported issues about RCU stalls and
sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
was also experiencing similar issues from time to time and recently,
obtained a GDB stacktrace. The stacktrace showed that, in his case,
the vCPU threads were waiting in cpu_throttle_thread(). It is a good
guess that the issues in the forum could also be because of that.
From searching in the source code, it seems that migration is the only
user of the vCPU throttling functions in QEMU relevant for Proxmox VE
(the only other place where it is used is the Cocoa UI). In
particular, RAM migration will begin throttling vCPUs for
auto-converge.
In migration_iteration_finish() there is an unconditional call to
cpu_throttle_stop(), so do the same in the async snapshot code
specific to Proxmox VE.
It's not clear why the issue began to surface more prominently only
now, since the vCPU throttling was there since commit 070afca258
("migration: Dynamic cpu throttling for auto-converge") in QEMU
v2.10.0. However, there were a lot of changes in the migration code
between v8.1.5 and v9.0.2 and a few of them might have affected the
likelihood of cpu_throttle_set() being called, for example, 4e1871c450
("migration: Don't serialize devices in qemu_savevm_state_iterate()")
[0]: https://forum.proxmox.com/threads/153483
Reported-by: Mario Loderer <m.loderer@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Mario Loderer <m.loderer@proxmox.com>
Includes fixes for VirtIO-net, ARM and x86(_64) emulation, CVEs to
harden NBD server against malicious clients, as well as a few others
(VNC, physmem, Intel IOMMU, ...).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Commit f06b222 ("fixes for QEMU 9.0") included a revert for the QEMU
commit 2ce6cff94d ("virtio-pci: fix use of a released vector"). That
commit caused some regressions which sounded just as bad as the fix.
Those regressions have now been addressed upstream, so pick up the fix
and drop the revert. Dropping the revert fixes the original issue that
commit 2ce6cff94d ("virtio-pci: fix use of a released vector")
addressed.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Fix the two issues reported in the community forum[0][1], i.e.
regression in LSI-53c895a controller and ignored boot order for USB
storage (only possible via custom arguments in Proxmox VE), both
causing boot failures, and pick up fixes for VirtIO, ARM emulation,
char IO device and a graph lock fix for the block layer.
The block-copy patches that serve as a preparation for fleecing are
moved to the extra folder, because the graph lock fix requires them
to be present first. They have been applied upstream in the meantime
and should drop out with the rebase on 9.1.
[0]: https://forum.proxmox.com/threads/149772/post-679433
[1]: https://forum.proxmox.com/threads/149772/post-683459
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most relevant are some fixes for VirtIO and for ARM and i386
emulation. There also is a fix for VGA display to fix screen blanking,
which fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=4786
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
As reported in the community forum [0], cloning or importing images
to RBD storages (without the krbd setting) was broken. This is a
result of no filename parsing happening anymore in bdrv_open_child()
after commit b242e7f ("backport fix for CVE-2024-4467"), which the
zeroinit relied on for passing along the RBD filename+key-value pairs.
There is a dedicated function for opening the file child which still
does filename parsing. Use that for opening the file child. Role and
flags should still be the same as with the manual bdrv_open_child(),
because the zeroinit driver is a filter, and the assignment bs->file
is also done by bdrv_open_file_child().
Fixes: b242e7f ("backport fix for CVE-2024-4467")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[0]: https://forum.proxmox.com/threads/qemu-9-0-available-on-pve-no-subscription-as-of-now.149772/post-681620
FG: added missing link
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
This prevents that malicious qcow2 images can already cause bad
effects if being queried via 'qemu-img info'.
For Proxmox VE, this is an additional safe guard, as currently it
directly creates and manages the qcow2 images used by VMs and does not
allow unprivileged users to import them.
Reference: https://access.redhat.com/security/cve/cve-2024-4467
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The 'status' pointer is dereferenced regardless of the NULL check,
i.e. 'status->closed' is accessed after the branch with the check.
Since all callers pass in the address of a struct on the stack, the
pointer can never be NULL. Remove the superfluous check and add an
assert instead.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
As reported in the community forum [0], doing a snapshot without
saving the VM state for a VM with a VirtIO block device with iothread
would lead to an assertion failure [1] and thus crash.
The issue is that vm_start() is called from the coroutine
qmp_savevm_end() which violates assumptions about graph locking down
the line. Factor out the part of qmp_savevm_end() that actually needs
to be a coroutine into a separate helper and turn qmp_savevm_end()
into a non-coroutine, so that it can call vm_start() safely.
The issue is likely not new, but was exposed by the recent graph
locking rework introducing stricter checks.
The issue does not occur when saving the VM state, because then the
non-coroutine process_savevm_finalize() will already call vm_start()
before qmp_savevm_end().
[0]: https://forum.proxmox.com/threads/149883/
[1]:
> #0 0x00007353e6096e2c __pthread_kill_implementation (libc.so.6 + 0x8ae2c)
> #1 0x00007353e6047fb2 __GI_raise (libc.so.6 + 0x3bfb2)
> #2 0x00007353e6032472 __GI_abort (libc.so.6 + 0x26472)
> #3 0x00007353e6032395 __assert_fail_base (libc.so.6 + 0x26395)
> #4 0x00007353e6040eb2 __GI___assert_fail (libc.so.6 + 0x34eb2)
> #5 0x0000592002307bb3 bdrv_graph_rdlock_main_loop (qemu-system-x86_64 + 0x83abb3)
> #6 0x00005920022da455 bdrv_change_aio_context (qemu-system-x86_64 + 0x80d455)
> #7 0x00005920022da6cb bdrv_try_change_aio_context (qemu-system-x86_64 + 0x80d6cb)
> #8 0x00005920022fe122 blk_set_aio_context (qemu-system-x86_64 + 0x831122)
> #9 0x00005920021b7b90 virtio_blk_start_ioeventfd (qemu-system-x86_64 + 0x6eab90)
> #10 0x0000592002022927 virtio_bus_start_ioeventfd (qemu-system-x86_64 + 0x555927)
> #11 0x0000592002066cc4 vm_state_notify (qemu-system-x86_64 + 0x599cc4)
> #12 0x000059200205d517 vm_prepare_start (qemu-system-x86_64 + 0x590517)
> #13 0x000059200205d56b vm_start (qemu-system-x86_64 + 0x59056b)
> #14 0x00005920020a43fd qmp_savevm_end (qemu-system-x86_64 + 0x5d73fd)
> #15 0x00005920023f3749 qmp_marshal_savevm_end (qemu-system-x86_64 + 0x926749)
> #16 0x000059200242f1d8 qmp_dispatch (qemu-system-x86_64 + 0x9621d8)
> #17 0x000059200238fa98 monitor_qmp_dispatch (qemu-system-x86_64 + 0x8c2a98)
> #18 0x000059200239044e monitor_qmp_dispatcher_co (qemu-system-x86_64 + 0x8c344e)
> #19 0x000059200245359b coroutine_trampoline (qemu-system-x86_64 + 0x98659b)
> #20 0x00007353e605d9c0 n/a (libc.so.6 + 0x519c0)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
On ARM, gcc warns (-Werror=type-limits) that it will always be false
for the if statement. This is because here s->aid is defined as char,
while proxmox_restore_open_image() returns an int.
This is probably because chars are treated as unsigned on arm arch but
signed on x86 arch:
https://developer.arm.com/documentation/den0013/d/Porting/Miscellaneous-C-porting-issues/unsigned-char-and-signed-char
Make aid an explicit uint8_t, because that is the type for functions
taking the aid as a parameter, e.g. proxmox_restore_get_image_length().
Signed-off-by: Jing Luo <jing@jing.rocks>
[FE: slightly improve commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most importantly the first one "Revert "monitor: use
aio_co_reschedule_self()"", fixing a crash when doing hotplug+resize
with a disk using io_uring.
Other fixes (likely not too important) for TCG emulation of x86(_64)
and ARM.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most importantly, fix forwards and backwards migration with VirtIO-GPU
display.
Other fixes are for a regression in pflash device (introduced in 8.2)
and some fixes for x86(_64) TCG emulation. One of the patches needed
to be adapted, because it removed a helper that is still in use in
9.0.0.
There also is a revert for a fix in VirtIO PCI devices that turned out
to cause some issues, see the revert itself for more details.
Lastly, there is a change to move compatibility flags for a new
VirtIO-net feature to the correct machine type. The feature was
introduced in QEMU 8.2, but the compatibility flags got added to
machine version 8.0 instead of 8.1. This breaks backwards migration
with machine version 8.1 from a 8.2/9.0 binary to an 8.1 binary, in
cases where the guest kernel enables the feature (e.g. Ubuntu 23.10).
While that breaks migration with machine version 8.1 from an unpatched
to a patched binary, Proxmox VE only ever had 8.2 on the test
repository and 9.0 not yet in any public repository. An upstream
developer suggested it is the proper fix [0]. Upstream submission [1].
[0]: https://lore.kernel.org/qemu-devel/CACGkMEtZrJuhof+hUGVRvLLQE+8nQE5XmSHpT0NAQ1EpnqfmsA@mail.gmail.com/T/#u
[1]: https://lore.kernel.org/qemu-devel/20240517075336.104091-1-f.ebner@proxmox.com/T/#u
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error that is associated to the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The type for the copy-before-write timeout in nanoseconds was wrong.
By being just uint32_t, a maximum of slightly over 4 seconds was
possible. Larger values would overflow and thus the 45 seconds set by
Proxmox's backup with fleecing, resulted in effectively 2 seconds
timeout for copy-before-write operations.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Biggest change is that AioContext locking got removed, but no changes
required other than dropping the calls to acquire and release it. As a
consequence, the single parameter for the bdrv_graph_wrlock() call got
removed which also required adaptation.
QAPI docs became stricter requiring to document all members.
Other minor changes:
- Single parameter from migration_is_running() was dropped.
- qemu_mutex_(un)lock_iothread() got renamed to bql_(un)lock().
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>