mirror of https://github.com/OpenNebula/one.git synced 2024-12-25 23:21:29 +03:00
Commit Graph

79 Commits

Author SHA1 Message Date
Ruben S. Montero
e433ccb85b
F #5516: New backup interface for OpenNebula
co-authored-by: Frederick Borges <fborges@opennebula.io>
co-authored-by: Neal Hansen <nhansen@opennebula.io>
co-authored-by: Daniel Clavijo Coca <dclavijo@opennebula.io>
co-authored-by: Pavel Czerný <pczerny@opennebula.systems>

BACKUP INTERFACE
=================

* Backups are exposed through special Datastore (BACKUP_DS) and
  Image (BACKUP) types. These new types can only be used for backing
  up VMs. This approach makes it possible to:

  - Implement tier based backup policies (backups made on different
    locations).

  - Leverage access control and quota systems

  - Support different storage and backup technologies

* Backup interface for the VMs:

  - VM configures backups with BACKUP_CONFIG. This attribute can be set
    in the VM template or updated with updateconf API call. It can include:

    + BACKUP_VOLATILE: To backup or not volatile disks

    + FS_FREEZE: How the FS is frozen for running VMs (qemu-agent,
      suspend or none). When possible, backups are crash consistent.

    + KEEP_LAST: keep only a given number of backups.

  - Backups are initiated by the one.vm.backup API call that requires
    the target Datastore to perform the backup (one-shot). This is
    exposed by the onevm backup command.

  - Backups can be periodic through scheduled actions.

  - Backup configuration is updated with one.vm.updateconf API call.
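
As an illustration of the BACKUP_CONFIG attribute described above, the
following Python sketch renders a template fragment. The helper name
render_backup_config is hypothetical, the YES/NO encoding of
BACKUP_VOLATILE is an assumption, and the FS_FREEZE values are copied
from the list above; the exact spelling accepted by OpenNebula may differ.

```python
# Hypothetical helper for illustration only; not part of OpenNebula.
FS_FREEZE_MODES = ("qemu-agent", "suspend", "none")  # names as listed above


def render_backup_config(backup_volatile=False, fs_freeze="none", keep_last=None):
    """Return a BACKUP_CONFIG vector attribute in OpenNebula template syntax."""
    if fs_freeze not in FS_FREEZE_MODES:
        raise ValueError(f"unknown FS_FREEZE mode: {fs_freeze!r}")
    attrs = {
        "BACKUP_VOLATILE": "YES" if backup_volatile else "NO",  # assumed encoding
        "FS_FREEZE": fs_freeze,
    }
    if keep_last is not None:
        if keep_last < 1:
            raise ValueError("KEEP_LAST must be a positive number of backups")
        attrs["KEEP_LAST"] = str(keep_last)
    body = ",\n".join(f'  {key} = "{value}"' for key, value in attrs.items())
    return f"BACKUP_CONFIG = [\n{body}\n]"


print(render_backup_config(backup_volatile=True, fs_freeze="suspend", keep_last=4))
```

The resulting text could be placed in a VM template or in the file passed
to an updateconf call.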

* Restore interface:

  - Restores are initiated by the one.image.restore API call. This is
    exposed by oneimage restore command.

  - Restores include configurable options for the VM template

    + NO_IP: to not preserve IP addresses (but keep the NICs and network
      mapping)

    + NO_NIC: to not preserve network mappings

  - Other template attributes:

    + Clean PCI devices, including network configuration in case of
      TYPE=NIC attributes. By default it removes SHORT_ADDRESS and leaves
      the "auto" selection attributes.

    + Clean NUMA_NODE, removes node id and cpu sets. It keeps the NUMA node

  - It is possible to restore single files stored in the repository by
    using the backup specific URL.

* Sunstone (Ruby version) has been updated to expose these features.

BACKUP DRIVERS & IMPLEMENTATION
===============================

* Backup operation is implemented by a combination of 3 driver operations:

  - VMM. New operation (internal oned <-> one_vmm_exec.rb) to orchestrate
    backups for RUNNING VMs.

  - TM. This commit introduces 2 new operations (and their
    corresponding _live variants):

    + pre_backup(_live): Prepares the disks to be backed up in the
      repository. It is specific to the driver: (i) ceph uses the export
      operation; (ii) qcow2/raw uses snapshot-create-as and fs_freeze as
      needed.
    + post_backup(_live): Performs cleanup operations, i.e. removes KVM
      snapshots or tmp dirs.

  - DATASTORE. Each backup technology is represented by its
    corresponding driver, that needs to implement:

    + backup: it takes the VM disks in file (qcow2) format and stores them
      in the backup repository.

    + restore: it takes a backup image and restores the associated disks
      and VM template.

    + monitor: to gather available space in the repository

    + rm: to remove existing backups

    + stat: to return the "restored" size of a disk stored in a backup

    + downloader pseudo-URL handler: in the form
      <backup_proto>://<driver_snapshot_id>/<disk filename>

BACKUP MANAGEMENT
=================

Backup actions may take a long time, keeping some vmm_exec threads busy
and blocking other VMM operations. For this reason, backups are planned
by the scheduler through the sched action interface.

Two attributes have been added to sched.conf:
  * MAX_BACKUPS max active backup operations in the cloud. No more
    backups will be started beyond this limit.

  * MAX_BACKUPS_HOST max number of backups per host
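
The effect of the two limits can be sketched in Python as follows. This is
a minimal illustration, not the scheduler's code: the function name
plan_backups and the (vm_id, host_id) tuples are hypothetical, and both
limits are assumed to be plain positive caps (any special values they may
accept are not modeled).

```python
from collections import Counter


def plan_backups(pending, active, max_backups, max_backups_host):
    """Pick the pending backups that can start without exceeding the limits.

    pending and active are lists of (vm_id, host_id) tuples (hypothetical shape).
    """
    per_host = Counter(host for _, host in active)
    total = len(active)
    started = []
    for vm_id, host in pending:
        if total >= max_backups:
            break  # cloud-wide limit reached, nothing else can start
        if per_host[host] >= max_backups_host:
            continue  # this host is saturated, try the next pending backup
        per_host[host] += 1
        total += 1
        started.append(vm_id)
    return started


# One backup already runs on host "a"; limits are 3 in total and 2 per host.
print(plan_backups([(1, "a"), (2, "a"), (3, "b"), (4, "c")], [(9, "a")], 3, 2))
```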

* Fix onevm CLI to properly show and manage schedule actions. --schedule
  now supports "now", as well as relative times +<seconds_from_stime>

  onevm backup --schedule now -d 100 63

* Backup is added to VM_ADMIN_ACTIONS in oned.conf. Regular users need
  to use the batch interface or request specific permissions
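
The two --schedule forms mentioned above ("now" and +<seconds_from_stime>)
could be resolved to an absolute time roughly as in this Python sketch. The
function name resolve_schedule is hypothetical, and whether the CLI
resolves relative offsets itself or stores them as-is is an assumption;
other time formats the CLI may accept are not handled here.

```python
import time


def resolve_schedule(value, stime, now=None):
    """Return the epoch time at which the action should run.

    value: "now" or "+<seconds>", the offset being relative to stime
    stime: VM start time, as epoch seconds
    """
    now = int(time.time()) if now is None else now
    if value == "now":
        return now
    if value.startswith("+"):
        return stime + int(value[1:])
    raise ValueError(f"unsupported schedule value in this sketch: {value!r}")


print(resolve_schedule("+3600", stime=1000))
```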

Internal restructure of Scheduler:

- All sched_actions interface is now in SchedActionsXML class and files.
  This class uses references to VM XML, and MUST be used in the same
  lifetime scope.

- XMLRPC API calls for sched actions have been moved to ScheduledActionXML.cc as
  static functions.

- VirtualMachineActionPool includes counters for active backups (total
  and per host).

SUPPORTED PLATFORMS
====================
* hypervisor: KVM
* TM: qcow2/shared/ssh, ceph
* backup: restic, rsync

Notes on Ceph

* Ceph backups are performed in the following steps:
    1. A snapshot of each disk is taken (group snapshots cannot be used as
       it seems we cannot export the disks afterwards)
    2. Disks are exported to a file
    3. File is converted to qcow2 format
    4. Disk files are uploaded to the backup repo
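
Steps 1 to 3 above can be sketched as a list of command lines. This Python
snippet only builds the commands and does not run them. The function name,
the pool/image/snapshot names and the working directory are hypothetical,
and the actual TM driver may use different commands or options; step 4 is
left to the backup datastore driver.

```python
def ceph_backup_commands(pool, image, snap, workdir):
    """Build the snapshot, export and convert commands for one Ceph disk."""
    spec = f"{pool}/{image}@{snap}"
    raw = f"{workdir}/{image}.raw"
    qcow2 = f"{workdir}/{image}.qcow2"
    return [
        ["rbd", "snap", "create", spec],                  # 1. snapshot the disk
        ["rbd", "export", spec, raw],                     # 2. export it to a file
        ["qemu-img", "convert", "-f", "raw", "-O", "qcow2", raw, qcow2],  # 3. convert
    ]


for cmd in ceph_backup_commands("one", "one-10-53-0", "backup", "/tmp/backup"):
    print(" ".join(cmd))
```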

TODO:
  * Confirm crash consistent snapshots cannot be used in Ceph

TODO:
  * Check if using VM dir instead of full path is better to accommodate
    DS migrations i.e.:
    - Current path: /var/lib/one/datastores/100/53/backup/disk.0
    - Proposal: 53/backup/disk.0
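
A short Python sketch of the proposal: the path is stored relative to the
datastore directory and resolved again against whatever datastore the VM
lives in at restore time. The function names and the target datastore 101
are hypothetical.

```python
from pathlib import PurePosixPath


def to_vm_relative(full_path, datastore_dir):
    """Strip the datastore directory, leaving <vm_id>/backup/<disk>."""
    return str(PurePosixPath(full_path).relative_to(PurePosixPath(datastore_dir)))


def to_absolute(relative_path, datastore_dir):
    """Resolve a stored relative path against the current datastore directory."""
    return str(PurePosixPath(datastore_dir) / relative_path)


rel = to_vm_relative("/var/lib/one/datastores/100/53/backup/disk.0",
                     "/var/lib/one/datastores/100")
print(rel)
# After a (hypothetical) migration of the VM to datastore 101:
print(to_absolute(rel, "/var/lib/one/datastores/101"))
```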

RESTIC DRIVER
=============
Developed together with this feature; it is part of the EE edition.

* It supports the SFTP protocol. The following attributes are
  supported:

  - RESTIC_SFTP_SERVER
  - RESTIC_SFTP_USER: only if different from oneadmin
  - RESTIC_PASSWORD
  - RESTIC_IONICE: Run restic under a given ionice priority (class 2)
  - RESTIC_NICE: Run restic under a given nice
  - RESTIC_BWLIMIT: Limit restic upload/download BW
  - RESTIC_COMPRESSION: Restic 0.14 implements compression (three modes:
    off, auto, max). This requires repositories version 2. By default,
    auto is used (average compression without too much CPU usage)
  - RESTIC_CONNECTIONS: Sets the number of concurrent connections to a
    backend (5 by default). For high-latency backends this number can be
    increased.

* downloader URL: restic://<datastore_id>/<snapshot_id>/<file_name>
  can be used to recover single disk images from a backup, where
  snapshot_id is the restic snapshot hash. These URLs support:

  - RESTIC_CONNECTIONS
  - RESTIC_BWLIMIT
  - RESTIC_IONICE
  - RESTIC_NICE

  These options need to be defined in the associated datastore.

RSYNC DRIVER
=============
An rsync driver is included as part of the CE distribution. It uses the
rsync tool to store backups in a remote server through SSH:

* The following attributes are supported to configure the backup
  datastore:

  - RSYNC_HOST
  - RSYNC_USER
  - RSYNC_ARGS: Arguments to perform the rsync operation (-aS by default)

* downloader URL: rsync://<ds_id>/<vmid>/<hash>/<file> can be used to recover
  single files from an existing backup (RSYNC_HOST and RSYNC_USER need
  to be set in ds_id).
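
The downloader URL forms given for the restic and rsync drivers can be
split into their fields with a few lines of Python. This is a minimal
sketch, not the downloader's code: the function name parse_backup_url and
the sample values are hypothetical, and only the two URL layouts quoted in
this message are handled.

```python
from urllib.parse import urlsplit

# Field names follow the URL forms quoted in the commit message.
FIELDS = {
    "restic": ("datastore_id", "snapshot_id", "file_name"),
    "rsync": ("ds_id", "vmid", "hash", "file"),
}


def parse_backup_url(url):
    """Split a restic:// or rsync:// backup URL into a dict of named fields."""
    parts = urlsplit(url)
    if parts.scheme not in FIELDS:
        raise ValueError(f"unsupported backup protocol: {parts.scheme!r}")
    names = FIELDS[parts.scheme]
    # The first field is parsed as the network location; the rest are in the path.
    values = [parts.netloc] + parts.path.lstrip("/").split("/", len(names) - 2)
    if len(values) != len(names) or not all(values):
        raise ValueError(f"malformed backup URL: {url!r}")
    return dict(zip(names, values))


print(parse_backup_url("restic://100/a1b2c3d4/disk.0"))
print(parse_backup_url("rsync://100/53/a1b2c3/disk.0"))
```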

EMULATOR_CPUS
=============

This commit includes a feature unrelated to backups:

* Add EMULATOR_CPUS (KVM). This host (or cluster) attribute defines the
  CPU IDs where the emulator threads will be pinned. If this value is
  not defined the allocated CPU will be used when using a PIN policy.
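
The fallback described above can be sketched in Python as a function that
produces the libvirt <emulatorpin> element (a child of <cputune> in the
domain XML). The function name and arguments are hypothetical, and it is
an assumption that EMULATOR_CPUS uses a syntax libvirt accepts as a cpuset;
how the KVM driver actually generates this element is not shown in this
message.

```python
def emulatorpin_xml(emulator_cpus, allocated_cpus, pinned):
    """Return the <emulatorpin> element, or an empty string if none applies.

    emulator_cpus: value of EMULATOR_CPUS from the host or cluster, or None
    allocated_cpus: CPU IDs allocated to the VM
    pinned: True when the VM uses a PIN policy
    """
    if emulator_cpus:
        cpuset = emulator_cpus
    elif pinned and allocated_cpus:
        # EMULATOR_CPUS is not defined: fall back to the allocated CPUs.
        cpuset = ",".join(str(cpu) for cpu in sorted(allocated_cpus))
    else:
        return ""
    return f'<emulatorpin cpuset="{cpuset}"/>'


print(emulatorpin_xml("0,1", [4, 5], pinned=True))
print(emulatorpin_xml(None, [5, 4], pinned=True))
```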

(cherry picked from commit a9e6a8e000e9a5a2f56f80ce622ad9ffc9fa032b)

F OpenNebula/one#5516: adding rsync backup driver

(cherry picked from commit fb52edf5d009dc02b071063afb97c6519b9e8305)

F OpenNebula/one#5516: update install.sh, add vmid to source, some polish

Signed-off-by: Neal Hansen <nhansen@opennebula.io>
(cherry picked from commit 6fc6f8a67e435f7f92d5c40fdc3d1c825ab5581d)

F OpenNebula/one#5516: cleanup

Signed-off-by: Neal Hansen <nhansen@opennebula.io>
(cherry picked from commit 12f4333b833f23098142cd4762eb9e6c505e1340)

F OpenNebula/one#5516: update downloader, default args, size check

Signed-off-by: Neal Hansen <nhansen@opennebula.io>
(cherry picked from commit 510124ef2780a4e2e8c3d128c9a42945be38a305)

LL

(cherry picked from commit d4fcd134dc293f2b862086936db4d552792539fa)
2022-10-07 22:01:37 +02:00
Ruben S. Montero
7a1a85edb6
B #5867: Fix quotas and fsck
co-author: Pavel Czerny <pczerny@opennebula.systems>
(cherry picked from commit e2d4141599)
2022-06-20 18:35:40 +02:00
Tino Vazquez
f7d53e75ff M #-: Bump version 6.3.85 2022-04-07 19:49:58 +02:00
Vlastimil Holer
096754b63f
M #-: Bump year to 2021 (#778) 2021-02-09 16:07:56 +01:00
Pavel Czerný
7ba1bbe633
F #1660: Hotplug VCPU and mem for KVM (#392)
Co-authored-by: Ruben S. Montero <rsmontero@opennebula.org>
2020-11-17 11:24:52 +01:00
Ruben S. Montero
325db91bcb
F #4936:Refactor ActionManager and Timers
co-authored-by: Pavel Czerny <pczerny@opennebula.systems>
2020-07-24 16:00:59 +02:00
Pavel Czerný
daaf132a43
F #4936: Remove 'using namespace std' from headers (#60) 2020-07-02 22:42:10 +02:00
Vlastimil Holer
f3c50a5d89
M #-: Year bump to 2020 (#4634) 2020-04-30 15:00:02 +02:00
Ruben S. Montero
a6481bb038
F #1764: updateconf for running VM
L #-: Use nullptr, cpplint rules

co-authored-by: Pavel Czerny <pczerny@opennebula.systems>
2019-07-26 13:45:26 +02:00
Christian González
d9fc1fffce F #2138: Fix bug for poff and poff-hard migrate 2019-02-05 12:26:31 +01:00
Vlastimil Holer
441cf1f7f9 Bump version to 5.7.85, year to 2019 2019-01-16 11:47:59 +01:00
Ruben S. Montero
b89642aaab F #2138: Add options to migrate through a power-off life-cycle
Co-authored-by: Christian González <cgonzalez@opennebula.systems>
2018-12-04 12:00:13 +01:00
Vlastimil Holer
a4c0447ccf Bump year to 2018 (#1623) 2018-01-02 18:27:37 +01:00
Javi Fontan
8de979e42c Bump version 5.3.80 2017-05-25 16:07:35 +02:00
Ruben S. Montero
62d9ec2b39 F #5005 History records now include the UID/GID/REQUEST_ID that closed
the record
2017-02-09 16:58:47 +01:00
Ruben S. Montero
04e4991d4d F #5005: Fix action queue. Update new action classes 2017-02-08 12:24:42 +01:00
Ruben S. Montero
d143012eb6 F #5005: Add request information to events and callbacks from API calls. 2017-02-07 17:26:23 +01:00
Ruben S. Montero
7a2face60c F #5005: New interface of ActionManager to accommodate additional arguments. Updates managers to the new interface 2017-02-03 20:39:34 +01:00
Ruben S. Montero
9297321d91 F #4393: New VirtualMachineDisk interface to abstract all disk
management logic. Adapted classes to new interface and re-allocated some
functions. Work on disk resize operations.
2016-12-12 02:28:00 +01:00
Javi Fontan
e1f6dee180 Update copyright notice year 2016-05-04 12:33:23 +02:00
Ruben S. Montero
058e23c37a feature #3801, bug #3775: Delete operation is now under the recover
interface for admin. New terminate operation can be used in any "final"
state for end-users.
2016-05-02 18:34:42 +02:00
Carlos Martín
0d6dd0c6ff Feature #4400: New state cloning_failure 2016-04-22 16:06:43 +02:00
Carlos Martín
7855ebd22d Feature #4400: Allow VMs to use Images in the locked state 2016-04-19 15:20:34 +02:00
Ruben S. Montero
784a4fc960 feature #2980: VNC port tracking. Includes a VNC bitmap for each cluster to
track the ports in use in the cluster and avoid port collision. VNC ports are
assigned when the VM is deployed and released when the VM is stopped, undeployed
or done.

Includes the following:
  - 9da66150dc0e3dc2731518d8a215f9598696a999
  - 4c35a9fcccf70cbe87d2947403ea815967e7b605
  - ccfccb6d2fc40aa1c07eb994f37b8da4fb479082
  - b1b64e61a39f4452c7ba00e581de42888e0e84a5
  - d474ee4db9ed520bcae743d510be35b25ea988ed
  - dacb61b1402da2ec309b6e79bdd285d0d11de84f
2016-04-05 12:56:34 +02:00
Ruben S. Montero
9e4af1ebc6 feature #3204: Security Group dynamic update. Add support to update SG
rules of running VMs through a new one.secgroup.commit operation
2016-03-01 23:31:31 +01:00
Jaime Melis
e20fb5c4c4 Fix copyright in banners 2015-09-23 16:51:10 +02:00
Carlos Martín
ecb8d5d528 Feature #3782: Disk-snapshot actions operate on persistent images
For persistent images, quota usage is taken from the image owner.

The image snapshots are now in sync during the VM lifecycle, not
just when the VM is shutdown. This was needed because the quota
size is now the image size + image quotas, and it caused problems
if the owner was changed with snapshots pending to be saved back
to the image.
2015-07-17 18:35:52 +02:00
Javi Fontan
4b08d76fbf Whitespace cleanup 2015-07-01 15:18:28 -04:00
Ruben S. Montero
630e036005 feature #3782: Removed _hot from saveas functions 2015-06-10 12:53:55 +02:00
Ruben S. Montero
9bb5770f86 feature #3782: Snapshot event triggers 2015-05-21 14:52:36 +02:00
Ruben S. Montero
904b3248e9 feature #3782: Remove SOURCE and PARENT from snapshots. Drivers MUST
generate internal snapshot references from snapshot_id, parent_id
2015-05-21 14:12:35 +02:00
Carlos Martín
3eaffe16d6 Feature #3654: Refactor LCM to init pointers to managers only once 2015-04-24 16:27:08 +02:00
Carlos Martín
ae7fab2ff8 Feature #3654: Merge the cancel and shutdown states 2015-04-24 15:25:47 +02:00
Carlos Martín
4179a2f329 Feature #3654: Refactor LCM::cancel
(cherry picked from commit 7e3869be0aac24e267cc796f79afc472d6098099)
2015-04-23 14:55:34 +02:00
Ruben S. Montero
b5da40364f feature #3654: Add retry option to the recover API calls and onevm command 2015-04-21 17:15:10 +02:00
Ruben S. Montero
44ab9678a3 feature #3564: Remove FAILED and FAILURE states 2015-04-21 14:00:52 +02:00
Ruben S. Montero
dd3edff65b feature #3654: Recover through retry 2015-04-12 21:20:28 +02:00
Carlos Martín
757f908ff5 Feature #2065: Attach disk for VMs in poweroff 2015-03-18 16:28:43 +01:00
Carlos Martín
d1ca5081e0 Feature #2065: Attach disk for VMs in poweroff 2015-03-18 12:51:01 +01:00
Jaime Melis
8d00b74177 Change year to 2015 2015-02-24 12:27:59 +01:00
Ruben S. Montero
d6abae89db bug #3212: Move VMs to poweroff if they were running last time we got monitor data from the host. Move VMs to running if they are recovered. Remove unused code for vmware_driver 2014-10-09 16:24:24 +02:00
Ruben S. Montero
a2a54a5f92 feature #2452: Move lost VMs in hypervisor to power off state 2014-06-22 07:58:22 +02:00
Jaime Melis
3004e3c055 Bug #2645: A VM disk detach leaves Images to be saved-as in LOCKED state 2014-02-17 15:47:46 +01:00
Jaime Melis
11520021f7 Bump to version 4.5.0 2014-01-09 11:51:20 +01:00
Ruben S. Montero
fbbeaefc82 feature #2009: Implementation of recover XML-RPC method and Core functionality
(cherry picked from commit 051a575f1d8bbc8c78d96ccb426c2cd4de29bdb1)
2013-05-17 01:11:00 +02:00
Ruben S. Montero
72e68b0703 Merge branch 'feature-1835' into one-4.0
Conflicts:
	src/lcm/LifeCycleStates.cc
	src/vmm/VirtualMachineManager.cc
2013-04-03 17:11:16 +02:00
Carlos Martín
728d043574 Feature #1839: Rename shutdown-save to undeploy 2013-04-02 17:01:22 +02:00
Carlos Martín
675d936741 Feature #1839: New action shutdown-save(-hard) in core 2013-03-27 18:15:53 +01:00
Carlos Martín
2589ab86fe Feature #1835: Implement poweroff --hard in core 2013-03-27 13:48:06 +01:00
Carlos Martín
bbd2274c87 Feature #1791: Avoid deadlock 2013-03-14 18:03:15 +01:00