From c51f0d5e8dee8e3e970f47a8f611e76fd28e764e Mon Sep 17 00:00:00 2001
From: Christian Ebner
Date: Tue, 26 Mar 2024 10:57:13 +0100
Subject: [PATCH] docs: add section describing change detection mode

Describe the motivation and basic principle of the client's change
detection mode and show an example invocation.

Signed-off-by: Christian Ebner
---
 docs/backup-client.rst      | 47 +++++++++++++++++++++++++++++++++++++
 docs/technical-overview.rst |  3 +++
 2 files changed, 50 insertions(+)

diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index 00a1abbb3..e541c5537 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -280,6 +280,53 @@ Multiple paths can be excluded like this:
 
     # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
 
+.. _client_change_detection_mode:
+
+Change Detection Mode
+~~~~~~~~~~~~~~~~~~~~~
+
+File-based backups containing a lot of data can take a long time, as the default
+behavior for the Proxmox backup client is to read all data and encode it into a
+pxar archive.
+The encoded stream is split into variable-sized chunks. For each chunk, a digest
+is calculated and used to decide whether the chunk needs to be uploaded or can
+be indexed without upload, as it is already available on the server (and
+therefore deduplicated). If the backed-up files are largely unchanged,
+re-reading them only to detect that the corresponding chunks do not need to be
+uploaded after all is time-consuming and undesirable.
+
+The backup client's `change-detection-mode` can be switched from the default to
+`metadata`-based detection to reduce the overhead described above, instructing
+the client to avoid re-reading files with unchanged metadata whenever possible.
+When using this mode, instead of the regular pxar archive, the backup snapshot
+is stored in two separate files: the `mpxar` containing the archive's metadata
+and the `ppxar` containing a concatenation of the file contents. This splitting
+allows for efficient metadata lookups.
+
+Setting `change-detection-mode` to `data` creates the same split archive as the
+`metadata` mode, but without using a previous reference and therefore re-encodes
+all file payloads.
+In `metadata` mode, the current file metadata is compared to the metadata looked
+up in the previous `mpxar` archive when creating the backup archives.
+The metadata comparison includes file size, file type, ownership and permission
+information, as well as ACLs and attributes and, most importantly, the file's
+mtime. For details, see the
+:ref:`pxar metadata archive format `.
+
+If unchanged, the entry is cached for possible re-use of content chunks without
+re-reading, by indexing the already present chunks containing the contents from
+the previous backup snapshot. Since the file might only partially re-use chunks
+(thereby introducing wasted space in the form of padding), the decision whether
+to re-use or re-encode the currently cached entries is postponed until enough
+information is available to compare the possible padding to a threshold value.
+
+The following shows an example client invocation using the `metadata`
+mode:
+
+.. code-block:: console
+
+    # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
+
 .. _client_encryption:
 
 Encryption
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
index 89835a7cc..a8b1c7268 100644
--- a/docs/technical-overview.rst
+++ b/docs/technical-overview.rst
@@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
 
 When uploading an index, the client first has to read the source data, chunk it
 and send the data as chunks with their identifying checksum to the server.
+When using the :ref:`change detection mode `, payload
+chunks for unchanged files are reused from the previous snapshot, so the
+source data is not read again.
 
 If there is a previous Snapshot in the backup group, the client can first
 download the chunk list of the previous Snapshot. If it detects a chunk that