docs: add section describing change detection mode

Describe the motivation and basic principle of the client's change
detection mode and show an example invocation.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner 2024-03-26 10:57:13 +01:00 committed by Fabian Grünbichler
parent 5cff9c6fe8
commit c51f0d5e8d
2 changed files with 50 additions and 0 deletions


@ -280,6 +280,53 @@ Multiple paths can be excluded like this:
  # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust

.. _client_change_detection_mode:

Change Detection Mode
~~~~~~~~~~~~~~~~~~~~~

File-based backups containing a lot of data can take a long time, as the default
behavior of the Proxmox backup client is to read all data and encode it into a
pxar archive.

The encoded stream is split into variable-sized chunks. For each chunk, a digest
is calculated and used to decide whether the chunk needs to be uploaded or can
be indexed without upload, as it is already available on the server (and
therefore deduplicated). If the backed-up files are largely unchanged,
re-reading them, only to then detect that the corresponding chunks do not need
to be uploaded after all, is time-consuming and undesirable.

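
The digest-based upload decision can be illustrated with a minimal Rust sketch.
It is not the client's implementation: the chunker parameters, the simple
rolling sum used to find chunk boundaries and the use of std's
non-cryptographic ``DefaultHasher`` as a digest are assumptions chosen for
illustration, and the hypothetical ``known`` set stands in for the chunks
already available on the server.

.. code-block:: rust

  use std::collections::hash_map::DefaultHasher;
  use std::collections::HashSet;
  use std::hash::{Hash, Hasher};

  // Hypothetical chunker parameters, kept small for illustration only.
  const MIN_CHUNK: usize = 64;
  const MAX_CHUNK: usize = 1024;
  const WINDOW: usize = 16;
  const MASK: u32 = 0xff;

  /// Split `data` into variable-sized chunks: a boundary is placed where a
  /// rolling sum over the last WINDOW bytes has its low bits cleared
  /// (content-defined), bounded by MIN_CHUNK and MAX_CHUNK.
  fn split_chunks(data: &[u8]) -> Vec<&[u8]> {
      let mut chunks = Vec::new();
      let mut start = 0;
      let mut sum: u32 = 0;
      for (i, &byte) in data.iter().enumerate() {
          sum = sum.wrapping_add(u32::from(byte));
          if i >= WINDOW {
              sum = sum.wrapping_sub(u32::from(data[i - WINDOW]));
          }
          let len = i + 1 - start;
          if (len >= MIN_CHUNK && (sum & MASK) == 0) || len >= MAX_CHUNK {
              chunks.push(&data[start..=i]);
              start = i + 1;
          }
      }
      if start < data.len() {
          chunks.push(&data[start..]);
      }
      chunks
  }

  /// Stand-in digest (SipHash via DefaultHasher), NOT a cryptographic hash.
  fn digest(chunk: &[u8]) -> u64 {
      let mut hasher = DefaultHasher::new();
      chunk.hash(&mut hasher);
      hasher.finish()
  }

  #[derive(Debug, PartialEq)]
  enum Action {
      Upload(u64),    // digest not yet known: the chunk must be uploaded
      IndexOnly(u64), // digest already known: only reference it in the index
  }

  /// Decide per chunk whether to upload it; `known` models the server's chunks.
  fn plan(data: &[u8], known: &mut HashSet<u64>) -> Vec<Action> {
      split_chunks(data)
          .into_iter()
          .map(|chunk| {
              let d = digest(chunk);
              if known.insert(d) { Action::Upload(d) } else { Action::IndexOnly(d) }
          })
          .collect()
  }

Calling ``plan`` a second time on unchanged data yields only ``IndexOnly``
actions, yet every byte still had to be read and chunked to find that out.
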
The backup client's ``change-detection-mode`` can be switched from the default
to ``metadata``-based detection to overcome the limitations described above,
instructing the client to avoid re-reading files with unchanged metadata
whenever possible.

When using this mode, instead of the regular pxar archive, the backup snapshot
is stored in two separate files: the ``mpxar``, containing the archive's
metadata, and the ``ppxar``, containing a concatenation of the file contents.
This split allows for efficient metadata lookups.

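
Why the split helps can be sketched with a simplified, hypothetical model in
which the metadata part stores, for every path, the offset and size of the
file's content in the payload part. This is not the actual ``mpxar``/``ppxar``
encoding; the ``SplitArchive`` type and its methods are illustrative only and
merely show that metadata queries do not need to touch the payload.

.. code-block:: rust

  use std::collections::BTreeMap;

  /// Hypothetical, simplified model of a split archive.
  #[derive(Default)]
  struct SplitArchive {
      /// Metadata part: path -> (offset, size) of the content in `payload`.
      metadata: BTreeMap<String, (u64, u64)>,
      /// Payload part: the concatenated file contents.
      payload: Vec<u8>,
  }

  impl SplitArchive {
      fn add_file(&mut self, path: &str, content: &[u8]) {
          let offset = self.payload.len() as u64;
          self.payload.extend_from_slice(content);
          self.metadata.insert(path.to_string(), (offset, content.len() as u64));
      }

      /// Answered from the metadata part alone, without reading any payload.
      fn size_of(&self, path: &str) -> Option<u64> {
          self.metadata.get(path).map(|&(_, size)| size)
      }

      /// Follows the stored offset into the payload part.
      fn content_of(&self, path: &str) -> Option<&[u8]> {
          let &(offset, size) = self.metadata.get(path)?;
          self.payload.get(offset as usize..(offset + size) as usize)
      }
  }
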
Setting the ``change-detection-mode`` to ``data`` creates the same split archive
as the ``metadata`` mode, but without using a previous snapshot as reference,
and therefore re-encodes all file payloads.

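
The relation between the modes described above can be summarized as a small
Rust type. The enum and its method names are hypothetical; only the option
values ``data`` and ``metadata`` are taken from the text, and the real client
may accept further values.

.. code-block:: rust

  /// Hypothetical model of the behaviors described above.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  enum ChangeDetectionMode {
      Default,  // regular pxar archive, all data is re-read
      Data,     // split archive, all payloads are re-encoded
      Metadata, // split archive, unchanged files re-use previous chunks
  }

  impl ChangeDetectionMode {
      /// Map the option value; `None` means the option was not given.
      fn from_option(value: Option<&str>) -> Result<Self, String> {
          match value {
              None => Ok(Self::Default),
              Some("data") => Ok(Self::Data),
              Some("metadata") => Ok(Self::Metadata),
              Some(other) => Err(format!("unknown change detection mode: {other}")),
          }
      }

      /// Whether the snapshot is stored as separate metadata and payload parts.
      fn split_archive(self) -> bool {
          self != Self::Default
      }

      /// Whether the previous snapshot is consulted to skip unchanged files.
      fn uses_previous_reference(self) -> bool {
          self == Self::Metadata
      }
  }
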
When creating the backup archives, the current file metadata is compared to the
metadata looked up in the previous ``mpxar`` archive.
The metadata comparison includes file size, file type, ownership and permission
information, as well as ACLs and attributes and, most importantly, the file's
mtime. For details, see the
:ref:`pxar metadata archive format <pxar-meta-format>`.

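
A minimal sketch of this comparison, assuming a simplified, hypothetical
``FileMeta`` struct; the real fields and their encoding are defined by the pxar
metadata archive format, not by this example.

.. code-block:: rust

  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  enum FileType {
      Regular,
      Directory,
      Symlink,
  }

  /// Hypothetical subset of the per-file metadata.
  #[derive(Debug, Clone, PartialEq, Eq)]
  struct FileMeta {
      size: u64,
      file_type: FileType,
      uid: u32,
      gid: u32,
      mode: u32,                          // permission bits
      acl: Vec<String>,                   // ACL entries, simplified
      attributes: Vec<(String, Vec<u8>)>, // attributes, simplified
      mtime: (i64, u32),                  // seconds, nanoseconds
  }

  /// Re-use is only considered if every compared field matches; the mtime,
  /// the most important indicator, is compared first.
  fn is_unchanged(prev: &FileMeta, cur: &FileMeta) -> bool {
      prev.mtime == cur.mtime
          && prev.size == cur.size
          && prev.file_type == cur.file_type
          && (prev.uid, prev.gid, prev.mode) == (cur.uid, cur.gid, cur.mode)
          && prev.acl == cur.acl
          && prev.attributes == cur.attributes
  }
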
If the metadata is unchanged, the entry is cached for possible re-use of its
content chunks without re-reading the file, by indexing the already present
chunks that hold its contents in the previous backup snapshot. Since a file
might only partially fill the chunks it re-uses (thereby introducing wasted
space in the form of padding), the decision whether to re-use or re-encode the
currently cached entries is postponed until enough information is available,
comparing the possible padding to a threshold value.

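
The padding trade-off can be sketched as follows, assuming that the previous
snapshot's payload chunks and the cached files' contents are given as byte
ranges in the payload stream, and that the threshold is an absolute number of
bytes. The function names and the threshold semantics are hypothetical; the
client's actual heuristic may differ.

.. code-block:: rust

  use std::ops::Range;

  /// Padding = bytes of all previous chunks that overlap a cached file,
  /// minus the bytes that actually belong to the cached files.
  /// Assumes the cached ranges do not overlap each other.
  fn padding(chunks: &[Range<u64>], cached: &[Range<u64>]) -> u64 {
      let referenced: u64 = chunks
          .iter()
          .filter(|c| cached.iter().any(|f| f.start < c.end && c.start < f.end))
          .map(|c| c.end - c.start)
          .sum();
      let used: u64 = cached.iter().map(|f| f.end - f.start).sum();
      referenced.saturating_sub(used)
  }

  #[derive(Debug, PartialEq)]
  enum Decision {
      ReuseChunks, // index the previous chunks, do not re-read the files
      Reencode,    // too much padding: read and encode the files again
  }

  /// Hypothetical decision rule comparing the padding to `max_padding`.
  fn decide(chunks: &[Range<u64>], cached: &[Range<u64>], max_padding: u64) -> Decision {
      if padding(chunks, cached) <= max_padding {
          Decision::ReuseChunks
      } else {
          Decision::Reencode
      }
  }

For chunks ``0..100``, ``100..200`` and ``200..300``, a cached file at
``90..110`` references two full chunks for 20 bytes of content, i.e. 180 bytes
of padding, so with a 50 byte threshold it would be re-encoded instead.
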
The following shows an example client invocation using the ``metadata`` mode:

.. code-block:: console

  # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata

.. _client_encryption:
Encryption


@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
When uploading an index, the client first has to read the source data, chunk it
and send the data as chunks with their identifying checksum to the server.
When using the ``metadata``
:ref:`change detection mode <client_change_detection_mode>`, payload chunks for
unchanged files are reused from the previous snapshot, so the source data does
not have to be read again.
If there is a previous snapshot in the backup group, the client can first
download the chunk list of the previous snapshot. If it detects a chunk that