7dcbe69a87
When walking the file system tree, check for each entry whether it is reusable, meaning that its metadata did not change and its payload chunks can be reindexed instead of reencoding the whole data.

If the metadata matches, the range of dynamic index entries for that file is looked up in the previous payload data index. Use the range and the possible padding introduced by partial reuse of chunks to decide whether to reuse the dynamic entries and encode the file payload as a payload reference right away, or to cache the entry for now and keep looking ahead.

If, however, a non-reusable (because changed) entry is encountered before the padding threshold is reached, the entries in the cache are flushed to the archive by reencoding them, and the cached state is reset.

Reusable chunk digests and sizes, as well as reference offsets to the start of regular file payloads within the payload stream, are injected into the backup stream by sending them to the chunker via a dedicated channel, forcing a chunk boundary and inserting the chunks.

If the threshold value for reuse is reached, the chunks are injected into the payload stream and the references with the corresponding offsets are encoded in the metadata stream.

Since multiple files might be contained within a single chunk, deduplication of chunks is ensured by keeping back the last chunk, so that following files can reuse that same chunk without indexing it twice. This chunk is guaranteed to be injected into the stream even if the following lookups lead to a cache clear and reencoding.

Directory boundaries are cached as well, and written as part of the encoding when flushing.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
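The look-ahead decision described in the message can be illustrated with a minimal sketch. It is not the code from this commit: `Chunk`, `LookaheadCache`, `Decision`, `chunk`, and the `max_padding_ratio` parameter are hypothetical names, and the padding rule used here (reuse once padding divided by total chunk size drops to or below a ratio) is an assumption about how the threshold might be evaluated.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one entry of the previous payload index.
#[derive(Clone, Debug, PartialEq)]
struct Chunk {
    digest: [u8; 32],
    offset: u64, // start of the chunk in the previous payload stream
    size: u64,
}

/// Hypothetical helper to build a chunk with a recognizable digest.
fn chunk(id: u8, offset: u64, size: u64) -> Chunk {
    Chunk { digest: [id; 32], offset, size }
}

#[derive(Debug, PartialEq)]
enum Decision {
    KeepCaching,       // keep looking ahead
    Reuse,             // inject cached chunks, encode payload references
    FlushByReencoding, // reencode cached entries, reset the cache
}

/// Hypothetical look-ahead cache of reusable files.
#[derive(Default)]
struct LookaheadCache {
    chunks: Vec<Chunk>, // chunks to inject, each indexed only once
    payload_bytes: u64, // bytes actually referenced by cached files
}

impl LookaheadCache {
    /// Cache a reusable file whose payload occupies `range` in the
    /// previous payload stream.
    fn add_file(&mut self, index: &[Chunk], range: Range<u64>) {
        let mut covering: Vec<Chunk> = index
            .iter()
            .filter(|c| c.offset < range.end && c.offset + c.size > range.start)
            .cloned()
            .collect();
        // The last cached chunk was kept back: if this file starts in
        // that same chunk, do not index it a second time.
        if !covering.is_empty() && self.chunks.last() == covering.first() {
            covering.remove(0);
        }
        self.chunks.extend(covering);
        self.payload_bytes += range.end - range.start;
    }

    /// Bytes in the cached chunks that no cached file references.
    fn padding(&self) -> u64 {
        let total: u64 = self.chunks.iter().map(|c| c.size).sum();
        total.saturating_sub(self.payload_bytes)
    }

    /// Assumed rule: reuse when the padding ratio is low enough,
    /// reencode when a changed entry interrupts the look-ahead first.
    fn decide(&self, next_entry_reusable: bool, max_padding_ratio: f64) -> Decision {
        let total: u64 = self.chunks.iter().map(|c| c.size).sum();
        if total > 0 && self.padding() as f64 / total as f64 <= max_padding_ratio {
            Decision::Reuse
        } else if !next_entry_reusable {
            Decision::FlushByReencoding
        } else {
            Decision::KeepCaching
        }
    }
}
```

With three 100-byte chunks and a first file at bytes 10..150, two chunks are cached with 60 bytes of padding, so the sketch keeps caching. A second file at 150..290 starts in the kept-back chunk, adds only one new chunk, and lowers the padding to 20 bytes, at which point a 0.1 ratio would allow reuse.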
src
Cargo.toml