mirror of git://sourceware.org/git/lvm2.git synced 2025-01-03 05:18:29 +03:00
lvm2/doc/vdo.md
2018-01-25 11:15:23 +01:00

VDO - Compression and deduplication.

Currently device stacking looks like this:

Physical x [multipath] x [partition] x [mdadm] x [LUKS] x [LVs] x [LUKS] x [FS|Database|...]

Adding VDO:

Physical x [multipath] x [partition] x [mdadm] x [LUKS] x [LVs] x [LUKS] x VDO x [LVs] x [FS|Database|...]

Where VDO fits (and where it does not):

Backing devices for VDO volumes:

  1. Physical x [multipath] x [partition] x [mdadm].
  2. LUKS over (1) - full disk encryption.
  3. LVs (raids|mirror|stripe|linear) x [cache] over (1).
  4. LUKS over (3) - especially when using raids.

Usual limitations apply:

  • Never layer LUKS over another LUKS - it makes no sense.
  • LUKS works better over RAID than under it.
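
The two limitations above can be written down as a small check over a stack description. This is only an illustrative sketch: the bottom-to-top list of layer names, the names themselves, and the `check_stack` helper are all hypothetical and are not part of LVM or VDO.

```python
def check_stack(layers):
    """Check a bottom-to-top list of layer names against the two rules above.

    `layers` is a hypothetical representation, e.g. ["physical", "mdadm", "luks"].
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    luks_positions = [i for i, name in enumerate(layers) if name == "luks"]

    # Rule 1: at most one LUKS layer in the whole stack.
    if len(luks_positions) > 1:
        problems.append("LUKS layered over another LUKS")

    # Rule 2: RAID should sit below LUKS, not above it.
    raid_names = {"mdadm", "lv-raid"}  # assumed names for RAID layers
    for i in luks_positions:
        if any(name in raid_names for name in layers[i + 1:]):
            problems.append("RAID placed over LUKS; prefer LUKS over RAID")
            break

    return problems


print(check_stack(["physical", "mdadm", "luks", "vdo"]))  # no problems found
print(check_stack(["physical", "luks", "mdadm", "vdo"]))  # RAID over LUKS
```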

Using VDO as a PV:

  1. under tpool
    • The best fit - it will deduplicate additional redundancies among all snapshots and will reduce the footprint.
    • Risks: Resize! dmeventd is not able to handle resizing of the tpool at the moment.
  2. under corig
    • Cache fits better under the VDO device - VDO reduces the amount of data and deduplicates it, so there should be more cache hits.
    • This is useful to keep the most frequently used data in cache uncompressed (if that happens to be a bottleneck).
  3. under (multiple) linear LVs - e.g. used for VMs.
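
To illustrate why deduplication below a tpool can reduce the footprint of many snapshots, here is a toy model that hashes fixed-size blocks and counts how many are unique. It is a sketch under stated assumptions: the 4 KiB block size, the in-memory "images", and the `unique_block_count` helper are illustrative only and do not describe how VDO or thin snapshots are implemented.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def unique_block_count(images, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) across all images."""
    seen = set()
    total = 0
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            seen.add(hashlib.sha256(block).digest())
            total += 1
    return total, len(seen)


# An origin of 8 distinct blocks and two copies that each changed one block.
origin = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
snap1 = origin[:BLOCK_SIZE * 7] + b"\xff" * BLOCK_SIZE  # last block changed
snap2 = b"\xee" * BLOCK_SIZE + origin[BLOCK_SIZE:]      # first block changed

total, unique = unique_block_count([origin, snap1, snap2])
print(total, unique)  # prints "24 10"
```

Of the 24 blocks stored across the three images, only 10 are distinct, which is the kind of redundancy a deduplicating layer could collapse.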

And where VDO does not fit:

  • never use VDO under LUKS volumes
    • encrypted data looks random and neither compresses nor deduplicates well,
  • never use VDO under cmeta and tmeta LVs
    • this metadata looks random and neither compresses nor deduplicates well,
  • under raids
    • raid{4,5,6} scrambles data, so it does not deduplicate well,
    • raid{1,4,5,6,10} also increases the amount of data, so more work (duplicated work in the case of raid{1,10}) has to be done in order to find fewer duplicates.
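
The claim that random-looking data does not compress can be checked quickly. The sketch below compares random bytes (a stand-in for ciphertext such as LUKS would produce) with highly repetitive data. `zlib` is used only because it ships with Python; this is not a statement about which compressor VDO uses, and the sample data is made up for illustration.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


size = 1 << 20  # 1 MiB sample
line = b"log line: all good\n"

random_like = os.urandom(size)            # stand-in for encrypted data
repetitive = line * (size // len(line))   # highly redundant data

print(f"random-like: {compression_ratio(random_like):.3f}")
print(f"repetitive:  {compression_ratio(repetitive):.3f}")
```

The random sample stays at roughly its original size, while the repetitive sample shrinks to a small fraction of it.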

And where it could be useful:

  • under snapshot CoW devices - when there are several of them, VDO could deduplicate across them.

Things to decide

  • under integrity devices - it should work - mostly for data
    • hashes are unique and not compressible - it makes sense to have separate imeta and idata volumes for integrity devices

Future Integration of VDO into LVM:

One issue is using both LUKS and RAID under VDO. We have two options:

  • use mdadm x LUKS x VDO+LV
  • use LV RAID x LUKS x VDO+LV - still requiring recursive LVs.

Another issue is the duality of VDO - it is a top level LV but it can be seen as a "pool" for multiple devices.

  • This is a use case that LVM cannot handle at the moment.
  • A VDO volume has both a physical size and a virtual size, just like a tpool, and it has the same problem with virtual vs. physical size: it can become full without exposing that to the FS.
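
The virtual vs. physical size problem comes down to simple arithmetic. The sketch below estimates how full the physical space is for a given amount of data written by the filesystem. All numbers, the `savings_ratio` parameter, and the `physical_usage` helper are assumptions made up for illustration; real space accounting would also have to include metadata overhead.

```python
def physical_usage(logical_written, savings_ratio, physical_size):
    """Estimate the fraction of physical space in use.

    savings_ratio is the assumed fraction of written data removed by
    compression and deduplication (0.0 = no savings, 0.75 = 4:1).
    """
    if not 0.0 <= savings_ratio < 1.0:
        raise ValueError("savings_ratio must be in [0, 1)")
    physical_used = logical_written * (1.0 - savings_ratio)
    return physical_used / physical_size


GiB = 1 << 30
physical = 100 * GiB   # real space on the backing device
virtual = 300 * GiB    # size presented to the filesystem
written = 200 * GiB    # data the filesystem has written so far

print(f"filesystem view: {written / virtual:.0%} used")
for ratio in (0.75, 0.5, 0.25):
    fill = physical_usage(written, ratio, physical)
    status = "FULL" if fill >= 1.0 else "ok"
    print(f"savings {ratio:.2f}: physical {fill:.0%} {status}")
```

In this example the filesystem sees itself as two-thirds full in every case, yet the physical space is already exhausted as soon as the achieved savings drop to 50% or below.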

Another possible RFE is to split data and metadata:

  • e.g. keep data on HDD and metadata on SSD

Issues / Testing

  • fstrim/discard pass down - does it work with VDO?
  • VDO can run in synchronous vs. asynchronous mode
    • synchronous for devices where a write is safe once it is confirmed. Some devices lie about this.
    • asynchronous for devices requiring flush
  • multiple devices under VDO - need to find common options
  • pvmove - changing characteristics of underlying device
  • autoactivation during boot
    • Q: can we use VDO for RootFS?