mirror of
git://sourceware.org/git/lvm2.git
synced 2025-01-02 01:18:26 +03:00
kernel docs: Refresh kernel target documentation
Update the packaged copy of the in-kernel target documentation files. Adds dm-verity, updates thin provisioning and makes minor corrections elsewhere.
parent 461eb1ac6a
commit da42ee3a1f
@@ -3,7 +3,7 @@ Introduction
 
 The more-sophisticated device-mapper targets require complex metadata
 that is managed in kernel. In late 2010 we were seeing that various
-different targets were rolling their own data strutures, for example:
+different targets were rolling their own data structures, for example:
 
 - Mikulas Patocka's multisnap implementation
 - Heinz Mauelshagen's thin provisioning target
@@ -28,7 +28,7 @@ The target is named "raid" and it accepts the following parameters:
 raid6_nc RAID6 N continue
 - rotating parity N (right-to-left) with data continuation
 
-Refererence: Chapter 4 of
+Reference: Chapter 4 of
 http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
 
 <#raid_params>: The number of parameters that follow.
@@ -9,15 +9,14 @@ devices in parallel.
 
 Parameters: <num devs> <chunk size> [<dev path> <offset>]+
 <num devs>: Number of underlying devices.
-<chunk size>: Size of each chunk of data. Must be a power-of-2 and at
-least as large as the system's PAGE_SIZE.
+<chunk size>: Size of each chunk of data. Must be at least as
+large as the system's PAGE_SIZE.
 <dev path>: Full pathname to the underlying block-device, or a
 "major:minor" device-number.
 <offset>: Starting sector within the device.
 
 One or more underlying devices can be specified. The striped device size must
-be a multiple of the chunk size and a multiple of the number of underlying
-devices.
+be a multiple of the chunk size multiplied by the number of underlying devices.
 
 
 Example scripts
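As an illustration of the revised size rule in the hunk above, the following sketch checks whether a striped device size is a multiple of the chunk size multiplied by the number of underlying devices. The function name `stripe_size_ok` is hypothetical, and the sketch assumes the device size and chunk size are given in the same unit (512-byte sectors here); it is not taken from the kernel or from dmsetup.

```shell
# Hypothetical helper (not part of dmsetup): succeeds when the striped
# device size is a multiple of <chunk size> * <num devs>.
# Assumption: size and chunk are expressed in the same unit (sectors).
stripe_size_ok() {
    size=$1
    num_devs=$2
    chunk=$3
    [ $(( size % (chunk * num_devs) )) -eq 0 ]
}

# 2097152 sectors across 4 devices with 256-sector chunks
if stripe_size_ok 2097152 4 256; then
    echo "size is usable"
fi
```

A size that fails this check would have to be rounded down to the nearest multiple before the table is loaded.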
@@ -1,7 +1,7 @@
 Introduction
 ============
 
-This document descibes a collection of device-mapper targets that
+This document describes a collection of device-mapper targets that
 between them implement thin-provisioning and snapshots.
 
 The main highlight of this implementation, compared to the previous
@@ -75,10 +75,12 @@ less sharing than average you'll need a larger-than-average metadata device.
 
 As a guide, we suggest you calculate the number of bytes to use in the
 metadata device as 48 * $data_dev_size / $data_block_size but round it up
-to 2MB if the answer is smaller. The largest size supported is 16GB.
+to 2MB if the answer is smaller. If you're creating large numbers of
+snapshots which are recording large amounts of change, you may find you
+need to increase this.
 
-If you're creating large numbers of snapshots which are recording large
-amounts of change, you may need find you need to increase this.
+The largest size supported is 16GB: If the device is larger,
+a warning will be issued and the excess space will not be used.
 
 Reloading a pool table
 ----------------------
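The sizing guideline in the hunk above can be sketched as a small shell function. This is only an illustration: the name `thin_metadata_bytes` is hypothetical, "2MB" and "16GB" are assumed to mean 2 MiB and 16 GiB, and clamping the result to 16 GiB is an interpretation of "the excess space will not be used", not something the text prescribes.

```shell
# Sketch only: suggested thin-pool metadata device size, in bytes.
# Assumptions: both arguments are in bytes; "2MB" = 2 MiB, "16GB" = 16 GiB.
thin_metadata_bytes() {
    data_dev_size=$1
    data_block_size=$2
    min_bytes=$(( 2 * 1024 * 1024 ))
    max_bytes=$(( 16 * 1024 * 1024 * 1024 ))
    # 48 * $data_dev_size / $data_block_size, as given in the text
    bytes=$(( 48 * data_dev_size / data_block_size ))
    if [ "$bytes" -lt "$min_bytes" ]; then bytes=$min_bytes; fi
    if [ "$bytes" -gt "$max_bytes" ]; then bytes=$max_bytes; fi
    echo "$bytes"
}

# 1 TiB data device with 64 KiB data blocks
thin_metadata_bytes 1099511627776 65536
```

As the text notes, pools with many snapshots recording large amounts of change may need more than this estimate.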
@@ -167,6 +169,38 @@ ii) Using an internal snapshot.
 
 dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
 
+External snapshots
+------------------
+
+You can use an external _read only_ device as an origin for a
+thinly-provisioned volume. Any read to an unprovisioned area of the
+thin device will be passed through to the origin. Writes trigger
+the allocation of new blocks as usual.
+
+One use case for this is VM hosts that want to run guests on
+thinly-provisioned volumes but have the base image on another device
+(possibly shared between many VMs).
+
+You must not write to the origin device if you use this technique!
+Of course, you may write to the thin device and take internal snapshots
+of the thin volume.
+
+i) Creating a snapshot of an external device
+
+This is the same as creating a thin device.
+You don't mention the origin at this stage.
+
+dmsetup message /dev/mapper/pool 0 "create_thin 0"
+
+ii) Using a snapshot of an external device.
+
+Append an extra parameter to the thin target specifying the origin:
+
+dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
+
+N.B. All descendants (internal snapshots) of this snapshot require the
+same extra origin parameter.
+
 Deactivation
 ------------
 
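Because every descendant of an externally backed thin device needs the same extra origin parameter, it may help to build the table line in one place. The following is a minimal sketch; `thin_table` is a hypothetical helper, and the device paths are the ones used in the examples in this hunk.

```shell
# Hypothetical helper: print a "thin" table line, appending the external
# origin only when one is given (fourth argument).
thin_table() {
    sectors=$1
    pool=$2
    dev_id=$3
    origin=${4:-}
    if [ -n "$origin" ]; then
        echo "0 $sectors thin $pool $dev_id $origin"
    else
        echo "0 $sectors thin $pool $dev_id"
    fi
}

# Possible use with the commands shown in the text (not executed here):
# dmsetup create snap --table "$(thin_table 2097152 /dev/mapper/pool 0 /dev/image)"
thin_table 2097152 /dev/mapper/pool 0 /dev/image
```

Calling the helper without a fourth argument yields the plain internal-snapshot form of the table.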
@@ -189,7 +223,13 @@ i) Constructor
 <low water mark (blocks)> [<number of feature args> [<arg>]*]
 
 Optional feature arguments:
-- 'skip_block_zeroing': skips the zeroing of newly-provisioned blocks.
+
+skip_block_zeroing: Skip the zeroing of newly-provisioned blocks.
+
+ignore_discard: Disable discard support.
+
+no_discard_passdown: Don't pass discards down to the underlying
+data device, but just remove the mapping.
 
 Data block size must be between 64KB (128 sectors) and 1GB
 (2097152 sectors) inclusive.
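The data block size range stated above can be checked before a pool table is loaded. This sketch tests only the range given in the text (128 to 2097152 sectors inclusive); the function name `pool_block_size_ok` is hypothetical, and the kernel may enforce further constraints that are not described in this hunk.

```shell
# Hypothetical pre-flight check: data block size in 512-byte sectors must
# lie between 128 (64KB) and 2097152 (1GB) inclusive.
pool_block_size_ok() {
    sectors=$1
    [ "$sectors" -ge 128 ] && [ "$sectors" -le 2097152 ]
}

if pool_block_size_ok 128; then
    echo "block size within range"
fi
```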
@@ -237,16 +277,6 @@ iii) Messages
 
 Deletes a thin device. Irreversible.
 
-trim <dev id> <new size in sectors>
-
-Delete mappings from the end of a thin device. Irreversible.
-You might want to use this if you're reducing the size of
-your thinly-provisioned device. In many cases, due to the
-sharing of blocks between devices, it is not possible to
-determine in advance how much space 'trim' will release. (In
-future a userspace tool might be able to perform this
-calculation.)
-
 set_transaction_id <current id> <new id>
 
 Userland volume managers, such as LVM, need a way to
@@ -257,12 +287,23 @@ iii) Messages
 the current transaction id is when you change it with this
 compare-and-swap message.
 
+reserve_metadata_snap
+
+Reserve a copy of the data mapping btree for use by userland.
+This allows userland to inspect the mappings as they were when
+this message was executed. Use the pool's status command to
+get the root block associated with the metadata snapshot.
+
+release_metadata_snap
+
+Release a previously reserved copy of the data mapping btree.
+
 'thin' target
 -------------
 
 i) Constructor
 
-thin <pool dev> <dev id>
+thin <pool dev> <dev id> [<external origin dev>]
 
 pool dev:
 the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
@@ -271,6 +312,11 @@ i) Constructor
 the internal device identifier of the device to be
 activated.
 
+external origin dev:
+an optional block device outside the pool to be treated as a
+read-only snapshot origin: reads to unprovisioned areas of the
+thin target will be mapped to this device.
+
 The pool doesn't store any size against the thin devices. If you
 load a thin target that is smaller than you've been using previously,
 then you'll have no access to blocks mapped beyond the end. If you
155
doc/kernel/verity.txt
Normal file
@@ -0,0 +1,155 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Construction Parameters
+=======================
+<version> <dev> <hash_dev>
+<data_block_size> <hash_block_size>
+<num_data_blocks> <hash_start_block>
+<algorithm> <digest> <salt>
+
+<version>
+This is the type of the on-disk hash format.
+
+0 is the original format used in the Chromium OS.
+The salt is appended when hashing, digests are stored continuously and
+the rest of the block is padded with zeros.
+
+1 is the current format that should be used for new devices.
+The salt is prepended when hashing and each digest is
+padded with zeros to the power of two.
+
+<dev>
+This is the device containing data, the integrity of which needs to be
+checked. It may be specified as a path, like /dev/sdaX, or a device number,
+<major>:<minor>.
+
+<hash_dev>
+This is the device that supplies the hash tree data. It may be
+specified similarly to the device path and may be the same device. If the
+same device is used, the hash_start should be outside the configured
+dm-verity device.
+
+<data_block_size>
+The block size on a data device in bytes.
+Each block corresponds to one digest on the hash device.
+
+<hash_block_size>
+The size of a hash block in bytes.
+
+<num_data_blocks>
+The number of data blocks on the data device. Additional blocks are
+inaccessible. You can place hashes to the same partition as data, in this
+case hashes are placed after <num_data_blocks>.
+
+<hash_start_block>
+This is the offset, in <hash_block_size>-blocks, from the start of hash_dev
+to the root block of the hash tree.
+
+<algorithm>
+The cryptographic hash algorithm used for this device. This should
+be the name of the algorithm, like "sha1".
+
+<digest>
+The hexadecimal encoding of the cryptographic hash of the root hash block
+and the salt. This hash should be trusted as there is no other authenticity
+beyond this point.
+
+<salt>
+The hexadecimal encoding of the salt value.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path. This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access. If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail. This should detect
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly, aligned to the nearest
+block size.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash. If it is a leaf node, the hash
+of some data block on disk is calculated. If it is an intermediary node,
+the hash of a number of child nodes is calculated.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block. The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm. The hashes are linearly-ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+alg = sha256, num_blocks = 32768, block_size = 4096
+
+                                [   root    ]
+                               /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
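To make the shape of the tree above concrete, the sketch below computes how many hash blocks each level needs, starting from the level that covers the data blocks and ending at the root. The function name `verity_tree_levels` is hypothetical; the digest size of 32 bytes for sha256 is a property of that algorithm, and the calculation assumes each hash block holds block_size / digest_size digests, as the "Hash Tree" text describes.

```shell
# Sketch: number of hash blocks per tree level, leaf level first, root last.
# Arguments: <num_blocks> <block_size in bytes> <digest_size in bytes>
verity_tree_levels() {
    num_blocks=$1
    block_size=$2
    digest_size=$3
    per_block=$(( block_size / digest_size ))   # digests that fit in one block
    n=$num_blocks
    levels=""
    while :; do
        # round up: a partly filled hash block still occupies a whole block
        n=$(( (n + per_block - 1) / per_block ))
        levels="$levels $n"
        if [ "$n" -le 1 ]; then break; fi
    done
    echo "${levels# }"
}

# Parameters from the diagram: sha256 (32-byte digest), 32768 blocks of 4096 bytes
verity_tree_levels 32768 4096 32   # prints "256 2 1"
```

The result matches the diagram: 256 entries of 128 hashes each (entry_0_0 through entry_1_127), two second-level entries, and a single root block.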
+
+
+On-disk format
+==============
+
+The verity kernel code does not read the verity metadata on-disk header.
+It only reads the hash blocks which directly follow the header.
+It is expected that a user-space tool will verify the integrity of the
+verity header.
+
+Alternatively, the header can be omitted and the dmsetup parameters can
+be passed via the kernel command-line in a rooted chain of trust where
+the command-line is verified.
+
+Directly following the header (and with sector number padded to the next hash
+block boundary) are the hash blocks which are stored a depth at a time
+(starting from the root), sorted in order of increasing index.
+
+The full specification of kernel parameters and on-disk metadata format
+is available at the cryptsetup project's wiki page
+http://code.google.com/p/cryptsetup/wiki/DMVerity
+
+Status
+======
+V (for Valid) is returned if every check performed so far was valid.
+If any check failed, C (for Corruption) is returned.
+
+Example
+=======
+Set up a device:
+# dmsetup create vroot --readonly --table \
+"0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
+"4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
+"1234000000000000000000000000000000000000000000000000000000000000"
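In the table line above, the length field (2097152) and the data geometry (262144 blocks of 4096 bytes) describe the same amount of data. The sketch below shows that relationship, assuming the usual device-mapper convention that table lengths are in 512-byte sectors; `verity_sectors` is a hypothetical helper name.

```shell
# Sketch: table length in 512-byte sectors for a given verity data geometry.
# Arguments: <num_data_blocks> <data_block_size in bytes>
verity_sectors() {
    num_data_blocks=$1
    data_block_size=$2
    echo $(( num_data_blocks * data_block_size / 512 ))
}

# Geometry from the example table: 262144 blocks of 4096 bytes
verity_sectors 262144 4096   # prints 2097152
```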
+
+A command line tool veritysetup is available to compute or verify
+the hash tree or activate the kernel device. This is available from
+the cryptsetup upstream repository http://code.google.com/p/cryptsetup/
+(as a libcryptsetup extension).
+
+Create hash on the device:
+# veritysetup format /dev/sda1 /dev/sda2
+...
+Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
+
+Activate the device:
+# veritysetup create vroot /dev/sda1 /dev/sda2 \
+4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076