mirror of
git://sourceware.org/git/lvm2.git
synced 2024-12-22 17:35:59 +03:00
105 lines
3.8 KiB
Plaintext
The main goal of this driver is to support volume management in
general, not just for LVM.  The kernel should provide general
services, not support specific applications.  For example, the driver
has no concept of volume groups.

The driver does this by mapping sector ranges for the logical device
onto 'targets'.

When the logical device is accessed, the make_request function looks
up the correct target for the given sector, and then asks this target
to do the remapping.

A btree structure is used to hold the sector range -> target mapping.
Since we know all the entries in the btree in advance we can make a
very compact tree, omitting pointers to child nodes (child node
locations can be calculated).  Typical users would find they only have
a handful of targets for each logical volume (LV).

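To illustrate the pointer-free tree, here is a minimal userspace
sketch, not the driver's code.  It assumes each target is keyed by its
last sector, that keys are padded with the maximum value, and that
child n*CHILDREN_PER_NODE+k of node n sits at a computable position in
the next level.  The names map_index, build_index and find_target, and
the node width and size limits, are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEYS_PER_NODE     4                 /* hypothetical node width */
#define CHILDREN_PER_NODE (KEYS_PER_NODE + 1)
#define MAX_DEPTH         4                 /* hypothetical limits */
#define MAX_NODES         64
#define KEY_MAX           UINT32_MAX

typedef uint32_t sector_key;                /* 32 bit keys, as in the text */

struct map_index {
    int depth;                              /* levels; leaves are depth - 1 */
    int num_targets;
    int counts[MAX_DEPTH];                  /* nodes in each level */
    sector_key keys[MAX_DEPTH][MAX_NODES * KEYS_PER_NODE];
};

static int div_up(int n, int d) { return (n + d - 1) / d; }

/* Child k of node n is found by arithmetic, not by following a pointer. */
static int child_of(int n, int k) { return n * CHILDREN_PER_NODE + k; }

/* Upper bound on the keys beneath node n of level l (KEY_MAX if absent). */
static sector_key high(const struct map_index *t, int l, int n)
{
    for (; l < t->depth - 1; l++)
        n = child_of(n, CHILDREN_PER_NODE - 1);
    if (n >= t->counts[l])
        return KEY_MAX;
    return t->keys[l][n * KEYS_PER_NODE + KEYS_PER_NODE - 1];
}

/* highs[i] is the last sector of target i; highs must be ascending. */
static int build_index(struct map_index *t, const sector_key *highs,
                       int num_targets)
{
    int leaves = div_up(num_targets, KEYS_PER_NODE);
    int depth = 1, n, l, k;

    assert(num_targets > 0);
    for (n = leaves; n > 1; n = div_up(n, CHILDREN_PER_NODE))
        depth++;
    if (depth > MAX_DEPTH || leaves > MAX_NODES)
        return -1;

    memset(t->keys, 0xff, sizeof(t->keys)); /* pad every slot with KEY_MAX */
    t->depth = depth;
    t->num_targets = num_targets;
    t->counts[depth - 1] = leaves;
    memcpy(t->keys[depth - 1], highs, (size_t)num_targets * sizeof(*highs));

    for (l = depth - 2; l >= 0; l--) {      /* fill interior levels bottom-up */
        t->counts[l] = div_up(t->counts[l + 1], CHILDREN_PER_NODE);
        for (n = 0; n < t->counts[l]; n++)
            for (k = 0; k < KEYS_PER_NODE; k++)
                t->keys[l][n * KEYS_PER_NODE + k] =
                    high(t, l + 1, child_of(n, k));
    }
    return 0;
}

/* Returns the index of the target covering 'sector', or -1 if none does. */
static int find_target(const struct map_index *t, sector_key sector)
{
    int n = 0, k = 0, l, idx;

    for (l = 0; l < t->depth; l++) {
        n = child_of(n, k);
        if (n >= t->counts[l])
            return -1;
        for (k = 0; k < KEYS_PER_NODE; k++)
            if (t->keys[l][n * KEYS_PER_NODE + k] >= sector)
                break;
    }
    idx = n * KEYS_PER_NODE + k;
    return idx < t->num_targets ? idx : -1;
}
```

With a handful of targets the whole index is a single leaf node, so a
lookup is just a short linear scan of one small array.
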
Benchmarking with bonnie++ suggests that this is certainly no slower
than current LVM.


Target types are not hard coded; instead each type is registered by
calling the register_mapping_type function.  A target type is specified
using three functions (see the header):

    dm_ctr_fn - takes a string and constructs a target specific piece of
                context data.
    dm_dtr_fn - destroys contexts.
    dm_map_fn - function that takes a buffer_head and some previously
                constructed context and performs the remapping.

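As a rough illustration of how the three functions fit together, here
is a userspace sketch.  The function pointer signatures, the
register_mapping_type argument list, find_mapping_type, and the
io_request struct (a stand-in for buffer_head) are all assumptions made
for the example; the real definitions are in the driver's header.  The
sketch also assumes the sector handed to the map function is relative
to the start of the target's range.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel's buffer_head: only the fields remapped here. */
struct io_request {
    unsigned int  dev_major, dev_minor;
    unsigned long sector;
};

/* Hypothetical signatures for the three functions named in the text. */
typedef int  (*dm_ctr_fn)(const char *args, void **context);
typedef void (*dm_dtr_fn)(void *context);
typedef int  (*dm_map_fn)(struct io_request *io, void *context);

struct target_type {
    const char *name;
    dm_ctr_fn ctr;
    dm_dtr_fn dtr;
    dm_map_fn map;
};

#define MAX_TYPES 8                         /* hypothetical limit */
static struct target_type types[MAX_TYPES];
static int num_types;

/* Hypothetical argument list; adds one entry to the table of types. */
static int register_mapping_type(const char *name, dm_ctr_fn ctr,
                                 dm_dtr_fn dtr, dm_map_fn map)
{
    assert(name && ctr && dtr && map);
    if (num_types == MAX_TYPES)
        return -1;
    types[num_types++] = (struct target_type){ name, ctr, dtr, map };
    return 0;
}

/* Hypothetical helper: look a registered type up by name. */
static const struct target_type *find_mapping_type(const char *name)
{
    for (int i = 0; i < num_types; i++)
        if (strcmp(types[i].name, name) == 0)
            return &types[i];
    return NULL;
}

/* Context for 'linear': destination device and its starting sector. */
struct linear_ctx {
    unsigned int  dev_major, dev_minor;
    unsigned long start;
};

/* Constructor: parses "<major> <minor> <start>" into a new context. */
static int linear_ctr(const char *args, void **context)
{
    struct linear_ctx *lc = malloc(sizeof(*lc));

    if (!lc)
        return -1;
    if (sscanf(args, "%u %u %lu",
               &lc->dev_major, &lc->dev_minor, &lc->start) != 3) {
        free(lc);
        return -1;
    }
    *context = lc;
    return 0;
}

/* Destructor: releases the context built by linear_ctr. */
static void linear_dtr(void *context) { free(context); }

/* Map: redirect the request to the destination device and offset. */
static int linear_map(struct io_request *io, void *context)
{
    const struct linear_ctx *lc = context;

    io->dev_major = lc->dev_major;
    io->dev_minor = lc->dev_minor;
    io->sector += lc->start;
    return 0;
}
```

The constructor string here is just the target args portion of a table
line, so the table loader never needs to understand target specifics.
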
Currently there are two trivial mappers, which are automatically
registered: 'linear', and 'io_error'.  Linear alone is enough to
implement most of LVM.


I do not like ioctl interfaces so this driver is currently controlled
through a /proc interface.  /proc/device-mapper/control allows you to
create and remove devices by 'cat'ing a line of the following format:

    create <device name> [minor no]
    remove <device name>

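A program can build the same lines itself.  The sketch below formats a
create or remove command and writes it to a control file.  The helper
names are invented for the example, and it assumes (this is not stated
above) that each command is terminated by a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "create <device name> [minor no]\n"; a negative minor omits it.
 * Returns the line length, or -1 if it does not fit in buf. */
static int format_create(char *buf, size_t len, const char *name, int minor)
{
    int n;

    assert(buf && name);
    if (minor >= 0)
        n = snprintf(buf, len, "create %s %d\n", name, minor);
    else
        n = snprintf(buf, len, "create %s\n", name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Builds "remove <device name>\n"; same return convention as above. */
static int format_remove(char *buf, size_t len, const char *name)
{
    int n;

    assert(buf && name);
    n = snprintf(buf, len, "remove %s\n", name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Writes one command line to the control file; returns 0 on success. */
static int send_command(const char *control_path, const char *line)
{
    FILE *f = fopen(control_path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(line, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would pass "/proc/device-mapper/control" as control_path.
Since loading either succeeds or fails as a whole, the return value is
the only feedback available.
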
If you're not using devfs you'll have to do the mknod'ing yourself,
otherwise the device will appear in /dev/device-mapper automatically.

/proc/device-mapper/<device name> accepts the mapping table:

    begin
    <sector start> <length> <target name> <target args>...
    ...
    end

where <target args> are specific to the target type, e.g. for a linear
mapping:

    <sector start> <length> linear <major> <minor> <start>

and the io-err mapping:

    <sector start> <length> io-err

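For illustration, one line of such a table could be split into its
common fields like this.  This is a userspace sketch only: the struct
table_line, the parse_table_line name and the 15 character limit on the
target name are assumptions, not the driver's parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16                         /* hypothetical name limit */

struct table_line {
    unsigned long start;                    /* <sector start> */
    unsigned long length;                   /* <length> */
    char target[NAME_LEN];                  /* <target name> */
    const char *args;                       /* <target args>, may be "" */
};

/* Splits one table line; 'args' points into the caller's string.
 * Returns 0 on success, -1 if the three common fields are not present. */
static int parse_table_line(const char *line, struct table_line *out)
{
    int used = 0;

    assert(line && out);
    if (sscanf(line, "%lu %lu %15s%n",
               &out->start, &out->length, out->target, &used) != 3)
        return -1;

    line += used;                           /* skip to the target args */
    while (*line == ' ' || *line == '\t')
        line++;
    out->args = line;
    return 0;
}
```

The args string is left untouched so that it can be handed to the
target's constructor, which is the only code that knows its format.
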
The begin/end lines around the table are nasty; they should be handled
by open/close of the file.

The interface is far from complete.  Currently loading a table either
succeeds or fails, and you have no way of knowing which line of the
mapping table was erroneous.  Also there is no way to get status
information out, though this should be easy to add, either as another
/proc file, or just by reading the same /proc/device-mapper/<device>
file.  I will be separating the loading and validation of a table from
the binding of a valid table to a device.

It has been suggested that I should implement a little custom
filesystem rather than labouring with /proc.  For example doing a
mkdir foo in /wherever/device-mapper would create a new device.  People
waiting for a status change (e.g. a mirror operation to complete) could
poll a file.  Does the community find this an acceptable way to go?


At the moment the table assumes 32 bit keys (sectors).  The move to 64
bits will involve no interface changes, since the tables will be read
in as ascii data.  A different table implementation can therefore be
provided at another time, either just by changing offset_t to 64
bits, or maybe by implementing a structure which looks up the keys in
stages (i.e. 32 bits at a time).


More interesting targets:

striped mapping; given a stripe size and a number of device regions
this would stripe data across the regions.  Especially useful, since
we could limit each striped region to a 32 bit area and then avoid
nasty 64 bit %'s.

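The arithmetic for such a target might look like the sketch below.  It
assumes a plain round-robin layout in which consecutive runs of
stripe_size sectors go to consecutive regions; stripe_map and
stripe_result are names invented here.  Everything stays in 32 bits, so
no 64 bit division or modulus is needed.

```c
#include <assert.h>
#include <stdint.h>

struct stripe_result {
    uint32_t region;                        /* which device region to use */
    uint32_t sector;                        /* sector within that region */
};

/* Maps a logical sector onto (region, sector) for round-robin striping. */
static struct stripe_result stripe_map(uint32_t sector, uint32_t stripe_size,
                                       uint32_t num_regions)
{
    struct stripe_result r;
    uint32_t chunk, offset;

    assert(stripe_size > 0 && num_regions > 0);
    chunk  = sector / stripe_size;          /* which stripe-sized run */
    offset = sector % stripe_size;          /* position inside that run */

    r.region = chunk % num_regions;
    r.sector = (chunk / num_regions) * stripe_size + offset;
    return r;
}
```

For example, with a stripe size of 8 sectors and 3 regions, logical
sector 43 falls in run 5 at offset 3, which lands on region 2 at
sector 11.
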
mirror mapping; would set off a kernel thread slowly copying data from
one region to another, ensuring that any new writes got copied to both
destinations.  This would enable us to implement a live pvmove
correctly.

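The idea can be modelled in a few lines of userspace code.  This is a
single-threaded toy, assuming two in-memory regions and tiny blocks; it
ignores the locking a real kernel thread would need against concurrent
writers, and all the names are invented for the example.

```c
#include <assert.h>
#include <string.h>

#define REGION_BLOCKS 8                     /* hypothetical sizes */
#define BLOCK_SIZE    4

struct mirror {
    unsigned char src[REGION_BLOCKS][BLOCK_SIZE];
    unsigned char dst[REGION_BLOCKS][BLOCK_SIZE];
    int next;                               /* next block the copy handles */
};

/* One step of the background copy; returns 1 while work remains. */
static int mirror_copy_step(struct mirror *m)
{
    if (m->next >= REGION_BLOCKS)
        return 0;
    memcpy(m->dst[m->next], m->src[m->next], BLOCK_SIZE);
    m->next++;
    return m->next < REGION_BLOCKS;
}

/* A new write goes to both regions, wherever the copy has got to. */
static void mirror_write(struct mirror *m, int block,
                         const unsigned char *data)
{
    assert(block >= 0 && block < REGION_BLOCKS);
    memcpy(m->src[block], data, BLOCK_SIZE);
    memcpy(m->dst[block], data, BLOCK_SIZE);
}

/* True once the copy has finished and both regions hold the same data. */
static int mirror_in_sync(const struct mirror *m)
{
    return m->next >= REGION_BLOCKS &&
           memcmp(m->src, m->dst, sizeof(m->src)) == 0;
}
```

Writing to both regions means a block that has already been copied
never goes stale, while a block ahead of the copy is simply copied
again later with the same contents.
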