1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-04 09:18:36 +03:00
lvm2/driver/device-mapper
Steven Whitehouse 5c8de1dbd5 o I'm afraid that wu and wl etc. is just too confusing.... I've changed it
to up_write() and down_write() etc so that you can see what kind of a lock
   it is (otherwise it could be anything.. semaphore, spinlock, spinlock_bh,
   spinlock_irq, br_lock, etc.)
2001-09-13 11:29:38 +00:00
..
patches o rebuilt 00_latest 2001-08-31 09:14:55 +00:00
device-mapper.h o first sattab at custom fs. Very rough ATM. 2001-09-07 11:34:46 +00:00
dm-fs.c o first sattab at custom fs. Very rough ATM. 2001-09-07 11:34:46 +00:00
dm-parse.c o first sattab at custom fs. Very rough ATM. 2001-09-07 11:34:46 +00:00
dm-table.c o first sattab at custom fs. Very rough ATM. 2001-09-07 11:34:46 +00:00
dm-target.c o first sattab at custom fs. Very rough ATM. 2001-09-07 11:34:46 +00:00
dm.c o I'm afraid that wu and wl etc. is just too confusing.... I've changed it 2001-09-13 11:29:38 +00:00
dm.h o first sattab at custom fs. Very rough ATM. 2001-09-07 11:34:46 +00:00
README o rebuilt 00_latest 2001-08-31 09:14:55 +00:00

The main goal of this driver is to support volume management in
general, not just for LVM.  The kernel should provide general
services, not support specific applications.  eg, The driver has no
concept of volume groups.

The driver does this by mapping sector ranges for the logical device
onto 'targets'.

When the logical device is accessed, the make_request function looks
up the correct target for the given sector, and then asks this target
to do the remapping.

A btree structure is used to hold the sector range -> target mapping.
Since we know all the entries in the btree in advance we can make a
very compact tree, omitting pointers to child nodes, (child nodes
locations can be calculated).  Typical users would find they only have
a handful of targets for each logical volume LV.

Benchmarking with bonnie++ suggests that this is certainly no slower
than current LVM.


Target types are not hard coded, instead the register_mapping_type
function should be called.  A target type is specified using three
functions (see the header):

dm_ctr_fn - takes a string and contructs a target specific piece of
            context data.
dm_dtr_fn - destroy contexts.
dm_map_fn - function that takes a buffer_head and some previously
            constructed context and performs the remapping.

Currently there are two two trivial mappers, which are automatically
registered: 'linear', and 'io_error'.  Linear alone is enough to
implement most of LVM.


I do not like ioctl interfaces so this driver is currently controlled
through a /proc interface.  /proc/device-mapper/control allows you to
create and remove devices by 'cat'ing a line of the following format:

create <device name> [minor no]
remove <device name>

If you're not using devfs you'll have to do the mknod'ing yourself,
otherwise the device will appear in /dev/device-mapper automatically.

/proc/device-mapper/<device name> accepts the mapping table:

begin
<sector start> <length> <target name> <target args>...
...
end

where <target args> are specific to the target type, eg. for a linear
mapping:

<sector start> <length> linear <major> <minor> <start>

and the io-err mapping:

<sector start> <length> io-err

The begin/end lines around the table are nasty, they should be handled
by open/close of the file.

The interface is far from complete, currently loading a table either
succeeds or fails, you have no way of knowing which line of the
mapping table was erroneous.  Also there is no way to get status
information out, though this should be easy to add, either as another
/proc file, or just by reading the same /proc/device-mapper/<device>
file.  I will be seperating the loading and validation of a table from
the binding of a valid table to a device.

It has been suggested that I should implement a little custom
filesystem rather than labouring with /proc.  For example doing a
mkdir foo in /wherever/device-mapper would create a new device. People
waiting for a status change (eg, a mirror operation to complete) could
poll a file.  Does the community find this an acceptable way to go ?


At the moment the table assumes 32 bit keys (sectors), the move to 64
bits will involve no interface changes, since the tables will be read
in as ascii data.  A different table implementation can therefor be
provided at another time.  Either just by changing offset_t to 64
bits, or maybe implementing a structure which looks up the keys in
stages (ie, 32 bits at a time).


More interesting targets:

striped mapping; given a stripe size and a number of device regions
this would stripe data across the regions.  Especially useful, since
we could limit each striped region to a 32 bit area and then avoid
nasty 64 bit %'s.

mirror mapping; would set off a kernel thread slowly copying data from
one region to another, ensuring that any new writes got copied to both
destinations correctly.  Enabling us to implement a live pvmove
correctly.