mirror of
git://sourceware.org/git/lvm2.git
synced 2025-01-18 10:04:20 +03:00
e55ef72786
N.B. This means that you have to take very great care in the event that you want to access the dcache tree from in kernel o Added extra field to allow out of memory conditions to result in the correct error code. (This hasn't received a lot of testing...) I've ditched the final project (which would have cleared my whole list) since its got other complications which I don't have time to fix right now. Still as Meatloaf says, two out of three ain't bad!
The main goal of this driver is to support volume management in general, not just for LVM. The kernel should provide general services, not support specific applications. eg, The driver has no concept of volume groups. The driver does this by mapping sector ranges for the logical device onto 'targets'. When the logical device is accessed, the make_request function looks up the correct target for the given sector, and then asks this target to do the remapping. A btree structure is used to hold the sector range -> target mapping. Since we know all the entries in the btree in advance we can make a very compact tree, omitting pointers to child nodes, (child nodes locations can be calculated). Typical users would find they only have a handful of targets for each logical volume LV. Benchmarking with bonnie++ suggests that this is certainly no slower than current LVM. Target types are not hard coded, instead the register_mapping_type function should be called. A target type is specified using three functions (see the header): dm_ctr_fn - takes a string and contructs a target specific piece of context data. dm_dtr_fn - destroy contexts. dm_map_fn - function that takes a buffer_head and some previously constructed context and performs the remapping. Currently there are two two trivial mappers, which are automatically registered: 'linear', and 'io_error'. Linear alone is enough to implement most of LVM. I do not like ioctl interfaces so this driver is currently controlled through a /proc interface. /proc/device-mapper/control allows you to create and remove devices by 'cat'ing a line of the following format: create <device name> [minor no] remove <device name> If you're not using devfs you'll have to do the mknod'ing yourself, otherwise the device will appear in /dev/device-mapper automatically. /proc/device-mapper/<device name> accepts the mapping table: begin <sector start> <length> <target name> <target args>... ... end where <target args> are specific to the target type, eg. for a linear mapping: <sector start> <length> linear <major> <minor> <start> and the io-err mapping: <sector start> <length> io-err The begin/end lines around the table are nasty, they should be handled by open/close of the file. The interface is far from complete, currently loading a table either succeeds or fails, you have no way of knowing which line of the mapping table was erroneous. Also there is no way to get status information out, though this should be easy to add, either as another /proc file, or just by reading the same /proc/device-mapper/<device> file. I will be seperating the loading and validation of a table from the binding of a valid table to a device. It has been suggested that I should implement a little custom filesystem rather than labouring with /proc. For example doing a mkdir foo in /wherever/device-mapper would create a new device. People waiting for a status change (eg, a mirror operation to complete) could poll a file. Does the community find this an acceptable way to go ? At the moment the table assumes 32 bit keys (sectors), the move to 64 bits will involve no interface changes, since the tables will be read in as ascii data. A different table implementation can therefor be provided at another time. Either just by changing offset_t to 64 bits, or maybe implementing a structure which looks up the keys in stages (ie, 32 bits at a time). More interesting targets: striped mapping; given a stripe size and a number of device regions this would stripe data across the regions. Especially useful, since we could limit each striped region to a 32 bit area and then avoid nasty 64 bit %'s. mirror mapping; would set off a kernel thread slowly copying data from one region to another, ensuring that any new writes got copied to both destinations correctly. Enabling us to implement a live pvmove correctly.