From 10c7d802a04b4f92b933a8fb287127370c424e72 Mon Sep 17 00:00:00 2001
From: Petr Rockai <prockai@redhat.com>
Date: Wed, 25 May 2011 21:43:12 +0000
Subject: [PATCH] First draft of a document describing how we will
 automatically and incrementally assemble (possibly multi-component, like LVM)
 storage devices.

---
 doc/udev_assembly.txt | 83 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 doc/udev_assembly.txt

diff --git a/doc/udev_assembly.txt b/doc/udev_assembly.txt
new file mode 100644
index 000000000..acf9d2c3f
--- /dev/null
+++ b/doc/udev_assembly.txt
@@ -0,0 +1,83 @@
+Automatic device assembly by udev
+=================================
+
+We want to asynchronously assemble and activate devices as their components
+become available. Eventually, the complete storage stack should be covered,
+including: multipath, cryptsetup, LVM, mdadm. Each of these can be addressed
+more or less separately.
+
+The general plan of action is to simply provide udev rules for each of the
+device "type": for MD component devices, PVs, LUKS/crypto volumes and for
+multipathed SCSI devices. There's no compelling reason to have a daemon do these
+things: all systems that actually need to assemble multiple devices into a
+single entity already either support incremental assembly or will do so shortly.
+
+Whenever in this document we talk about udev rules, these may include helper
+programs that implement a multi-step process. In many cases, it can be expected
+that the functionality can be implemented in couple lines of shell (or couple
+hundred of C).
+
+Multipath
+---------
+
+For multipath, we will need to rely on SCSI IDs for now, until we have a better
+scheme of things, since multipath devices can't be identified until the second
+path appears, and unfortunately we need to decide whether a device is multipath
+when the *first* path appears. Anyway, the multipath folks need to sort this
+out, but it shouldn't bee too hard. Just bring up multipathing on anything that
+appears and is set up for multipathing.
+
+LVM
+---
+
+For LVM, the crucial piece of the puzzle is lvmetad, which allows us to build up
+VGs from PVs as they appear, and at the same time collect information on what is
+already available. A command, pvscan --lvmetad is expected to be used to
+implement udev rules. It is relatively easy to make this command print out a
+list of VGs (and possibly LVs) that have been made available by adding any
+particular device to the set of visible devices. In othe words, udev says "hey,
+/dev/sdb just appeared", calls pvscan --lvmetad, which talks to lvmetad, which
+says "cool, that makes vg0 complete". Pvscan takes this info and prints it out,
+and the udev rule can then somehow decide whether anything needs to be done
+about this "vg0". Presumably a table of devices that need to be activated
+automatically is made available somewhere in /etc (probably just a simple list
+of volume groups or logical volumes, given by name or UUID, globbing
+possible). The udev rule can then consult this file.
+
+Cryptsetup
+----------
+
+This may be the trickiest of the lot: the obvious hurdle here is that crypto
+volumes need to somehow obtain a key (passphrase, physical token or such),
+meaning there is interactivity involved. On the upside, dm-crypt is a 1:1
+system: one encrypted device results in one decrypted device, so no assembly or
+notification needs to be done. While interactivity is a challenge, there are at
+least partial solutions around. (TODO: Milan should probably elaborate here.)
+
+(For LUKS devices, these can probably be detected automatically. I suppose that
+non-LUKS devices can be looked up in crypttab by the rule, to decide what is the
+appropriate action to take.)
+
+MD
+--
+
+Fortunately, MD (namely mdadm) already comes with a mechanism for incremental
+assembly (mdadm -I or such). We can assume that this fits with the rest of stack
+nicely.
+
+
+Filesystem &c. discovery
+========================
+
+Considering other requirements that exist for storage systems (namely
+large-scale storage deployments), it is absolutely not feasible to have the
+system hunt automatically for filesystems based on their UUIDs. In a number of
+cases, this could mean activating tens of thousands of volumes. On small
+systems, asking for all volumes to be brought up automatically is probably the
+best route anyway, and once all storage devices are activated, scanning for
+filesystems is no different from today.
+
+In effect, no action is required on this count: only filesystems that are
+available on already active devices can be mounted by their UUID. Activating
+volumes by naming a filesystem UUID is useless, since to read the UUID the
+volume needs to be active first.