1
1
mirror of https://github.com/systemd/systemd-stable.git synced 2024-12-23 17:34:00 +03:00
systemd-stable/docs/BLOCK_DEVICE_LOCKING.md
Filipe Brandenburger c3e270f4ee docs: add a "front matter" snippet to our markdown pages
It turns out Jekyll (the engine behind GitHub Pages) requires that pages
include a "Front Matter" snippet of YAML at the top for proper rendering.

Omitting it will still render the pages, but including it opens up new
possibilities, such as using a {% for %} loop to generate index.md instead of
requiring a separate script.

I'm hoping this will also fix the issue with some of the pages (notably
CODE_OF_CONDUCT.html) not being available under systemd.io

Tested locally by rendering the website with Jekyll. Before this change, the
*.md files were kept unchanged (so not sure how that even works?!), after this
commit, proper *.html files were generated from it.
2019-01-02 14:16:34 -08:00

3.7 KiB

title
Locking Block Device Access

Locking Block Device Access

TL;DR: Use BSD file locks (flock(2)) on block device nodes to synchronize access for partitioning and file system formatting tools.

systemd-udevd probes all block devices showing up for file system superblock and partition table information (utilizing libblkid). If another program concurrently modifies a superblock or partition table this probing might be affected, which is bad in itself, but also might in turn result in undesired effects in programs subscribing to udev events.

Applications manipulating a block device can temporarily stop systemd-udevd from processing rules on it — and thus bar it from probing the device — by taking a BSD file lock on the block device node. Specifically, whenever systemd-udevd starts processing a block device it takes a LOCK_SH|LOCK_NB lock using flock(2) on the main block device (i.e. never on any partition block device, but on the device the partition belongs to). If this lock cannot be taken (i.e. flock() returns EBUSY), it refrains from processing the device. If it manages to take the lock it is kept for the entire time the device is processed.

Note that systemd-udevd also watches all block device nodes it manages for inotify() IN_CLOSE events: whenever such an event is seen, this is used as trigger to re-run the rule-set for the device.

These two concepts allow tools such as disk partitioners or file system formatting tools to safely and easily take exclusive ownership of a block device while operating: before starting work on the block device, they should take an LOCK_EX lock on it. This has two effects: first of all, in case systemd-udevd is still processing the device the tool will wait for it to finish. Second, after the lock is taken, it can be sure that that systemd-udevd will refrain from processing the block device, and thus all other client applications subscribed to it won't get device notifications from potentially half-written data either. After the operation is complete the partitioner/formatter can simply close the device node. This has two effects: it implicitly releases the lock, so that systemd-udevd can process events on the device node again. Secondly, it results an IN_CLOSE event, which causes systemd-udevd to immediately re-process the device — seeing all changes the tool made — and notify subscribed clients about it.

Besides synchronizing block device access between systemd-udevd and such tools this scheme may also be used to synchronize access between those tools themselves. However, do note that flock() locks are advisory only. This means if one tool honours this scheme and another tool does not, they will of course not be synchronized properly, and might interfere with each other's work.

Note that the file locks follow the usual access semantics of BSD locks: since systemd-udevd never writes to such block devices it only takes a LOCK_SH shared lock. A program intending to make changes to the block device should take a LOCK_EX exclusive lock instead. For further details, see the flock(2) man page.

And please keep in mind: BSD file locks (flock()) and POSIX file locks (lockf(), F_SETLK, …) are different concepts, and in their effect orthogonal. The scheme discussed above uses the former and not the latter, because these types of locks more closely match the required semantics.

Summarizing: it is recommended to take LOCK_EX BSD file locks when manipulating block devices in all tools that change file system block devices (mkfs, fsck, …) or partition tables (fdisk, parted, …), right after opening the node.