IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We already have /dev/disk/by-id/dm-uuid-... (which encompasses the
VG UUID and LV UUID in case of LVs since the mapping's UUID is
VG+LV UUID together) and /dev/disk/by-id/dm-name-... (which encompasses
the VG and LV name in case of LVs).
This patch addds /dev/disk/by-id/lvm-pv-uuid-<PV_UUID> that completes
this scheme and makes navigation a bit easier using PV UUIDs since
one can navigate using PV UUIDs only and there's no need to do extra
PV UUID <--> kernel name matching (the PV UUID is stable across reboots).
This may come in handy in various scripts.
Since we already have the PV UUID stored in udev database (as a result
of blkid call - returned in ID_FS_UUID blkid's variable), this operation
is very cheap indeed, just creating the extra one symlink.
Clear temporary DM_DISABLE_OTHER_RULES_FLAG properly. This did not
cause any bug or problem as the temporary variable is overwritten next
time it's used again, but we should still clean it properly!
Do not drop device's flag to report readiness for systemd
processing if there's any event that follows the activatiion
event itself. Otherwise, systemd would lost track of this device
on any other event that follows the activating event (IOW, we
need to make SYSTEMD_READY variable change level-based, not edge-based).
This patch applies for MD and loop devices used as PVs.
(intra-release fix for commit 4c267c7286)
Some devices, similarly to us, are not prepared after ADD event, but
after an extra CHANGE event when the device is properly set up.
This includes MD and loop devices. This patch fixes the
SYSTEMD_READY assignment that is crucial for proper functionality
of SYSTEMD_WANTS that we use to instantiate a lvm2-pvscan@.service
systemd service to activate the VG/LVs (see also bug
info).
All that extra handling of foreign devices should eventually be moved
to rules which process those devices primarily (MD and loop)! We should
only check a dedicated variable whether the device is usable or not.
MD can directly create partition devices without a need to run
an extra kpartx or partprobe call. We need to react to this event in
a different way as for bare MD devices - we need to handle the ADD event
for KERNEL=="md[0-9]*p[0-9]*" kernel name and trigger the LVM scanning
to update lvmetad to trigger autoactivation and so on...
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1023250
Reset the DM_UDEV_OTHER_RULES_FLAG to original value right at the
time of dropping the DM_NOSCAN flag.
When DM_NOSCAN is set, the DM_UDEV_DISABLE_OTHER_RULES_FLAG is also set
to avoid udev processing in "other/foreign" rules. If the noscan flag
is dropped, the DM_UDEV_DISABLE_OTHER_RULES_FLAG should be reset to
its original value.
Also, lvmetad should respect the DM_UDEV_DISABLE_OTHER_RULES_FLAG
because if the volume is set with this flag it:
- definitely is not a top-level device (so makes no sense for lvmetad scanning)
- is not supposed to be scanned further (for any stacking on top of
it, including LVM stacking itself and any autoactivation of stacked LVs)
When using ENV{SYSTEMD_WANTS}=lvm2-pvscan@... to instantiate a service
for lvmetad scan when the new PV appears in the system, the service
is started and executed. However, to track device removal, we need
to bind it (the "BindsTo" systemd directive) to a certain .device
systemd unit.
In default systemd setup, the device is tracked by it's name and
sysfs path (there's normally a sysfs path .device systemd unit for
a device and then the device name .device unit as an alias for it).
Neither of these two is useful for lvmetad update as we need to bind
it to device's <major>:<minor> pair.
The /dev/block/<major>:<minor> is the essential symlink under /dev
that exists for each block device (created by default udev rules
provided by udev directly). So let's use this as an alias for
the device's .device unit as well by means of "ENV{SYSTEMD_ALIAS}"
declaration within udev rules which systemd understands (this will
create a new alias "dev-block-<major>:<minor>.device".
Then we can easily bind the "dev-block-<major>:<minor>" device
systemd unit with instantiated lvm2-pvscan@<major>:<minor>.service.
So once the device is removed from the systemd, the
lvm-pvscan@<major>:<minor>.service executes it's ExecStop action
(which in turn notifies lvmetad about the device being gone).
This completes the udev-systemd-lvmetad interaction then.
The new lvm2-pvscan@.service is responsible for on-demand execution
of "pvscan --cache --activate ay" which causes lvmetad to be
updated and LVM activation done if the VG is complete.
Also, use udev-systemd mechanism to instantiate the job as the
lvm2-pvscan@$devnode.service on each newly appeared PV in the system.
This prevents the background job to be killed (that would happen
if it was directly forked from udev rule - this behaviour is seen
in recent versions of udev with the help of systemd that can track
detached processes - the detached process would still be in the same
cgroup).
To enable this official udev-systemd protocol for instantiating
background jobs, use new --enable-udev-systemd-background-jobs
configure switch (it's disabled by default). This option is highly
recommended wherever systemd is used!
Recognize DM_SUBSYSTEM_UDEV_FLAG0 which for LVM is the "LVM_NOSCAN"
flag that causes the scanning to be skipped (mainly blkid) and
also directs all the foreign rules to be skipped as well.
Important thing here is that the "watch" udev rules is still set
as well as the /dev/disk/by-id content created (which does not
require any scanning to be done). Also, the flag is dropped on
any subsequent event and scanning done...
Each subsystem rule that needs to import any of DM_SUBSYSTEM_UDEV_FLAG*
flags is responsible for doing so. This simply moves control of these
flags from general 10-dm.rules to any subsystem rule using these flags
as each subsystem knows better how to handle these flags on its own.
The DM_ACTIVATION and DM_UDEV_PRIMARY_SOURCE_FLAG needs to be kept the
way it was for backward compatibility (e.g. the old rules are still
in initramfs). This way the check in whether the device should be
scanned in 69-dm-lvmetad.rules is even easier.
New versions of udev changed the default event timeout to 30s
from original 3min. This causes problems with LVM processes that
starve because of the IO load caused by some LVM actions (e.g.
mirror/raid synchronization).
Reinstate the 3min udev timeout for now until we optimize this
in a way that even the 30s timeout is sufficient.
This patch fixes the way the special devices are handled
(special in this context means that they're not usable
after the usual ADD event like other generic devices):
- DM and MD devices are pvscanned only when they are just set up.
This is the first CHANGE event that makes the device usable
(the DM_UDEV_PRIMARY_SOURCE_FLAG is set for DM and the
md/array_state sysfs attribute is present for MD).
Whether the device is activated is remembered via
DM_ACTIVATED (for DM) and LVM_MD_PV_ACTIVATED (for MD)
udev environment variable. This is then used to decide
whether we should fire the pvscan on ADD event to
support coldplugging. For any (artificial) ADD event
generated during coldplug, the device must be already
set up properly to fire the pvscan on it.
- Similar for loop devices. For loop devices, only CHANGE
events are relevant (so there's a CHANGE after the loop
device is set up as well as detached). Whether the loop
has just been activated is detected via loop/backing_file
sysfs attribute presence. The activation state is remembered
via LVM_LOOP_PV_ACTIVATED udev environment variable.
- Do not pvscan multipath device components (underlying paths).
- Do not pvscan RAID device components.
- Also, set LVM_SCANNED="1" udev environment variable for
debug purposes (it's visible in the lvmdump -u that takes
the current udev database). This variable is set once
the pvscan is triggered.
The table below summarises when the pvscan is triggered
(marked with X, X* means fire only if the special dev is properly set up):
| real ADD | real CHANGE | artificial ADD | artificial CHANGE | remove
=============================================================================
DM | | X | X* | | X
MD | | X | X* | |
loop | | X | X* | |
other | X | | X | | X
Udev daemon has recently introduced a limit on the number of udev
processes (there was no limit before). This causes a problem
when calling pvscan --cache -aay in lvmetad udev rules which
is supposed to activate the volumes. This activation is itself
synced with udev and so it waits for the activation to complete
before the pvscan finishes. The event processing can't continue
until this pvscan call is finished.
But if we're at the limit with the udev process count, we can't
instatiate any more udev processes, all such events are queued
and so we can't process the lvm activation event for which the
pvscan is waiting.
Then we're in a deadlock since the udev process with the
pvscan --cache -aay call waits for the lvm activation udev
processing to complete, but that will never happen as there's
this limit hit with the number of udev processes.
The process with pvscan --cache -aay actually times out eventually
(3min or 30sec, depends on the version of udev).
This patch makes it possible to run the pvscan --cache -aay
in the background so the udev processing can continue and hence
we can avoid the deadlock mentioned above.
In stacked environment where we have a PV layered on top of a
snapshot LV and then removing the LV, lvmetad still keeps information
about the PV:
[0] raw/~ $ pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
[0] raw/~ $ vgcreate vg /dev/sda
Volume group "vg" successfully created
[0] raw/~ $ lvcreate -L32m vg
Logical volume "lvol0" created
[0] raw/~ $ lvcreate -L32m -s vg/lvol0
Logical volume "lvol1" created
[0] raw/~ $ pvcreate /dev/vg/lvol1
Physical volume "/dev/vg/lvol1" successfully created
[0] raw/~ $ lvremove -ff vg/lvol1
Logical volume "lvol1" successfully removed
[0] raw/~ $ pvs
No device found for PV BdNlu2-7bHV-XcIp-mFFC-PPuR-ef6K-yffdzO.
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 92.00m
[0] raw/~ $ pvscan --cache --major 253 --minor 3
Device 253:3 not found. Cleared from lvmetad cache.
This is because of the reactivation that is done just before
snapshot removal as part of the process (vg/lvol1 from the example above).
This causes a CHANGE event to be generated, but any scan done
on the LV does not see the original data anymore (in this case
the stacked PV label on top) and consequently the ID_FS_TYPE="LVM2_member"
(provided by blkid scan) is not stored in udev db anymore for the LV.
Consequently, the pvscan --cache is not run anymore as the dev is not
identified as LVM PV by the "LVM2_member" id - lvmetad loses this info
and still keeps records about the PV.
We can run into a very similar problem with erasing the PV label directly:
[0] raw/~ $ lvcreate -L32m vg
Logical volume "lvol0" created
[0] raw/~ $ pvcreate /dev/vg/lvol0
Physical volume "/dev/vg/lvol0" successfully created
[0] raw/~ $ dd if=/dev/zero of=/dev/vg/lvol0 bs=1M
dd: error writing '/dev/vg/lvol0': No space left on device
33+0 records in
32+0 records out
33554432 bytes (34 MB) copied, 0.380921 s, 88.1 MB/s
[0] raw/~ $ pvs
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 92.00m
/dev/vg/lvol0 lvm2 a-- 32.00m 32.00m
[0] raw/~ $ pvscan --cache --major 253 --minor 2
No PV label found on /dev/vg/lvol0.
This patch adds detection of this change from ID_FS_LABEL="LVM2_member"
to ID_FS_LABEL="<whatever_else>" and hence informing the lvmetad
about PV being gone.
If loop device is first configured on systems where /dev/loop-control
is used to dynamically create the loop device itself, there's an
ADD+CHANGE even generated. But next time the existing /dev/loop[0-9]*
is reused, there's only a CHANGE event since the device representing
it is already present in kernel (so no ADD event in this case).
We can't ignore this CHANGE event for loop devices! This is a regression
caused by 756bcabbfe. We already had
a similar problem with MD devices which was fixed by
2ac217d408 (but that one was
only an intra-release fix).
Commit 756bcabbfe restricted the
situations at which the LVM autoactivation fires - only on ADD
event for devices other than DM. However, this caused a problem
for MD devices...
MD devices are activated in a very similar way as DM devices:
the MD dev is created on first appeareance of MD array member
(ADD event) and stays *inactive* until the array is complete.
Just then the MD dev turns to active state and this is reported
to userspace by CHANGE event.
Unfortunately, we can't differentiate between the CHANGE event
coming from udev trigger/WATCH rule and CHANGE event coming from
the transition to active state - MD would need to add similar logic
we already use to detect this in DM world. For now, we just have
to enable pvscan --cache on *all* CHANGE events for MD so the
autoactivation of the LVM volumes on top of MD works.
A downside of this is that a spurious CHANGE event for MD dev
can cause the LVM volumes on top of it to be automatically activated.
However, one should not open/change the device underneath until
the device above in the stack is removed! So this situation should
only happen if one opens the MD dev for read-write by mistake
(and hence firing the CHANGE event because of the WATCH udev rule),
or if calling udev trigger manually for the MD dev.
(No WHATS_NEW here as this fixes the commit mentioned
above and which has not been released yet.)
Commit 756bcabbfe fixed autoactivation
to not trigger on each uevent for a PV that appeared in the system
most notably the events that are triggered artificially (udevadm
trigger or as the result of the WATCH udev rule being applied that
consequently generates CHANGE uevents). This fixed a situation in
which VGs/LVs were activated when they should not.
BUT we still need to care about the coldplug used at boot to
retrigger the ADD events - the "udevadm trigger --action=add"!
For non-DM-based PVs, this is already covered as for these we
run the autoactivation on ADD event only.
However, for DM-based PVs, we still need to run the
autoactivation even for the artificial ADD event, reusing
the udev DB content from previous proper CHANGE event that
came with the DM device activation.
Simply, this patch fixes a situation in which we run extra
"udevadm trigger --action=add" (or echo add > /sys/block/<dev>/uevent)
for DM-based PVs (cryptsetup devices, multipath devices, any
other DM devices...).
Without this patch, while using lvmetad + autoactivation,
any VG/LV that has a DM-based PV and for which we do not
call the activation directly, the VG/LV is not activated.
For example a VG with an LV with root FS on it which is directly
activated in initrd and then missing activation of the rest
of the LVs in the VG because of unhandled uevent retrigger on
boot after switching to root FS (the "coldplug").
(No WHATS_NEW here as this fixes the commit mentioned
above and which was not released yet.)
Before, the pvscan --cache -aay was called on each ADD and CHANGE
uevent (for a device that is not a device-mapper device) and each CHANGE
event (for a PV that is a device-mapper device).
This causes troubles with autoactivation in some cases as CHANGE event
may originate from using the OPTION+="watch" udev rule that is defined
in 60-persistent-storage.rules (part of the rules provided by udev
directly) and it's used for all block devices
(except fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md* devices). For example, the
following sequence incorrectly activates the rest of LVs in a VG if one
of the LVs in the VG is being removed:
[root@rhel6-a ~]# pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
[root@rhel6-a ~]# vgcreate vg /dev/sda
Volume group "vg" successfully created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol0" created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol1" created
[root@rhel6-a ~]# vgchange -an vg
0 logical volume(s) in volume group "vg" now active
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi------ 4.00m
lvol1 vg -wi------ 4.00m
[root@rhel6-a ~]# lvremove -ff vg/lvol1
Logical volume "lvol1" successfully removed
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi-a---- 4.00m
...so the vg was deactivated, then lvol1 removed, and we end up with
lvol1 removed (which is ok) BUT with lvol0 activated (which is wrong)!!!
This is because after lvol1 removal, we need to write metadata to the
underlying device /dev/sda and that causes the CHANGE event to be
generated (because of the WATCH udev rule set on this device) and this
causes the pvscan --cache -aay to be reevaluated.
We have to limit this and call pvscan --cache -aay to autoactivate
VGs/LVs only in these cases:
--> if the *PV is not a dm device*, scan only after proper device
addition (ADD event) and not with any other changes (CHANGE event)
--> if the *PV is a dm device*, scan only after proper mapping
activation (CHANGE event + the underlying PV in a state "just
activated")
Define auto_activation_handler that activates VGs/LVs automatically
based on the activation/auto_activation_volume_list (activating all
volumes by default if the list is not defined).
The autoactivation is done within the pvscan call in 69-dm-lvmetad.rules
that watches for udev events (device appearance/removal).
For now, this works for non-clustered and complete VGs only.
Remove executable path detection in udev rules and use sbindir that
is configured, but still provide the original functionality by means
of 'configure --enable-udev-rule-exec-detection'.
Normally, the exec path for the tools called in udev rules should
not differ from the sbindir used, however, there are cases this is
necessary. For example different environments could be assembled
in a way that these path differ for some reason (distribution installer,
initrd ...).
This functionality is kept for compatibility only. Any environment
moving the binaries around and using different paths should be fixed
eventually!
We can't use 'DM_SBIN_PATH'. This one is set only for DM devices but not
for all block devices - the pvscan is run on all relevant block devices!
This LVM_SBIN_PATH (as well as DM_SBIN_PATH) detection should be removed
eventually but for upstream solution, we still have to do that as there are
known cases where the binaries are put either in /sbin or /usr/sbin
(some installation systems, initrd systems etc.).
Why using the order 69:
- Storage processing in general happens in 60-persistent-storage.rules,
including the blkid call that adds some usable information we can use
for filtering and speedup (these rules are part of upstream udev and
the order is preserved on most distros)
- There's still some other storage-related processing done after
60-persistent-storage.rules in general. These might add some detailed
storage-related information we might use to filter devices effectively
(e.g. MD udev rules, ...).
- We need lvmetad rules to be processed before any consumers can use the
output - so the metadata cache is ready soon enough (e.g. udisks rules).
- There's no official (upstream udev) document about assigning the order,
so this number is chosen in best belief it will suit all scenarios.
We don't have anything better yet...
The problems the watch rule caused when removing devices should be covered
now with the "retry remove" logic. It's also better to have this maintained
by us, rather than having this rule anywhere else without proper control.
Skip decoding of DM flags when device is removed.
We currently need DM flags only for add|change events. So forking
dmsetup process for removed devices is a waste of CPU time.
Udev is already quite slow, so make it just a tiny bit faster.
This is to avoid any scanning and processing of DM devices while they are in
suspended state (e.g. a rename while the device is suspended - a CHANGE event
is generated!). Otherwise, any scanning in the rules could end up with locking
the calling process until the device is resumed and so we don't receive a
notification about udev rules completion until then (and that effectively
locks out the process awaiting the notification!).
However, we still keep 'disk' and any 'subsystem' related udev rules running.
We trust these and these should check themselves whether a device is suspended
or not, not trying to run any scanning if it is.
This can happen with older rules (without support for synthesized events)
that are still part of initrd while using new udev rules in the system itself.
The consequence was that new udev rules incorrectly assumed that not having
DM_UDEV_PRIMARY_SOURCE_FLAG set always means the uevent is synthesized and
inappropriate (device is still not properly activated) and so it should be
ignored. However, initrd is not updated automatically while updating the
libdevmapper/udev rules in the system and so we end up with the rules not
detecting and setting crucial parts in the initrd environment and the rules
in the system that rely on the information that should have been stored in
udev db (which is incorrect in this configuration, of course).
The overall consequence is that the update of libdevmapper/lvm2 without
regenerating the initrd could end up with a boot failure! Ignoring the event
means removing any existing symlinks in /dev!
To fix this, increase udev rules version to make a difference. So from now on,
mark rules without proper support for synthesized events as
DM_UDEV_RULES_VSN="1" and 2 (or higher) if that support is included.
We still need to detect this one! We're not so strict with CHANGE events as
with the ADD events while applying filters in the rules so this one would
pass and it would process the rules prematurely (because it appears *before*
the actual CHANGE event used when resuming a DM device while setting read-only
state at the same time).
For now, this is just a precaution. Normally, all the other (non-dm) rules
should check DM_UDEV_DISABLE_OTHER_RULES_FLAG and therefore avoid setting
any inotify watches as well. But let's make sure.
Support for final assignment of the "nowatch" rule (the use of ":=") will
appear in next udev release, v160. This should also work in previous udev
versions but the setting won't be sealed so any further OPTIONS="watch" will
always prevail there.
We may want to add more specific "nowatch" rules later if needed.
We can use DM_UDEV_PRIMARY_SOURCE_FLAG to identify the spurious events
and use it as an indication that the device has already been activated before
(and hence we can find this property in udev database).
WARNING: This change requires udev startup script to preserve udev database
from initrd. All the information stored there during activation of devices
is important for the initial "udevadm trigger --action=add" call that is
used in udev startup script. If not done this way, udev startup script needs
to define DM_UDEV_PRIMARY_SOURCE_FLAG=1 property for any ADD events it uses.
This rule appeared in udev v152 and it helps us to support spurious events
where we didn't have any flags set (events originated in udevadm trigger
or the watch rule). These flags are important to direct the rule application.
Now, with the help of this rule, we can regenerate old udev db content.
To implement this correctly, we need to flag all proper DM udev events with
DM_UDEV_PRIMARY_SOURCE_FLAG. That happens automatically for all ioctls
generating events originated in libdevmapper.
Fix unwanted modification of $(top_builddir)/make.tmpl.
Using dependency rules to install rules for udev.
There is minor problem, with concurent usage of builddir
and srcdir could lead to missuse of 10-dm.rules which
could be found in VPATH from different builddir.
However current solution uses intermediate target so
the generated 10-dm.rules exists only for short period of time
during make install execution.
Usage of VPATH makes troubles when used within $(builddir).
Not only source files are being found through VPATH,
but targets as well. (make --debug=v)
Thus if user builds the code in $(srcdir) and also in some $(builddir)
he gets mangled results as some generated files (i.e. .export.sym)
are 'reused' from $(srcdir) instead of $(builddir).
This patch switches to use vpath were we could explicitly name
suffixes that should be looked via vpath - we must take care,
we do not generate files with these suffixes:
.c, .in, .po, .exported_symbols