IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We already have /dev/disk/by-id/dm-uuid-... (which encompasses the
VG UUID and LV UUID in case of LVs since the mapping's UUID is
VG+LV UUID together) and /dev/disk/by-id/dm-name-... (which encompasses
the VG and LV name in case of LVs).
This patch addds /dev/disk/by-id/lvm-pv-uuid-<PV_UUID> that completes
this scheme and makes navigation a bit easier using PV UUIDs since
one can navigate using PV UUIDs only and there's no need to do extra
PV UUID <--> kernel name matching (the PV UUID is stable across reboots).
This may come in handy in various scripts.
Since we already have the PV UUID stored in udev database (as a result
of blkid call - returned in ID_FS_UUID blkid's variable), this operation
is very cheap indeed, just creating the extra one symlink.
Do not drop device's flag to report readiness for systemd
processing if there's any event that follows the activatiion
event itself. Otherwise, systemd would lost track of this device
on any other event that follows the activating event (IOW, we
need to make SYSTEMD_READY variable change level-based, not edge-based).
This patch applies for MD and loop devices used as PVs.
(intra-release fix for commit 4c267c7286145165dfe078f77d18d194a21a2e1c)
Some devices, similarly to us, are not prepared after ADD event, but
after an extra CHANGE event when the device is properly set up.
This includes MD and loop devices. This patch fixes the
SYSTEMD_READY assignment that is crucial for proper functionality
of SYSTEMD_WANTS that we use to instantiate a lvm2-pvscan@.service
systemd service to activate the VG/LVs (see also bug
info).
All that extra handling of foreign devices should eventually be moved
to rules which process those devices primarily (MD and loop)! We should
only check a dedicated variable whether the device is usable or not.
MD can directly create partition devices without a need to run
an extra kpartx or partprobe call. We need to react to this event in
a different way as for bare MD devices - we need to handle the ADD event
for KERNEL=="md[0-9]*p[0-9]*" kernel name and trigger the LVM scanning
to update lvmetad to trigger autoactivation and so on...
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1023250
Reset the DM_UDEV_OTHER_RULES_FLAG to original value right at the
time of dropping the DM_NOSCAN flag.
When DM_NOSCAN is set, the DM_UDEV_DISABLE_OTHER_RULES_FLAG is also set
to avoid udev processing in "other/foreign" rules. If the noscan flag
is dropped, the DM_UDEV_DISABLE_OTHER_RULES_FLAG should be reset to
its original value.
Also, lvmetad should respect the DM_UDEV_DISABLE_OTHER_RULES_FLAG
because if the volume is set with this flag it:
- definitely is not a top-level device (so makes no sense for lvmetad scanning)
- is not supposed to be scanned further (for any stacking on top of
it, including LVM stacking itself and any autoactivation of stacked LVs)
The new lvm2-pvscan@.service is responsible for on-demand execution
of "pvscan --cache --activate ay" which causes lvmetad to be
updated and LVM activation done if the VG is complete.
Also, use udev-systemd mechanism to instantiate the job as the
lvm2-pvscan@$devnode.service on each newly appeared PV in the system.
This prevents the background job to be killed (that would happen
if it was directly forked from udev rule - this behaviour is seen
in recent versions of udev with the help of systemd that can track
detached processes - the detached process would still be in the same
cgroup).
To enable this official udev-systemd protocol for instantiating
background jobs, use new --enable-udev-systemd-background-jobs
configure switch (it's disabled by default). This option is highly
recommended wherever systemd is used!
Recognize DM_SUBSYSTEM_UDEV_FLAG0 which for LVM is the "LVM_NOSCAN"
flag that causes the scanning to be skipped (mainly blkid) and
also directs all the foreign rules to be skipped as well.
Important thing here is that the "watch" udev rules is still set
as well as the /dev/disk/by-id content created (which does not
require any scanning to be done). Also, the flag is dropped on
any subsequent event and scanning done...
The DM_ACTIVATION and DM_UDEV_PRIMARY_SOURCE_FLAG needs to be kept the
way it was for backward compatibility (e.g. the old rules are still
in initramfs). This way the check in whether the device should be
scanned in 69-dm-lvmetad.rules is even easier.
This patch fixes the way the special devices are handled
(special in this context means that they're not usable
after the usual ADD event like other generic devices):
- DM and MD devices are pvscanned only when they are just set up.
This is the first CHANGE event that makes the device usable
(the DM_UDEV_PRIMARY_SOURCE_FLAG is set for DM and the
md/array_state sysfs attribute is present for MD).
Whether the device is activated is remembered via
DM_ACTIVATED (for DM) and LVM_MD_PV_ACTIVATED (for MD)
udev environment variable. This is then used to decide
whether we should fire the pvscan on ADD event to
support coldplugging. For any (artificial) ADD event
generated during coldplug, the device must be already
set up properly to fire the pvscan on it.
- Similar for loop devices. For loop devices, only CHANGE
events are relevant (so there's a CHANGE after the loop
device is set up as well as detached). Whether the loop
has just been activated is detected via loop/backing_file
sysfs attribute presence. The activation state is remembered
via LVM_LOOP_PV_ACTIVATED udev environment variable.
- Do not pvscan multipath device components (underlying paths).
- Do not pvscan RAID device components.
- Also, set LVM_SCANNED="1" udev environment variable for
debug purposes (it's visible in the lvmdump -u that takes
the current udev database). This variable is set once
the pvscan is triggered.
The table below summarises when the pvscan is triggered
(marked with X, X* means fire only if the special dev is properly set up):
| real ADD | real CHANGE | artificial ADD | artificial CHANGE | remove
=============================================================================
DM | | X | X* | | X
MD | | X | X* | |
loop | | X | X* | |
other | X | | X | | X
Udev daemon has recently introduced a limit on the number of udev
processes (there was no limit before). This causes a problem
when calling pvscan --cache -aay in lvmetad udev rules which
is supposed to activate the volumes. This activation is itself
synced with udev and so it waits for the activation to complete
before the pvscan finishes. The event processing can't continue
until this pvscan call is finished.
But if we're at the limit with the udev process count, we can't
instatiate any more udev processes, all such events are queued
and so we can't process the lvm activation event for which the
pvscan is waiting.
Then we're in a deadlock since the udev process with the
pvscan --cache -aay call waits for the lvm activation udev
processing to complete, but that will never happen as there's
this limit hit with the number of udev processes.
The process with pvscan --cache -aay actually times out eventually
(3min or 30sec, depends on the version of udev).
This patch makes it possible to run the pvscan --cache -aay
in the background so the udev processing can continue and hence
we can avoid the deadlock mentioned above.
In stacked environment where we have a PV layered on top of a
snapshot LV and then removing the LV, lvmetad still keeps information
about the PV:
[0] raw/~ $ pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
[0] raw/~ $ vgcreate vg /dev/sda
Volume group "vg" successfully created
[0] raw/~ $ lvcreate -L32m vg
Logical volume "lvol0" created
[0] raw/~ $ lvcreate -L32m -s vg/lvol0
Logical volume "lvol1" created
[0] raw/~ $ pvcreate /dev/vg/lvol1
Physical volume "/dev/vg/lvol1" successfully created
[0] raw/~ $ lvremove -ff vg/lvol1
Logical volume "lvol1" successfully removed
[0] raw/~ $ pvs
No device found for PV BdNlu2-7bHV-XcIp-mFFC-PPuR-ef6K-yffdzO.
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 92.00m
[0] raw/~ $ pvscan --cache --major 253 --minor 3
Device 253:3 not found. Cleared from lvmetad cache.
This is because of the reactivation that is done just before
snapshot removal as part of the process (vg/lvol1 from the example above).
This causes a CHANGE event to be generated, but any scan done
on the LV does not see the original data anymore (in this case
the stacked PV label on top) and consequently the ID_FS_TYPE="LVM2_member"
(provided by blkid scan) is not stored in udev db anymore for the LV.
Consequently, the pvscan --cache is not run anymore as the dev is not
identified as LVM PV by the "LVM2_member" id - lvmetad loses this info
and still keeps records about the PV.
We can run into a very similar problem with erasing the PV label directly:
[0] raw/~ $ lvcreate -L32m vg
Logical volume "lvol0" created
[0] raw/~ $ pvcreate /dev/vg/lvol0
Physical volume "/dev/vg/lvol0" successfully created
[0] raw/~ $ dd if=/dev/zero of=/dev/vg/lvol0 bs=1M
dd: error writing '/dev/vg/lvol0': No space left on device
33+0 records in
32+0 records out
33554432 bytes (34 MB) copied, 0.380921 s, 88.1 MB/s
[0] raw/~ $ pvs
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 92.00m
/dev/vg/lvol0 lvm2 a-- 32.00m 32.00m
[0] raw/~ $ pvscan --cache --major 253 --minor 2
No PV label found on /dev/vg/lvol0.
This patch adds detection of this change from ID_FS_LABEL="LVM2_member"
to ID_FS_LABEL="<whatever_else>" and hence informing the lvmetad
about PV being gone.
If loop device is first configured on systems where /dev/loop-control
is used to dynamically create the loop device itself, there's an
ADD+CHANGE even generated. But next time the existing /dev/loop[0-9]*
is reused, there's only a CHANGE event since the device representing
it is already present in kernel (so no ADD event in this case).
We can't ignore this CHANGE event for loop devices! This is a regression
caused by 756bcabbfe297688ba240a880bc2b55265ad33f0. We already had
a similar problem with MD devices which was fixed by
2ac217d408470dcecb69b83d9cbf7a254747fa5b (but that one was
only an intra-release fix).
Commit 756bcabbfe297688ba240a880bc2b55265ad33f0 restricted the
situations at which the LVM autoactivation fires - only on ADD
event for devices other than DM. However, this caused a problem
for MD devices...
MD devices are activated in a very similar way as DM devices:
the MD dev is created on first appeareance of MD array member
(ADD event) and stays *inactive* until the array is complete.
Just then the MD dev turns to active state and this is reported
to userspace by CHANGE event.
Unfortunately, we can't differentiate between the CHANGE event
coming from udev trigger/WATCH rule and CHANGE event coming from
the transition to active state - MD would need to add similar logic
we already use to detect this in DM world. For now, we just have
to enable pvscan --cache on *all* CHANGE events for MD so the
autoactivation of the LVM volumes on top of MD works.
A downside of this is that a spurious CHANGE event for MD dev
can cause the LVM volumes on top of it to be automatically activated.
However, one should not open/change the device underneath until
the device above in the stack is removed! So this situation should
only happen if one opens the MD dev for read-write by mistake
(and hence firing the CHANGE event because of the WATCH udev rule),
or if calling udev trigger manually for the MD dev.
(No WHATS_NEW here as this fixes the commit mentioned
above and which has not been released yet.)
Commit 756bcabbfe297688ba240a880bc2b55265ad33f0 fixed autoactivation
to not trigger on each uevent for a PV that appeared in the system
most notably the events that are triggered artificially (udevadm
trigger or as the result of the WATCH udev rule being applied that
consequently generates CHANGE uevents). This fixed a situation in
which VGs/LVs were activated when they should not.
BUT we still need to care about the coldplug used at boot to
retrigger the ADD events - the "udevadm trigger --action=add"!
For non-DM-based PVs, this is already covered as for these we
run the autoactivation on ADD event only.
However, for DM-based PVs, we still need to run the
autoactivation even for the artificial ADD event, reusing
the udev DB content from previous proper CHANGE event that
came with the DM device activation.
Simply, this patch fixes a situation in which we run extra
"udevadm trigger --action=add" (or echo add > /sys/block/<dev>/uevent)
for DM-based PVs (cryptsetup devices, multipath devices, any
other DM devices...).
Without this patch, while using lvmetad + autoactivation,
any VG/LV that has a DM-based PV and for which we do not
call the activation directly, the VG/LV is not activated.
For example a VG with an LV with root FS on it which is directly
activated in initrd and then missing activation of the rest
of the LVs in the VG because of unhandled uevent retrigger on
boot after switching to root FS (the "coldplug").
(No WHATS_NEW here as this fixes the commit mentioned
above and which was not released yet.)
Before, the pvscan --cache -aay was called on each ADD and CHANGE
uevent (for a device that is not a device-mapper device) and each CHANGE
event (for a PV that is a device-mapper device).
This causes troubles with autoactivation in some cases as CHANGE event
may originate from using the OPTION+="watch" udev rule that is defined
in 60-persistent-storage.rules (part of the rules provided by udev
directly) and it's used for all block devices
(except fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md* devices). For example, the
following sequence incorrectly activates the rest of LVs in a VG if one
of the LVs in the VG is being removed:
[root@rhel6-a ~]# pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
[root@rhel6-a ~]# vgcreate vg /dev/sda
Volume group "vg" successfully created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol0" created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol1" created
[root@rhel6-a ~]# vgchange -an vg
0 logical volume(s) in volume group "vg" now active
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi------ 4.00m
lvol1 vg -wi------ 4.00m
[root@rhel6-a ~]# lvremove -ff vg/lvol1
Logical volume "lvol1" successfully removed
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi-a---- 4.00m
...so the vg was deactivated, then lvol1 removed, and we end up with
lvol1 removed (which is ok) BUT with lvol0 activated (which is wrong)!!!
This is because after lvol1 removal, we need to write metadata to the
underlying device /dev/sda and that causes the CHANGE event to be
generated (because of the WATCH udev rule set on this device) and this
causes the pvscan --cache -aay to be reevaluated.
We have to limit this and call pvscan --cache -aay to autoactivate
VGs/LVs only in these cases:
--> if the *PV is not a dm device*, scan only after proper device
addition (ADD event) and not with any other changes (CHANGE event)
--> if the *PV is a dm device*, scan only after proper mapping
activation (CHANGE event + the underlying PV in a state "just
activated")
Define auto_activation_handler that activates VGs/LVs automatically
based on the activation/auto_activation_volume_list (activating all
volumes by default if the list is not defined).
The autoactivation is done within the pvscan call in 69-dm-lvmetad.rules
that watches for udev events (device appearance/removal).
For now, this works for non-clustered and complete VGs only.
Remove executable path detection in udev rules and use sbindir that
is configured, but still provide the original functionality by means
of 'configure --enable-udev-rule-exec-detection'.
Normally, the exec path for the tools called in udev rules should
not differ from the sbindir used, however, there are cases this is
necessary. For example different environments could be assembled
in a way that these path differ for some reason (distribution installer,
initrd ...).
This functionality is kept for compatibility only. Any environment
moving the binaries around and using different paths should be fixed
eventually!