IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
There are currently a few issues with the reporting done on RAID LVs and
sub-LVs. The most concerning is that 'lvs' does not always report the
correct failure status of individual RAID sub-LVs (devices). This can
occur when a device fails and is restored after the failure has been
detected by the kernel. In this case, 'lvs' would report all devices are
fine because it can read the labels on each device just fine.
Example:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
However, 'dmsetup status' on the device tells us a different story:
[root@bp-01 lvm2]# dmsetup status vg-lv
0 1024000 raid raid1 2 DA 1024000/1024000
In this case, we must also be sure to check the RAID LVs kernel status
in order to get the proper information. Here is an example of the correct
output that is displayed after this patch is applied:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-p 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-p /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-p /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
The other case where 'lvs' gives incomplete or improper output is when a
device is replaced or added to a RAID LV. It should display that the RAID
LV is in the process of sync'ing and that the new device is the only one
that is not-in-sync - as indicated by a leading 'I' in the Attr column.
(Remember that 'i' indicates an (i)mage that is in-sync and 'I' indicates
an (I)mage that is not in sync.) Here's an example of the old incorrect
behaviour:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg Iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg Iwi-aor-- /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0) ** Note that all the images currently are marked as 'I' even though it is
only the last device that has been added that should be marked.
Here is an example of the correct output after this patch is applied:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0)
** Note only the last image is marked with an 'I'. This is correct and we can
tell that it isn't the whole array that is sync'ing, but just the new
device.
It also works under snapshots...
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg owi-a-r-p 33.47 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg Iwi-aor-p /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-p /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0)
snap vg swi-a-s-- /dev/sda1(51201)
If there was a nested mountpoint inside an existing mount path,
blkdeactivate could fail to unmount such a mountpoint as it
needs to deactivate the deepest path first and continue upwards.
For example the simplest reproducer:
[root@rhel6-a ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 4G 0 disk
|-vg-lvol0 (dm-2) 253:2 0 32M 0 lvm /mnt/a
`-vg-lvol1 (dm-3) 253:3 0 32M 0 lvm /mnt/a/b
Before this patch:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
UMOUNT: unmounting vg-lvol0 (dm-2) mounted on /mnt/a
umount: /mnt/a: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
UMOUNT: unmounting vg-lvol1 (dm-3) mounted on /mnt/a/b
LVM: deactivating Logical Volume vg/lvol1
(deactivation of vg/lvol0 is skipped as /mnt/a that is on lvol0
can't be unmounted - it still has /mnt/a/b as nested mountpoint!)
With this patch applied:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
UMOUNT: unmounting vg-lvol1 (dm-3) mounted on /mnt/a/b
UMOUNT: unmounting vg-lvol0 (dm-2) mounted on /mnt/a
LVM: deactivating Logical Volume vg/lvol0
LVM: deactivating Logical Volume vg/lvol1
===
Also, this patch contains a fix for processing mangled mount paths:
[root@rhel6-a ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 4G 0 disk
`-vg-lvol0 (dm-2) 253:2 0 32M 0 lvm /mnt/x y z
[root@rhel6-a ~]# lsblk -r
vg-lvol0 253:2 0 32M 0 lvm /mnt/x\x20y\x20z
(the mount path is mangled with \xNN that is visible in raw
lsblk output only and which is used in blkdeactive as well)
Before this patch:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
umount: /mnt/x\x20y\x20z: not found
After this patch applied:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
UMOUNT: unmounting vg-lvol0 (dm-2) mounted on /mnt/x\x20y\x20z
LVM: deactivating Logical Volume vg/lvol0
For reseting locale environment into significantly less memory
consuming version 'C' - use LC_ALL instead of LANG since it has
higher priority in locale settings.
Otherwise we may observe whole locale-archive which might be
over 100MB on i.e. Fedora systems locked in memory with
some daemons.
Add log/debug_classes to lvm.conf to allow debug messages to be
classified and filtered at runtime.
The dm_errno field is only used by log_error(), so I've redefined it
for log_debug() messages to hold the message class.
By default, all existing messages appear, but we can add categories that
generate high volumes of data, such as logging all traffic to/from
lvmetad.
We need to call sync_local_dev_names directly as pvscan uses
VG_GLOBAL lock and this one *does not* cause the synchronization
(sync_dev_names) to be called on unlock (VG_GLOBAL is not a real VG):
define unlock_vg(cmd, vol)
do { \
if (is_real_vg(vol)) \
sync_dev_names(cmd); \
(void) lock_vol(cmd, vol, LCK_VG_UNLOCK); \
} while (0)
Without this fix, we end up without udev synchronization for the
pvscan --cache (mainly for -aay that causes the VGs/LVs to be
autoactivated) and also udev synchronization cookies are then left
in the system since they're not managed properly (code before sets
up udev sync cookies, but we have to call dm_udev_wait at least once
after that to do the wait and cleanup).
Before, the pvscan --cache -aay was called on each ADD and CHANGE
uevent (for a device that is not a device-mapper device) and each CHANGE
event (for a PV that is a device-mapper device).
This causes troubles with autoactivation in some cases as CHANGE event
may originate from using the OPTION+="watch" udev rule that is defined
in 60-persistent-storage.rules (part of the rules provided by udev
directly) and it's used for all block devices
(except fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md* devices). For example, the
following sequence incorrectly activates the rest of LVs in a VG if one
of the LVs in the VG is being removed:
[root@rhel6-a ~]# pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
[root@rhel6-a ~]# vgcreate vg /dev/sda
Volume group "vg" successfully created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol0" created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol1" created
[root@rhel6-a ~]# vgchange -an vg
0 logical volume(s) in volume group "vg" now active
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi------ 4.00m
lvol1 vg -wi------ 4.00m
[root@rhel6-a ~]# lvremove -ff vg/lvol1
Logical volume "lvol1" successfully removed
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi-a---- 4.00m
...so the vg was deactivated, then lvol1 removed, and we end up with
lvol1 removed (which is ok) BUT with lvol0 activated (which is wrong)!!!
This is because after lvol1 removal, we need to write metadata to the
underlying device /dev/sda and that causes the CHANGE event to be
generated (because of the WATCH udev rule set on this device) and this
causes the pvscan --cache -aay to be reevaluated.
We have to limit this and call pvscan --cache -aay to autoactivate
VGs/LVs only in these cases:
--> if the *PV is not a dm device*, scan only after proper device
addition (ADD event) and not with any other changes (CHANGE event)
--> if the *PV is a dm device*, scan only after proper mapping
activation (CHANGE event + the underlying PV in a state "just
activated")
If a RAID array is not in-sync, replacing devices should not be allowed
as a general rule. This is because the contents used to populate the
incoming device may be undefined because the devices being read where
not in-sync. The kernel enforces this rule unless overridden by not
allowing the creation of an array that is not in-sync and includes a
devices that needs to be rebuilt.
Since we cannot know the sync state of an LV if it is inactive, we must
also enforce the rule that an array must be active to replace devices.
That leaves us with the following conditions:
1) never allow replacement or repair of devices if the LV is in-active
2) never allow replacement if the LV is not in-sync
3) allow repair if the LV is not in-sync, but warn that contents may
not be recoverable.
In the case where a user is performing the repair on the command line via
'lvconvert --repair', the warning is printed before the user is prompted
if they would like to replace the device(s). If the repair is automated
(i.e. via dmeventd and policy is "allocate"), then the device is replaced
if possible and the warning is printed.
If the lvmcache_info_from_pvid() fails to find valid
info, invoke the lookup by dev, and only in this case
call lvmcache_info_from_pvid() again.
Also check for the result of info and return
error directly, so the NULL is not passed
to lvmcache_get_label().
If we fail to get memory for mutex, hash the mutex
or fail somewhere along pthread function calls
return allocated resources back and unlock vg_lock_map mutex.
Detect failure of dm_pool_strdup() and print error in fail path.
Save one extra strchr call - since we already know the distance
for the '=' character.
Drop stack trace from return after log_error().
When the abort_on_internal_errors is enabled, we aborted prior
the syslog logging output.
Since such fatal error gets level _LOG_FATAL it should
not be blocked by debug_level() check so lets move it further,
to get abort error logged also via syslog.
Tabify
Remove use of asize, unneeded.
Don't initialize lvobj->parent_vgobj to NULL, the object ctor already
zeroed everything on alloc.
Redo call to lvm_lv_snapshot to use the liblvm snapshot implementation
we went with.
Add {}s to silence warning in lv_dealloc.
Rename snapshot function for consistency.
Update WHATS_NEW.
Signed-off-by: Andy Grover <agrover@redhat.com>
Function _ignore_blocked_mirror_devices was not release
allocated strings images_health and log_health.
In error paths it was also not releasing dm_task structure.
Swaped return code of _ignore_blocked_mirror_devices and
use 1 as success.
In _parse_mirror_status use log_error if memory allocation
fails and few more errors so they are no going unnoticed
as debug messages.
On error path always clear return values and free strings.
For dev_create_file use cache mem pool to avoid memleak.
Attempting pvmove on RAID LVs replaces the kernel RAID target with
a temporary pvmove target, ultimately destroying the RAID LV. pvmove
must be prevented on RAID LVs for now.
Use 'lvconvert --replace old_pv vg/lv new_pv' if you want to move
an image of the RAID LV.
In case we don't want to activate, autoactivate or have the
VG/LV read-only. Primarily targeted for the auto_activation_volume_list,
but it makes no harm for other settings (the part of the code
that reads these three settings is shared, but there's no
reason to separate it only for this change).
Rework thin feature detection to support runtime
section to allow to disable them selectively.
New lvm.conf option is born: global/thin_disabled_features
Support swapping of metadata device if the thin pool already
exists. This way it's easy to i.e. resize metadata or their
repair operation.
User may create some empty LV, replace existing metadata
or dump and restore them into bigger LV.
Setting this environment variable will cause a full fallback
to old direct node and symlink management in libdevmapper and lvm2.
It means:
- disabling udev synchronization
(--noudevsync in dmsetup and --noudevsync + activation/udev_sync=0
lvm2 config)
- disabling dm and any subsystem related udev rules
(--noudevrules in dmsetup and activation/udev_rules=0 lvm2 config)
- management of nodes/symlinks under /dev directly by libdevmapper/lvm2
(--verifyudev in dmsetup and activation/verify_udev_operations=1
lvm2 config)
- not obtaining any device list from udev database
(devices/obtain_device_list_from_udev=0 lvm2 config)
Note: we could set all of these before - there's no functional change!
However the DM_DISABLE_UDEV environment variable is a nice shortcut
to make it easier for libdevmapper users so that one can switch off all
of the udev management off at one go directly on the command line,
without a need to modify any source or add any extra switches.
If udev synchronization is disabled by means of --noudevsync
option, we should disable just the synchronization and nothing else.
The udev fallback (verifying udev operations and fixing the
nodes/symlinks if found incorrect) is orthogonal and controlled
by a separate activation/verify_udev_operations configuration option.
Allow restoring metadata with thin pool volumes.
No validation is done for this case within vgcfgrestore tool -
thus incorrect metadata may lead to destruction of pool content.
Configurable settings for thin pool create
if they are not specified on command line.
New supported lvm.conf options are:
allocation/thin_pool_chunk_size
allocation/thin_pool_discards
allocation/thin_pool_zero
Check if target supports discards for chunk sizes,
that are not power of 2 (just multiple of 64K),
and enable it in case it's supported by thin kernel target.
Similar to the way the 'mirror', 'raid1' and 'raid10' segment types set
the number of mirrors to 2 ('-m 1') if the argument is not specified,
here we set the number of stripes to 2 if not given on the command line
when creating a RAID10 LV.
Move common functions for lvcreate and lvconvert.
get_pool_params() - read thin pool args.
update_pool_params() - updates/validates some thin args.
It is getting complicated and even few more things will be
implemented, so to avoid reimplementing things differently
in lvcreate and lvconvert code has been splitted
into 2 common functions that allow some future extension.
This patch is intended to fix bug 825323 - FS turns read-only during a double
fault of a mirror leg and mirrored log's leg at the same time. It only
affects a 2-way mirror with a mirrored log. 3+-way mirrors and mirrors
without a mirrored log are not affected.
The problem resulted from the fact that the top level mirror was not
using 'noflush' when suspending before its "down-convert". When a
mirror image fails, the bios are queue until a suspend is recieved. If
it is a 'noflush' suspend, the bios can be safely requeued in the DM
core. If 'noflush' is not used, the bios must be pushed through the
target and if a device is failed for a mirror, that means issuing an
error. When an error is received by a file system, it results in it
turning read-only (depending on the FS).
Part of the problem was is due to the nature of the stacking involved in
using a mirror as a mirror's log. When an image in each fail, the top
level mirror stalls because it is waiting for a log flush. The other
stalls waiting for corrective action. When the repair command is issued,
the entire stacked arrangement is collapsed to a linear LV. The log
flush then fails (somewhat uncleanly) and the top-level mirror is suspended
without 'noflush' because it is a linear device.
This patch allows the log to be repaired first, which in turn allows the
top-level mirror's log flush to complete cleanly. The top-level mirror
is then secondarily reduced to a linear device - at which time this mirror
is suspended properly with 'noflush'.
Don't use lvmetad in lvm2-monitor.service ExecStop to avoid a systemd issue.
- a systemd design issue while processing dependencies
with socket-based activation that ends up with a hang
- https://bugzilla.redhat.com/show_bug.cgi?id=843587
(also tracker bug https://bugzilla.redhat.com/show_bug.cgi?id=871527)
- not using lvmetad in this case is just a workaround, once the bug
above is resolved, we should enable the lvmetad in that specific case
Remove dependency on fedora-storage-init.service in lvm2 systemd units.
- fedora-storage-init.service and fedora-storage-init-late.service is
going to be separated into respective units that belong to each block
device subsystem:
- mpath + mdraid activated via udev solely
- dmraid with its own dmraid-activation.service unit
- lvm2 with the lvm2-activation-generator to generate the
activation units runtime if lvmetad disabled
(global/use_lvmetad=0 set in lvm.conf) and activation done
via udev+lvmetad if lvmetad enabled (global/use_lvmetad=1 set
in lvm.conf)
Depend on lvm2-lvmetad.socket in lvm2-monitor.service systemd unit.
- as lvm2-monitor uses lvmetad if lvmetad is enabled
Commit 9fd7ac7d03 did not handle mirrors
that contained mirrored logs. This is because the status line of the
mirror does not give an indication of the health of the mirrored log,
as you can see here:
[root@bp-01 lvm2]# dmsetup status vg-lv vg-lv_mlog
vg-lv: 0 409600 mirror 2 253:6 253:7 400/400 1 AA 3 disk 253:5 A
vg-lv_mlog: 0 8192 mirror 2 253:3 253:4 7/8 1 AD 1 core
Thus, the possibility for LVM commands to hang still persists when mirror
have mirrored logs. I discovered this while performing some testing that
does polling with 'pvs' while doing I/O and killing devices. The 'pvs'
managed to get between the mirrored log device failure and the attempt
by dmeventd to repair it. The result was a very nasty block in LVM
commands that is very difficult to remove - even for someone who knows
what is going on. Thus, it is absolutely essential that the log of a
mirror be recursively checked for mirror devices which may be failed
as well.
Despite what the code comment says in the aforementioned commit...
+ * _mirrored_transient_status(). FIXME: It is unable to handle mirrors
+ * with mirrored logs because it does not have a way to get the status of
+ * the mirror that forms the log, which could be blocked.
... it is possible to get the status of the log because the log device
major/minor is given to us by the status output of the top-level mirror.
We can use that to query the log device for any DM status and see if it
is a mirror that needs to be bypassed. This patch does just that and is
now able to avoid reading from mirrors that have failed devices in a
mirrored log.