IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
A new function (dm_tree_node_force_identical_table_reload) was added to
avoid the suppression of identical table reloads. This allows RAID LVs
to reload the on-disk superblock information that contains which devices
have failed and the bitmaps. If the failed device has returned, this has
the effect of restoring the device and initiating recovery. Without this
patch, the user had to completely deactivate their RAID LV and re-activate
it in order to restore the failed device. Now they simply need to
suspend and resume (which is done by 'lvchange --refresh').
The identical table suppression is only avoided if the LV is not PARTAIL
(i.e. all of it's devices can be seen and read by LVM) and the kernel
status of the array contains failed devices. In other words, the function
will only be called in the case where we may have success in restoring
a failed device in the array.
When there are missing PVs in a volume group, most operations that alter
the LVM metadata are disallowed. It turns out that 'vgimport' is one of
those disallowed operations. This is bad because it creates a circular
dependency. 'vgimport' will complain that the VG is inconsistent and that
'vgreduce --removemissing' must be run. However, 'vgreduce' cannot be run
because it has not been imported. Therefore, 'vgimport' must be one of
the operations allowed to change the metadata when PVs are missing. The
'--force' option is the way to make 'vgimport' happen in spite of the
missing PVs.
If zero metadata copies are used, there's no further recalculation of
PV alignment that happens when adding metadata areas to the PV and
which actually calculates the alignment correctly as a matter of fact.
So fix this for "PV without MDA" case as well.
Before this patch:
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 1 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 12.00m
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 0 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 8.00m
After this patch:
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 1 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 12.00m
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 0 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 12.00m
Also, remove a superfluous condition "pv->pe_start < pv->pe_align" in:
if (pe_start == PV_PE_START_CALC && pv->pe_start < pv->pe_align)
pv->pe_start = pv->pe_align ...
This part of the condition is not reachable as with the PV_PE_START_CALC,
we always have pv->pe_start set to 0 from the PV struct initialisation
(...the pv->pe_start value is just being calculated).
If '--mirrors/-m' and '--stripes/-i' are used together when creating
a logical volume, mirrors-over-stripes is currently chosen. The user
can override this by using the '--type raid10' option on creation.
However, we want a place where we can set the default behavior to
'raid10' explicitly - similar to the "mirror" and "raid1" tunable,
mirror_segtype_default.
A follow-on patch should use this new setting to change the default
from "mirror" to "raid10", as this is the preferred segment type.
When a device fails, we may wish to replace those segments with an
error segment. (Like when a 'vgreduce --removemissing' removes a
failed device that happens to be a RAID image/meta.) We are then left
with images that we will eventually want to remove or replace.
This patch allows us to pull out these virtual "error" sub-LVs. This
allows a user to 'lvconvert -m -1 vg/lv' to extract the bad sub-LVs.
Sub-LVs with error segments are considered for extraction before other
possible devices so that good devices are not accidentally removed.
This patch also adds the ability to replace RAID images that contain error
segments. The user will still be unable to run 'lvconvert --replace'
because there is no way to address the 'error' segment (i.e. no PV
that it is associated with). However, 'lvconvert --repair' can be
used to replace the image's error segment with a new PV. This is also
the most appropriate way to do it, since the LV will continue to be
reported as 'partial'.
Currently it is impossible to remove a failed PV which has a RAID LV
on it. This patch fixes the issue by replacing the failed PV with an
'error' segment within the affected sub-LVs. Once there is no longer
a RAID LV using the PV, it can be removed.
Most often, it is better to replace a failed RAID device with a spare.
(You can use 'lvconvert --repair <vg>/<LV>' to accomplish that.)
However, if there are no spares in the volume group and none will be
added, it is useful to be able to removed the failed device.
Following patches address the ability to perform 'lvconvert' operations
on RAID LVs that contain sub-LVs composed of 'error' segments.
We have been using 'mirror_region_size' in lvm.conf as the default region
size for RAID logical volumes as well as mirror logical volumes. Since,
"raid" is more inclusive and representative than "mirror", I have changed
the name of this setting. We must still check for the old setting and warn
the user if we are overriding it with the new setting if both happen to be
present.
Instead of check for lv_is_active() for thin pool LV,
query the whole pool via new pool_is_active().
Fixes a problem when we cannot change discards settings
for active pool device where the actual layer for pool
device was inactive, but thin volumes using thin pool
have been active.
This internal function check for active pool device.
For cluster it checks every thin volume,
On the non-clustered VG we need to check just
for presence of -tpool device.
Update the error path after problems with suspend_lv or vg_commit.
It's not exactly well defined what should happen, and this
code seems to appear in many different instancies<F2> in the
whole source code tree - we should probably pick the best version.
On glibc, those are erroneously (namespace pollution) pulled in via
other headers. this doesn't work with conformant libcs (musl libc in
this case), we simply need to include all needed headers.
Signed-Off-By: John Spencer <maillist-lvm@barfooze.de>
This patch fixes problem reported here:
https://www.redhat.com/archives/dm-devel/2013-January/msg00311.html
Fixing it by separating function for duplicating string token.
---
When /etc/lvm/lvm.conf is truncated at the first '"' of a line, all LVM
utilities crash with a segfault.
The segfault only seems to occur if the last character is the first '"'
(double quote) of a line. If you truncate it at any other point, lvm
detects the error and report parse error
lvm.conf ends like this.
$hexdump -C lvm.conf
....
69 72 20 3d 20 22 2f 64 65 76 22 0a 0a 0a 20 20 |ir = "/dev"... |
20 20 23 20 41 6e 20 61 72 72 61 79 20 6f 66 20 | # An array of |
64 69 72 65 63 74 6f 72 69 65 73 20 74 68 61 74 |directories that|
20 63 6f 6e 74 61 69 6e 20 74 68 65 20 64 65 76 | contain the dev|
69 63 65 20 6e 6f 64 65 73 20 79 6f 75 20 77 69 |ice nodes you wi|
73 68 0a 20 20 20 20 23 20 74 6f 20 75 73 65 20 |sh. # to use |
77 69 74 68 20 4c 56 4d 32 2e 0a 20 20 20 20 73 |with LVM2.. s|
63 61 6e 20 3d 20 5b 20 22 2f 78 22 2c 0a 20 20 |can = [ "/x",. |
20 20 20 20 20 20 20 20 20 20 20 22 | "|
...
Reported-by: dongmao zhang <dmzhang suse com>
There are currently a few issues with the reporting done on RAID LVs and
sub-LVs. The most concerning is that 'lvs' does not always report the
correct failure status of individual RAID sub-LVs (devices). This can
occur when a device fails and is restored after the failure has been
detected by the kernel. In this case, 'lvs' would report all devices are
fine because it can read the labels on each device just fine.
Example:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
However, 'dmsetup status' on the device tells us a different story:
[root@bp-01 lvm2]# dmsetup status vg-lv
0 1024000 raid raid1 2 DA 1024000/1024000
In this case, we must also be sure to check the RAID LVs kernel status
in order to get the proper information. Here is an example of the correct
output that is displayed after this patch is applied:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-p 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-p /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-p /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
The other case where 'lvs' gives incomplete or improper output is when a
device is replaced or added to a RAID LV. It should display that the RAID
LV is in the process of sync'ing and that the new device is the only one
that is not-in-sync - as indicated by a leading 'I' in the Attr column.
(Remember that 'i' indicates an (i)mage that is in-sync and 'I' indicates
an (I)mage that is not in sync.) Here's an example of the old incorrect
behaviour:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg Iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg Iwi-aor-- /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0) ** Note that all the images currently are marked as 'I' even though it is
only the last device that has been added that should be marked.
Here is an example of the correct output after this patch is applied:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0)
** Note only the last image is marked with an 'I'. This is correct and we can
tell that it isn't the whole array that is sync'ing, but just the new
device.
It also works under snapshots...
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg owi-a-r-p 33.47 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg Iwi-aor-p /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-p /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0)
snap vg swi-a-s-- /dev/sda1(51201)
We can avoid many dev_manager (ioctl) calls by caching the results of
previous calls to lv_raid_dev_health. Just considering the case where
'lvs -a' is called to get the attributes of a RAID LV and its sub-lvs,
this function would be called many times. (It would be called at least
7 times for a 3-way RAID1 - once for the health of each sub-LV and once
for the health of the top-level LV.) This is a good idea because the
sub-LVs are processed in groups along with their parent RAID LV and in
each case, it is the parent LV whose status will be queried. Therefore,
there only needs to be one trip through dev_manager for each time the
group is processed.
Similar to the way thin* accesses its kernel status, we add a method
for RAID to grab the various values in its status output without the
higher levels (LVM) having to understand how to parse the output.
Added functions include:
- lib/activate/dev_manager.c:dev_manager_raid_status()
Pulls the status line from the kernel
- libdm/libdm-deptree.c:dm_get_status_raid()
Parses status line and puts components into dm_status_raid struct
- lib/activate/activate.c:lv_raid_dev_health()
Accesses dm_status_raid to deliver raid dev_health string
The new structure and functions can provide a more unified way to access
status information. ('lv_raid_percent' could switch to using these
functions, for example.)
If there was a nested mountpoint inside an existing mount path,
blkdeactivate could fail to unmount such a mountpoint as it
needs to deactivate the deepest path first and continue upwards.
For example the simplest reproducer:
[root@rhel6-a ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 4G 0 disk
|-vg-lvol0 (dm-2) 253:2 0 32M 0 lvm /mnt/a
`-vg-lvol1 (dm-3) 253:3 0 32M 0 lvm /mnt/a/b
Before this patch:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
UMOUNT: unmounting vg-lvol0 (dm-2) mounted on /mnt/a
umount: /mnt/a: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
UMOUNT: unmounting vg-lvol1 (dm-3) mounted on /mnt/a/b
LVM: deactivating Logical Volume vg/lvol1
(deactivation of vg/lvol0 is skipped as /mnt/a that is on lvol0
can't be unmounted - it still has /mnt/a/b as nested mountpoint!)
With this patch applied:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
UMOUNT: unmounting vg-lvol1 (dm-3) mounted on /mnt/a/b
UMOUNT: unmounting vg-lvol0 (dm-2) mounted on /mnt/a
LVM: deactivating Logical Volume vg/lvol0
LVM: deactivating Logical Volume vg/lvol1
===
Also, this patch contains a fix for processing mangled mount paths:
[root@rhel6-a ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 4G 0 disk
`-vg-lvol0 (dm-2) 253:2 0 32M 0 lvm /mnt/x y z
[root@rhel6-a ~]# lsblk -r
vg-lvol0 253:2 0 32M 0 lvm /mnt/x\x20y\x20z
(the mount path is mangled with \xNN that is visible in raw
lsblk output only and which is used in blkdeactive as well)
Before this patch:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
umount: /mnt/x\x20y\x20z: not found
After this patch applied:
[root@rhel6-a ~]# blkdeactivate -u
Deactivating block devices:
UMOUNT: unmounting vg-lvol0 (dm-2) mounted on /mnt/x\x20y\x20z
LVM: deactivating Logical Volume vg/lvol0
For reseting locale environment into significantly less memory
consuming version 'C' - use LC_ALL instead of LANG since it has
higher priority in locale settings.
Otherwise we may observe whole locale-archive which might be
over 100MB on i.e. Fedora systems locked in memory with
some daemons.
The idea is to avoid a period when an existing VG is not mapped to either the
old or the new name. (Note that the brief "blackout" was present even if the
name did not actually change.) We instead allow a brief overlap of a VG existing
under both names, i.e. a query for a VG might succeed but before a lock is
acquired the VG disappears.
Fix this:
pvcreate /dev/scma
Device /dev/scma not found (or ignored by filtering).
Reported-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>