IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Use the field 'origin' for reporting external origin lv name.
For thin volumes with external origin, report the size of
external origin size via:
lvs -o+origin_size
If zero metadata copies are used, there's no further recalculation of
PV alignment that happens when adding metadata areas to the PV and
which actually calculates the alignment correctly as a matter of fact.
So fix this for "PV without MDA" case as well.
Before this patch:
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 1 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 12.00m
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 0 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 8.00m
After this patch:
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 1 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 12.00m
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m
--metadatacopies 0 /dev/sda
Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
PV 1st PE
/dev/sda 12.00m
Also, remove a superfluous condition "pv->pe_start < pv->pe_align" in:
if (pe_start == PV_PE_START_CALC && pv->pe_start < pv->pe_align)
pv->pe_start = pv->pe_align ...
This part of the condition is not reachable as with the PV_PE_START_CALC,
we always have pv->pe_start set to 0 from the PV struct initialisation
(...the pv->pe_start value is just being calculated).
When a device fails, we may wish to replace those segments with an
error segment. (Like when a 'vgreduce --removemissing' removes a
failed device that happens to be a RAID image/meta.) We are then left
with images that we will eventually want to remove or replace.
This patch allows us to pull out these virtual "error" sub-LVs. This
allows a user to 'lvconvert -m -1 vg/lv' to extract the bad sub-LVs.
Sub-LVs with error segments are considered for extraction before other
possible devices so that good devices are not accidentally removed.
This patch also adds the ability to replace RAID images that contain error
segments. The user will still be unable to run 'lvconvert --replace'
because there is no way to address the 'error' segment (i.e. no PV
that it is associated with). However, 'lvconvert --repair' can be
used to replace the image's error segment with a new PV. This is also
the most appropriate way to do it, since the LV will continue to be
reported as 'partial'.
Currently it is impossible to remove a failed PV which has a RAID LV
on it. This patch fixes the issue by replacing the failed PV with an
'error' segment within the affected sub-LVs. Once there is no longer
a RAID LV using the PV, it can be removed.
Most often, it is better to replace a failed RAID device with a spare.
(You can use 'lvconvert --repair <vg>/<LV>' to accomplish that.)
However, if there are no spares in the volume group and none will be
added, it is useful to be able to removed the failed device.
Following patches address the ability to perform 'lvconvert' operations
on RAID LVs that contain sub-LVs composed of 'error' segments.
We have been using 'mirror_region_size' in lvm.conf as the default region
size for RAID logical volumes as well as mirror logical volumes. Since,
"raid" is more inclusive and representative than "mirror", I have changed
the name of this setting. We must still check for the old setting and warn
the user if we are overriding it with the new setting if both happen to be
present.
This internal function check for active pool device.
For cluster it checks every thin volume,
On the non-clustered VG we need to check just
for presence of -tpool device.
There are currently a few issues with the reporting done on RAID LVs and
sub-LVs. The most concerning is that 'lvs' does not always report the
correct failure status of individual RAID sub-LVs (devices). This can
occur when a device fails and is restored after the failure has been
detected by the kernel. In this case, 'lvs' would report all devices are
fine because it can read the labels on each device just fine.
Example:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
However, 'dmsetup status' on the device tells us a different story:
[root@bp-01 lvm2]# dmsetup status vg-lv
0 1024000 raid raid1 2 DA 1024000/1024000
In this case, we must also be sure to check the RAID LVs kernel status
in order to get the proper information. Here is an example of the correct
output that is displayed after this patch is applied:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-p 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-p /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-p /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
The other case where 'lvs' gives incomplete or improper output is when a
device is replaced or added to a RAID LV. It should display that the RAID
LV is in the process of sync'ing and that the new device is the only one
that is not-in-sync - as indicated by a leading 'I' in the Attr column.
(Remember that 'i' indicates an (i)mage that is in-sync and 'I' indicates
an (I)mage that is not in sync.) Here's an example of the old incorrect
behaviour:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg Iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg Iwi-aor-- /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0) ** Note that all the images currently are marked as 'I' even though it is
only the last device that has been added that should be marked.
Here is an example of the correct output after this patch is applied:
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 100.00 lv_rimage_0(0),lv_rimage_1(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg rwi-a-r-- 0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg iwi-aor-- /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-- /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0)
** Note only the last image is marked with an 'I'. This is correct and we can
tell that it isn't the whole array that is sync'ing, but just the new
device.
It also works under snapshots...
[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
LV VG Attr Cpy%Sync Devices
lv vg owi-a-r-p 33.47 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] vg iwi-aor-- /dev/sda1(1)
[lv_rimage_1] vg Iwi-aor-p /dev/sdb1(1)
[lv_rimage_2] vg Iwi-aor-- /dev/sdc1(1)
[lv_rmeta_0] vg ewi-aor-- /dev/sda1(0)
[lv_rmeta_1] vg ewi-aor-p /dev/sdb1(0)
[lv_rmeta_2] vg ewi-aor-- /dev/sdc1(0)
snap vg swi-a-s-- /dev/sda1(51201)
fmt1 doesn't have a separate commit function: updates take effect
immediately vg_write is called, so we must update lvmetad at this
point if we're going to go on and ask lvmetad for the VG metadata
again before calling the commit function (though that's probably an
unsupported and pointless thing to do anyway as the client must
already have that data and it cannot have changed because it's locked
and with devs suspended we shouldn't be communicating with lvmetad;
so when that's fixed properly, this fix here can be reverted).
This problem showed up as an internal error when lvremoving an LVM1
snapshot.
> Internal error: LV snap1 (00000000000000000000000000000001) missing from preload metadata
https://bugzilla.redhat.com/891855
If a RAID array is not in-sync, replacing devices should not be allowed
as a general rule. This is because the contents used to populate the
incoming device may be undefined because the devices being read where
not in-sync. The kernel enforces this rule unless overridden by not
allowing the creation of an array that is not in-sync and includes a
devices that needs to be rebuilt.
Since we cannot know the sync state of an LV if it is inactive, we must
also enforce the rule that an array must be active to replace devices.
That leaves us with the following conditions:
1) never allow replacement or repair of devices if the LV is in-active
2) never allow replacement if the LV is not in-sync
3) allow repair if the LV is not in-sync, but warn that contents may
not be recoverable.
In the case where a user is performing the repair on the command line via
'lvconvert --repair', the warning is printed before the user is prompted
if they would like to replace the device(s). If the repair is automated
(i.e. via dmeventd and policy is "allocate"), then the device is replaced
if possible and the warning is printed.
If the lvmcache_info_from_pvid() fails to find valid
info, invoke the lookup by dev, and only in this case
call lvmcache_info_from_pvid() again.
Also check for the result of info and return
error directly, so the NULL is not passed
to lvmcache_get_label().
Commit bf2741376d started to use
lv_is_active() instead of call for lv_info & info.exists so
we cover also cluster activated devices.
For snapshost the conversion was not correct and introduced
regression by blocking creation of snapshot of inactive LV.
Fix it by assigning lv_is_active() directly.
Note: we still have minor issue to fix - to make
lv_is_???? function able to return error states since
lv_info() may fail.
Target tells us its version, and we may allow different set of options
to be supported with different version of driver.
Idea is to provide individual feature flags and later be
able to query for them.
The 'copy_percent' function takes the 'extents_copied' field from each
segment in an LV to create the numerator for the ratio that is to
become the copy_percent. (Otherwise known as the 'sync' percent for
non-pvmove uses, like mirror LVs and RAID LVs.) This function safely
works on RAID - not just mirrors - so it is better to have it in
lv_manip.c rather than mirror.c.
There's a lot of different functions that do a lot of different things
in lv_manip.c, so I placed the function near a function in lv_manip.c
that it was close to in metadata-exported.h. Different placement in the
file or a different name for the function may be useful.
Use log_warn to print non-fatal warning messages.
Use of log_error would confuse checker for testing
whether proper error has been reported for some real error.
A message is printed when the region_size of a RAID LV is adjusted
to allow for large (> ~1TB) LVs. The message wasn't very clear.
Hopefully, this is better.
It would be possible to activate a RAID LV exclusively in a cluster
volume group, but for now we do not allow RAID LVs to exist in a
clustered volume group at all. This has two components:
1) Do not allow RAID LVs to be created in a clustered VG
2) Do not allow changing a VG from single-machine to clustered
if there are RAID LVs present.
MD's bitmaps can handle 2^21 regions at most. The RAID code has always
used a region_size of 1024 sectors. That means the size of a RAID LV was
limited to 1TiB. (The user can adjust the region_size when creating a
RAID LV, which can affect the maximum size.) Thus, creating, extending or
converting to a RAID LV greater than 1TiB would result in a failure to
load the new device-mapper table.
Again, the size of the RAID LV is not limited by how much space is allocated
for the metadata area, but by the limitations of the MD bitmap. Therefore,
we must adjust the 'region_size' to ensure that the number of regions does
not exceed the limit. I've added code to do this when extending a RAID LV
(which covers 'create' and 'extend' operations) and when up-converting -
specifically from linear to RAID1.
We were using daemon_send_simple until now, but it is no longer adequate, since
we need to manipulate requests in a generic way (adding a validity token to each
request), and the tree-based request interface is much more suitable for this.
Don't try to issue discards to a missing PV to avoid segfault.
Prevent lvremove from removing LVs that have any part missing.
https://bugzilla.redhat.com/857554
Failing to clear the LV_NOTSYNCED flag when converting a RAID1 LV to
linear can result in the flag being present after an upconvert - even
if the sync is performed when upconverting.
Mirrors do not allow upconverting if the LV has been created with --nosync.
We will enforce the same rule for RAID1. It isn't hugely critical, since
the portions that have been written will be copied over to the new device
identically from either of the existing images. However, the unwritten
sections may be different, causing the added image to be a hybrid of the
existing images.
Also, we are disallowing the addition of new images to a RAID1 LV that has
not completed the initial sync. This may be different from mirroring, but
that is due to the fact that the 'mirror' segment type "stacks" when adding
a new image and RAID1 does not. RAID1 will rebuild a newly added image
"inline" from the existant images, so they should be in-sync.
We cannot add images to a RAID array while it is not in-sync. The
kernel will simply reject the table, saying:
'rebuild' specified while array is not in-sync
Now we check to ensure the LV is in-sync before attempting image
additions.
It is necessary when creating a RAID LV to clear the new metadata areas.
Failure to do so could result in a prepopulated bitmap that would cause
the new array to skip syncing portions of the array. It is a requirement
that the metadata LVs be activated and cleared in the process of creating.
However in test mode, this requirement should be lifted - no new LVs should
be created or written to.
When printing a message for the user and the lv_segment pointer is available,
use segtype->ops->name() instead of segtype->name. This gives a better
user-readable name for the segment. This is especially true for the
'striped' segment type, which prints "linear" if there is an area_count of
one.