IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Support remove of thin volumes With --force --force
when thin pools is damaged.
This way it's possible to remove thin pool with
unrepairable metadata without requiring to
manually edit lvm2 metadata.
lvremove -ff vg/pool
removes all thin volumes and pool even when
thin pool cannot be activated (to accept
removal of thin volumes in kernel metadata)
Since vg_name inside /lib function has already been ignored mostly
except for a few debug prints - make it and official internal API
feature.
vg_name is used only in /tools while the VG is not yet openned,
and when lvresize/lvcreate /lib function is called with VG pointer
already being used, then vg_name becomes irrelevant (it's not been
validated anyway).
So any internal user of lvcreate_params and lvresize_params does not
need to set vg_name pointer and may leave it NULL.
When creating pool's metadata - create initial LV for clearing with some
generic name and after the volume is create & cleared - rename it to
reserved name '_tmeta/_cmeta'.
We should not expose 'reserved' names for public LVs.
When repairing RAID LVs that have multiple PVs per image, allow
replacement images to be reallocated from the PVs that have not
failed in the image if there is sufficient space.
This allows for scenarios where a 2-way RAID1 is spread across 4 PVs,
where each image lives on two PVs but doesn't use the entire space
on any of them. If one PV fails and there is sufficient space on the
remaining PV in the image, the image can be reallocated on just the
remaining PV.
Previously, the seg_pvs used to track free and allocated space where left
in place after 'release_pv_segment' was called to free space from an LV.
Now, an attempt is made to combine any adjacent seg_pvs that also track
free space. Usually, this doesn't provide much benefit, but in a case
where one command might free some space and then do an allocation, it
can make a difference. One such case is during a repair of a RAID LV,
where one PV of a multi-PV image fails. This new behavior is used when
the replacement image can be allocated from the remaining space of the
PV that did not fail. (First the entire image with the failed PV is
removed. Then the image is reallocated from the remaining PVs.)
I've changed build_parallel_areas_from_lv to take a new parameter
that allows the caller to build parallel areas by LV vs by segment.
Previously, the function created a list of parallel areas for each
segment in the given LV. When it came time for allocation, the
parallel areas were honored on a segment basis. This was problematic
for RAID because any new RAID image must avoid being placed on any
PVs used by other images in the RAID. For example, if we have a
linear LV that has half its space on one PV and half on another, we
do not want an up-convert to use either of those PVs. It should
especially not wind up with the following, where the first portion
of one LV is paired up with the second portion of the other:
------PV1------- ------PV2-------
[ 2of2 image_1 ] [ 1of2 image_1 ]
[ 1of2 image_0 ] [ 2of2 image_0 ]
---------------- ----------------
Previously, it was possible for this to happen. The change makes
it so that the returned parallel areas list contains one "super"
segment (seg_pvs) with a list of all the PVs from every actual
segment in the given LV and covering the entire logical extent range.
This change allows RAID conversions to function properly when there
are existing images that contain multiple segments that span more
than one PV.
...to avoid using cached value (persistent filter) and therefore
not noticing any change made after last scan/filtering - the state
of the device may have changed, for example new signatures added.
$ lvm dumpconfig --type diff
allocation {
use_blkid_wiping=0
}
devices {
obtain_device_list_from_udev=0
}
$ cat /etc/lvm/cache/.cache | grep sda
$ vgscan
Reading all physical volumes. This may take a while...
Found volume group "fedora" using metadata type lvm2
$ cat /etc/lvm/cache/.cache | grep sda
"/dev/sda",
$ parted /dev/sda mklabel gpt
Information: You may need to update /etc/fstab.
$ parted /dev/sda print
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sda: 134MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
$ cat /etc/lvm/cache/.cache | grep sda
"/dev/sda",
====
Before this patch:
$ pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
With this patch applied:
$ pvcreate /dev/sda
Physical volume /dev/sda not found
Device /dev/sda not found (or ignored by filtering).
'lvs' would segfault if trying to display the "move pv" if the
pvmove was run with '--atomic'. The structure of an atomic pvmove
is different and requires us to descend another level in the
LV tree to retrieve the PV information.
In 'find_pvmove_lv', separate the code that searches the atomic
pvmove LVs from the code that searches the normal pvmove LVs. This
cleans up the segment iterator code a bit.
pvmove can be used to move single LVs by name or multiple LVs that
lie within the specified PV range (e.g. /dev/sdb1:0-1000). When
moving more than one LV, the portions of those LVs that are in the
range to be moved are added to a new temporary pvmove LV. The LVs
then point to the range in the pvmove LV, rather than the PV
range.
Example 1:
We have two LVs in this example. After they were
created, the first LV was grown, yeilding two segments
in LV1. So, there are two LVs with a total of three
segments.
Before pvmove:
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
PV | 000 - 255 | 256 - 511 | 512 - 767 |
-------------------------------------
After pvmove inserts the temporary pvmove LV:
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
pvmove0 | seg 0 | seg 1 | seg 2 |
-------------------------------------
| | |
-------------------------------------
PV | 000 - 255 | 256 - 511 | 512 - 767 |
-------------------------------------
Each of the affected LV segments now point to a
range of blocks in the pvmove LV, which purposefully
corresponds to the segments moved from the original
LVs into the temporary pvmove LV.
The current implementation goes on from here to mirror the temporary
pvmove LV by segment. Further, as the pvmove LV is activated, only
one of its segments is actually mirrored (i.e. "moving") at a time.
The rest are either complete or not addressed yet. If the pvmove
is aborted, those segments that are completed will remain on the
destination and those that are not yet addressed or in the process
of moving will stay on the source PV. Thus, it is possible to have
a partially completed move - some LVs (or certain segments of LVs)
on the source PV and some on the destination.
Example 2:
What 'example 1' might look if it was half-way
through the move.
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
pvmove0 | seg 0 | seg 1 | seg 2 |
-------------------------------------
| | |
| -------------------------
source PV | | 256 - 511 | 512 - 767 |
| -------------------------
| ||
-------------------------
dest PV | 000 - 255 | 256 - 511 |
-------------------------
This update allows the user to specify that they would like the
pvmove mirror created "by LV" rather than "by segment". That is,
the pvmove LV becomes an image in an encapsulating mirror along
with the allocated copy image.
Example 3:
A pvmove that is performed "by LV" rather than "by segment".
--------- ---------
| LV1s0 | | LV2s0 |
--------- ---------
| |
-------------------------
pvmove0 | * LV-level mirror * |
-------------------------
/ \
pvmove_mimage0 / pvmove_mimage1
------------------------- -------------------------
| seg 0 | seg 1 | | seg 0 | seg 1 |
------------------------- -------------------------
| | | |
------------------------- -------------------------
| 000 - 255 | 256 - 511 | | 000 - 255 | 256 - 511 |
------------------------- -------------------------
source PV dest PV
The thing that differentiates a pvmove done in this way and a simple
"up-convert" from linear to mirror is the preservation of the
distinct segments. A normal up-convert would simply allocate the
necessary space with no regard for segment boundaries. The pvmove
operation must preserve the segments because they are the critical
boundary between the segments of the LVs being moved. So, when the
pvmove copy image is allocated, all corresponding segments must be
allocated. The code that merges ajoining segments that are part of
the same LV when the metadata is written must also be avoided in
this case. This method of mirroring is unique enough to warrant its
own definitional macro, MIRROR_BY_SEGMENTED_LV. This joins the two
existing macros: MIRROR_BY_SEG (for original pvmove) and MIRROR_BY_LV
(for user created mirrors).
The advantages of performing pvmove in this way is that all of the
LVs affected can be moved together. It is an all-or-nothing approach
that leaves all LV segments on the source PV if the move is aborted.
Additionally, a mirror log can be used (in the future) to provide tracking
of progress; allowing the copy to continue where it left off in the event
there is a deactivation.
The list of strings is used quite frequently and we'd like to reuse
this simple structure for report selection support too. Make it part
of libdevmapper for general reuse throughout the code.
This also simplifies the LVM code a bit since we don't need to
include and manage lvm-types.h anymore (the string list was the
only structure defined there).
When creating a cache LV with a RAID origin, we need to ensure that
the sub-LVs of that origin properly change their names to include
the "_corig" extention of the top-level LV. We do this by first
performing a 'lv_rename_update' before making the call to
'insert_layer_for_lv'.
- When defining configuration source, the code now uses separate
CONFIG_PROFILE_COMMAND and CONFIG_PROFILE_METADATA markers
(before, it was just CONFIG_PROFILE that did not make the
difference between the two). This helps when checking the
configuration if it contains correct set of options which
are all in either command-profilable or metadata-profilable
group without mixing these groups together - so it's a firm
distinction. The "command profile" can't contain
"metadata profile" and vice versa! This is strictly checked
and if the settings are mixed, such profile is rejected and
it's not used. So in the end, the CONFIG_PROFILE_COMMAND
set of options and CONFIG_PROFILE_METADATA are mutually exclusive
sets.
- Marking configuration with one or the other marker will also
determine the way these configuration sources are positioned
in the configuration cascade which is now:
CONFIG_STRING -> CONFIG_PROFILE_COMMAND -> CONFIG_PROFILE_METADATA -> CONFIG_FILE/CONFIG_MERGED_FILES
- Marking configuration with one or the other marker will also make
it possible to issue a command context refresh (will be probably
a part of a future patch) if needed for settings in global profile
set. For settings in metadata profile set this is impossible since
we can't refresh cmd context in the middle of reading VG/LV metadata
and for each VG/LV separately because each VG/LV can have a different
metadata profile assinged and it's not possible to change these
settings at this level.
- When command profile is incorrect, it's rejected *and also* the
command exits immediately - the profile *must* be correct for the
command that was run with a profile to be executed. Before this
patch, when the profile was found incorrect, there was just the
warning message and the command continued without profile applied.
But it's more correct to exit immediately in this case.
- When metadata profile is incorrect, we reject it during command
runtime (as we know the profile name from metadata and not early
from command line as it is in case of command profiles) and we
*do continue* with the command as we're in the middle of operation.
Also, the metadata profile is applied directly and on the fly on
find_config_tree_* fn call and even if the metadata profile is
found incorrect, we still need to return the non-profiled value
as found in the other configuration provided or default value.
To exit immediately even in this case, we'd need to refactor
existing find_config_tree_* fns so they can return error. Currently,
these fns return only config values (which end up with default
values in the end if the config is not found).
- To check the profile validity before use to be sure it's correct,
one can use :
lvm dumpconfig --commandprofile/--metadataprofile ProfileName --validate
(the --commandprofile/--metadataprofile for dumpconfig will come
as part of the subsequent patch)
- This patch also adds a reference to --commandprofile and
--metadataprofile in the cmd help string (which was missing before
for the --profile for some commands). We do not mention --profile
now as people should use --commandprofile or --metadataprofile
directly. However, the --profile is still supported for backward
compatibility and it's translated as:
--profile == --metadataprofile for lvcreate, vgcreate, lvchange and vgchange
(as these commands are able to attach profile to metadata)
--profile == --commandprofile for all the other commands
(--metadataprofile is not allowed there as it makes no sense)
- This patch also contains some cleanups to make the code handling
the profiles more readable...
When quering for dmeventd monitoring status, check first
if lvm2 is configured to monitor to avoid unwanted start
of dmeventd process for answering monitoring status.
Given a named mirror LV, vgsplit will look for the PVs that compose it
and move them to a new VG. It does this by first looking at the log
and then the legs. If the log is on the same device as one of the mirror
images, a problem occurs. This is because the PV is moved to the new VG
as the log is processed and thus cannot be found in the current VG when
the image is processed. The solution is to check and see if the PV we are
looking for has already been moved to the new VG. If so, it is not an
error.
Perform two allocation attempts with cling if maximise_cling is set,
first with then without positional fill.
Avoid segfaults from confusion between positional and sorted sequential
allocation when number of stripes varies as reported here:
https://www.redhat.com/archives/linux-lvm/2014-March/msg00001.html
Set A_POSITIONAL_FILL if the array of areas is being filled
positionally (with a slot corresponding to each 'leg') rather
than sequentially (with all suitable areas found, to be sorted
and selected from).
When pvmove0 is finished, it replaces temporarily pvmove0
with error segment, however in this case, pvmove0 remains
unremovable in case pvmove --abort is interrupted in this
moment - since it's not a pvmove anymore and normal
lvremove can't be used to remove LOCKED lv.
When down-converting a RAID1 LV, if the user specifies too few devices,
they will get a confusing message.
Ex:
[root]# lvcreate -m 2 --type raid1 -n raid -L 500M taft
Logical volume "raid" created
[root]# lvconvert -m 0 taft/raid /dev/sdd1
Unable to extract enough images to satisfy request
Failed to extract images from taft/raid
This patch makes the error message a bit clearer by telling the user
the count they are trying to remove and the number of devices they
supplied.
[root@bp-01 lvm2]# lvcreate --type raid1 -m 3 -L 200M -n lv vg
Logical volume "lv" created
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sdb1
Unable to remove 3 images: Only 1 device given.
Failed to extract images from vg/lv
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bc]1
Unable to remove 3 images: Only 2 devices given.
Failed to extract images from vg/lv
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bcd]1
[root@bp-01 lvm2]# lvs -a -o name,attr,devices vg
LV Attr Devices
lv -wi-a----- /dev/sde1(1)
This patch doesn't work in all cases. The user can specify the right
number of devices, but not a sufficient amount of devices from the LV.
This will produce the old error message:
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bcf]1
Unable to extract enough images to satisfy request
Failed to extract images from vg/lv
However, I think this error message is sufficient for this case.
Since the usability problem were fixed, we can use this function.
Cleanup orphan LVs with TEMPORARY flags
(reduces couple blkid error reports, but couple of them
is still left...)
Since cache segment is purely virtual mapping, it has nothing for
discard. Discardable is cache origin here which is now
properly removed on 'delete' phase.
Plain lv_empty() call needs to only detach cache origin and leave
origin unchanged.
Drop unused passed cmd pointer from function.
TODO:
We have two similar functions (though not identical)
lv_manip.c: for_each_sub_lv()
metadata.c: _lv_each_dependency()
They seem to not always match - we should probably convert
to use only a single function.
Use proper vgmem memory pool for allocation of LV name in the vg
and check if new renamed LV is a valid name.
TODO: validation should really use also VG name, othewise we are not
able to tell "vgname-lvname" will be valid.
Create a separate function to validation snapshot min chunk value
and relocate code into snapshot_manip file.
This function will be shared with lvconvert then.
When we create thin-pool we have used trick to keep
volume active, but since we now support TEMPORARY flag,
we could just localy active & deactive metadata LV,
and let the thinpool through normal activation process.
When pool_has_message() is queried with NULL lv and 0 device_id
it should just return 'true' when there is any message queued.
So it needs to return negative value dm_list_empty().
Since there is no user for this code path in code currently,
this bug has not been triggered.
The same as for allocation/thin_pool_chunk_size - the default value
used is just a starting point. The calculation continues using the
properties of the devices actually used.
The allocation/thin_pool_chunk_size is a bit more complex. It's default
value is evaluated in runtime based on selected thin_pool_chunk_size_policy.
But the value is just a starting point. The calculation then continues
with dependency on the properties of the devices used. Which means for
such a default value, we know only the starting value.
Move flags for segments to segtype header where it seems more closely
related as the features are related to segtype and not activation.
Use unsigned #define - since it's more common in lvm2 source code
for bit flags.
Condition was swapped - however since it's been based on 'random'
memory content it's been missed as attribute has not been set.
So now we have quite a few possible results when testing.
We have old status without separate metadata and
we have kernels with fixed snapshot leak bug.
(in-release update)
Code uses target driver version for better estimation of
max size of COW device for snapshot.
The bug can be tested with this script:
VG=vg1
lvremove -f $VG/origin
set -e
lvcreate -L 2143289344b -n origin $VG
lvcreate -n snap -c 8k -L 2304M -s $VG/origin
dd if=/dev/zero of=/dev/$VG/snap bs=1M count=2044 oflag=direct
The bug happens when these two conditions are met
* origin size is divisible by (chunk_size/16) - so that the last
metadata area is filled completely
* the miscalculated snapshot metadata size is divisible by extent size -
so that there is no padding to extent boundary which would otherwise
save us
Signed-off-by:Mikulas Patocka <mpatocka@redhat.com>
While stripe size is twice the physical extent size,
the original code will not reduce stripe size to maximum
(physical extent size).
Signed-off-by: Zhiqing Zhang <zhangzq.fnst@cn.fujitsu.com>
Start to convert percentage size handling in lvresize to the new
standard. Note in the man pages that this code is incomplete.
Fix a regression in non-percentage allocation in my last check in.
This is what I am aiming for:
-l<extents>
-l<percent> LV/ORIGIN
sets or changes the LV size based on the specified quantity
of logical logical extents (that might be backed by
a higher number of physical extents)
-l<percent> PVS/VG/FREE
sets or changes the LV size so as to allocate or free the
desired quantity of physical extents (that might amount to a
lower number of logical extents for the LV concerned)
-l+50%FREE - Use up half the remaining free space in the VG when
carrying out this operation.
-l50%VG - After this operation, this LV should be using up half the
space in the VG.
-l200%LV - Double the logical size of this LV.
-l+100%LV - Double the logical size of this LV.
-l-50%LV - Reduce the logical size of this LV by half.
Parsing vg structure during supend/commit/resume may require a lot of
memory - so move this into vg_write.
FIXME: there are now multiple cache layers which our doing some thing
multiple times at different levels. Moreover there is now different
caching path with and without lvmetad - this should be unified
and both path should use same mechanism.
Several fixes for the recent changes that treat allocation percentages
as upper limits.
Improve messages to make it easier to see what is happening.
Fix some cases that failed with errors when they didn't need to.
Fix crashes when first_seg() returns NULL.
Remove a couple of log_errors that were actually debugging messages.
Remove 'skip' argument passed into the function.
We always used '0' - as this is the only supported
option (-K) and there is no complementary option.
Also add some testing for behaviour of skipping.