IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
I've changed build_parallel_areas_from_lv to take a new parameter
that allows the caller to build parallel areas by LV vs by segment.
Previously, the function created a list of parallel areas for each
segment in the given LV. When it came time for allocation, the
parallel areas were honored on a segment basis. This was problematic
for RAID because any new RAID image must avoid being placed on any
PVs used by other images in the RAID. For example, if we have a
linear LV that has half its space on one PV and half on another, we
do not want an up-convert to use either of those PVs. It should
especially not wind up with the following, where the first portion
of one LV is paired up with the second portion of the other:
------PV1------- ------PV2-------
[ 2of2 image_1 ] [ 1of2 image_1 ]
[ 1of2 image_0 ] [ 2of2 image_0 ]
---------------- ----------------
Previously, it was possible for this to happen. The change makes
it so that the returned parallel areas list contains one "super"
segment (seg_pvs) with a list of all the PVs from every actual
segment in the given LV and covering the entire logical extent range.
This change allows RAID conversions to function properly when there
are existing images that contain multiple segments that span more
than one PV.
In 'find_pvmove_lv', separate the code that searches the atomic
pvmove LVs from the code that searches the normal pvmove LVs. This
cleans up the segment iterator code a bit.
pvmove can be used to move single LVs by name or multiple LVs that
lie within the specified PV range (e.g. /dev/sdb1:0-1000). When
moving more than one LV, the portions of those LVs that are in the
range to be moved are added to a new temporary pvmove LV. The LVs
then point to the range in the pvmove LV, rather than the PV
range.
Example 1:
We have two LVs in this example. After they were
created, the first LV was grown, yeilding two segments
in LV1. So, there are two LVs with a total of three
segments.
Before pvmove:
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
PV | 000 - 255 | 256 - 511 | 512 - 767 |
-------------------------------------
After pvmove inserts the temporary pvmove LV:
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
pvmove0 | seg 0 | seg 1 | seg 2 |
-------------------------------------
| | |
-------------------------------------
PV | 000 - 255 | 256 - 511 | 512 - 767 |
-------------------------------------
Each of the affected LV segments now point to a
range of blocks in the pvmove LV, which purposefully
corresponds to the segments moved from the original
LVs into the temporary pvmove LV.
The current implementation goes on from here to mirror the temporary
pvmove LV by segment. Further, as the pvmove LV is activated, only
one of its segments is actually mirrored (i.e. "moving") at a time.
The rest are either complete or not addressed yet. If the pvmove
is aborted, those segments that are completed will remain on the
destination and those that are not yet addressed or in the process
of moving will stay on the source PV. Thus, it is possible to have
a partially completed move - some LVs (or certain segments of LVs)
on the source PV and some on the destination.
Example 2:
What 'example 1' might look if it was half-way
through the move.
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
pvmove0 | seg 0 | seg 1 | seg 2 |
-------------------------------------
| | |
| -------------------------
source PV | | 256 - 511 | 512 - 767 |
| -------------------------
| ||
-------------------------
dest PV | 000 - 255 | 256 - 511 |
-------------------------
This update allows the user to specify that they would like the
pvmove mirror created "by LV" rather than "by segment". That is,
the pvmove LV becomes an image in an encapsulating mirror along
with the allocated copy image.
Example 3:
A pvmove that is performed "by LV" rather than "by segment".
--------- ---------
| LV1s0 | | LV2s0 |
--------- ---------
| |
-------------------------
pvmove0 | * LV-level mirror * |
-------------------------
/ \
pvmove_mimage0 / pvmove_mimage1
------------------------- -------------------------
| seg 0 | seg 1 | | seg 0 | seg 1 |
------------------------- -------------------------
| | | |
------------------------- -------------------------
| 000 - 255 | 256 - 511 | | 000 - 255 | 256 - 511 |
------------------------- -------------------------
source PV dest PV
The thing that differentiates a pvmove done in this way and a simple
"up-convert" from linear to mirror is the preservation of the
distinct segments. A normal up-convert would simply allocate the
necessary space with no regard for segment boundaries. The pvmove
operation must preserve the segments because they are the critical
boundary between the segments of the LVs being moved. So, when the
pvmove copy image is allocated, all corresponding segments must be
allocated. The code that merges ajoining segments that are part of
the same LV when the metadata is written must also be avoided in
this case. This method of mirroring is unique enough to warrant its
own definitional macro, MIRROR_BY_SEGMENTED_LV. This joins the two
existing macros: MIRROR_BY_SEG (for original pvmove) and MIRROR_BY_LV
(for user created mirrors).
The advantages of performing pvmove in this way is that all of the
LVs affected can be moved together. It is an all-or-nothing approach
that leaves all LV segments on the source PV if the move is aborted.
Additionally, a mirror log can be used (in the future) to provide tracking
of progress; allowing the copy to continue where it left off in the event
there is a deactivation.
The list of strings is used quite frequently and we'd like to reuse
this simple structure for report selection support too. Make it part
of libdevmapper for general reuse throughout the code.
This also simplifies the LVM code a bit since we don't need to
include and manage lvm-types.h anymore (the string list was the
only structure defined there).
Introduce a new parameter called "approx_alloc" that is set when the
desired size of a new LV is specified in percentage terms. If set,
the allocation code tries to get as much space as it can but does not
fail if can at least get some.
One of the practical implications is that users can now specify 100%FREE
when creating RAID LVs, like this:
~> lvcreate --type raid5 -i 2 -l 100%FREE -n lv vg
Replacement of pv_read by find_pv_by_name in commit
651d5093ed caused spurious
error messages when running pvcreate or vgextend against an
unformatted device.
Physical volume /dev/loop4 not found
Physical volume "/dev/loop4" successfully created
Physical volume /dev/loop4 not found
Physical volume /dev/loop4 not found
Physical volume "/dev/loop4" successfully created
Volume group "vg1" successfully extended
Optimize and cleanup recently introduced new function wipe_lv.
Use compound literals to get nicely initialized wipe_params struct.
Pass in lv as explicit argument for wipe_lv.
Use cmd from lv structure.
Initialize only non-null members so it's easy to see what
is the special arg.
Use common wipe_lv (former set_lv) fn to do zeroing as well as signature
wiping if needed. Provide new struct wipe_lv_params to define the
functionality.
Bind "lvcreate -W/--wipesignatures y" with proper wipe_lv call.
Also, add "yes" and "force" to lvcreate_params so it's possible
to apply them for the prompt: "WARNING: %s detected on %s. Wipe it? [y/n]".
commit d00d45a8b6 introduced changes
that are causing cluster mirror tests to fail. Ultimately, I think
the change was right, but a proper clean-up will have to wait.
The portion of the commit we are reverting correlates to the
following commit comment:
2) lib/metadata/mirror.c:_delete_lv() - should have been calling
_activate_lv_like_model() with 'mirror_lv'. This is because
'mirror_lv' is the LV that the overall operation is being
performed on. We need to use this LV as the basis for
determining whether to activate locally, or across the
cluster, etc.
It appears that when legs or logs are removed from a mirror, they
are being activated before they are deleted in order to make them
top-level LVs that can be acted upon. When doing this, it appears
they are not activated based on the characteristics of the mirror
from which they came. IOW, if the mirror was exclusively active,
the sub-LVs are activated globally. This is a no-no. This then
made it impossible to activate_lv_like_model if the model was
"mirror_lv" instead of "lv" in _delete_lv(). Thus, at some point
this change should probably be put back and those location where
the sub-LVs are being improperly activated "shared" instead of
EX should be corrected.
Three fixme's addressed in this commit:
1) lib/metadata/lv_manip.c:_calc_area_multiple() - this could be
safely changed to a comment explaining that currently because
RAID10 can only have a 2-way mirror, we don't need to know the
number of stripes. However, we will need to know that in the
future if RAID10 is to support more than 2-way mirroring.
2) lib/metadata/mirror.c:_delete_lv() - should have been calling
_activate_lv_like_model() with 'mirror_lv'. This is because
'mirror_lv' is the LV that the overall operation is being
performed on. We need to use this LV as the basis for
determining whether to activate locally, or across the
cluster, etc.
3) tools/lvcreate.c:_lvcreate_params() - Minor clean-up. If
'-m 0' is given, treat it as though the mirroring argument
was not given (i.e. as though the requested segment type
was 'stripe' and not mirror).
There are places where 'lv_is_active' was being used where it was
more correct to use 'lv_is_active_locally'. For example, when checking
for the existance of a kernel instance before asking for its status.
Most of the time these would work correctly. (RAID is only allowed on
non-clustered VGs at the moment, which means that 'lv_is_active' and
'lv_is_active_locally' would give the same result.) However, it is
more correct to use the proper variant and it helps with future
scenarios where targets might be allowed exclusively (or clustered) in
a cluster VG.
Before, the find_pv_by_name call always failed if the PV found was orphan.
However, we might use this function even for a PV that is not part of any VG.
This patch adds 'allow_orphan' arg to find_pv_by_name fn that allows that.
For example, the old call and reference:
find_config_tree_str(cmd, "devices/dir", DEFAULT_DEV_DIR)
...now becomes:
find_config_tree_str(cmd, devices_dir_CFG)
So we're referring to the named configuration ID instead
of passing the configuration path and the default value
is taken from central config definition in config_settings.h
automatically.
The 'copy_percent' function takes the 'extents_copied' field from each
segment in an LV to create the numerator for the ratio that is to
become the copy_percent. (Otherwise known as the 'sync' percent for
non-pvmove uses, like mirror LVs and RAID LVs.) This function safely
works on RAID - not just mirrors - so it is better to have it in
lv_manip.c rather than mirror.c.
There's a lot of different functions that do a lot of different things
in lv_manip.c, so I placed the function near a function in lv_manip.c
that it was close to in metadata-exported.h. Different placement in the
file or a different name for the function may be useful.
Accept -q as the short form of --quiet.
Suppress non-essential standard output if -q is given twice.
Treat log/silent in lvm.conf as equivalent to -qq.
Review all log_print messages and change some to
log_print_unless_silent.
When silent, the following commands still produce output:
dumpconfig, lvdisplay, lvmdiskscan, lvs, pvck, pvdisplay,
pvs, version, vgcfgrestore -l, vgdisplay, vgs.
[Needs checking.]
Non-essential messages are shifted from log level 4 to log level 5
for syslog and lvm2_log_fn purposes.
This patch adds support for RAID10. It is not the default at this
stage. The user needs to specify '--type raid10' if they would like
RAID10 instead of stacked mirror over stripe.
Update release_lv_segment_area not to discard any PV extents,
as it also gets used when moving extents between LVs.
Instead, call a new function release_and_discard_lv_segment_area() in
the two places where data should be discarded - lv_reduce() and
remove_mirrors_from_segments().
In this case we should allow to use local mirror, check for cmirror
should apply only for lvconvert/lvcreate.
Introduced in 2.02.86 by removing !(lv->status & ACTIVATE_EXCL).
(Partially workaround, it is minimalistic patch for now.)
Git commit ID 0864378250 was meant to disallow
'mirrored' logs for cluster mirrors. However, when add_mirror_log is used
to create the log (as is now the case when using 'lvcreate' or converting only
the log) the check is bypassed.
This patch adds the check to add_mirror_log.
This patch also does some clean-up of the splitmirrors code.
I've attempted to clean-up the splitmirrors code to make it easier to
understand with fewer operations. I've tried to reduce the number of
metadata operations without compromising the intermediate stages which
are necessary for easy clean-up in the even of failure.
These changes now correctly handle cluster situations - including exclusive
cluster mirrors. Whereas before, a splitmirror operation would result in
remote nodes having LVM commands report the newly split LV with a proper
name while DM commands would report the old (pre-split) names of the device.
IOW, there was a kernel/userspace mismatch.
The original commit comments can be located via this git commit ID:
7d8e615c0b
There were three possible solutions to the original problem proposed in the
initial check-in. The one chosen was as follows:
2) Do like _remove_mirror_images does and suspend the original, then suspend
the sub-lv (the error target), then resume the sub-lv, and finally resume the
original LV. This seems like extra pointless operations to me, but it doesn't
produce the error message (although, I'm not sure why) and it allows us to
leave the visible flag in place.
Turns out, the cluster also views the extra suspend/resume operations as
pointless too and ignores them. So, this solution doesn't work in a cluster.
Further, I've noticed that in addition to the remote cluster nodes still getting
I/O errors from scanning the error target, they also have a different LVM and
DM views of the same LV. IOW, while the LVM level (gotten from the LVM metadata)
sees the correct name for the newly split LV, device-mapper still maintains the
old names.
Because the original fix failed to completely fix the problem (or work-around it)
and because a better solution must be found to address the additional cluster
issue of device renaming, I am reverting the above mentioned commit.
Compiler says variable may be used uninitialized. It can't be, but we
initialize the variable to NULL anyway. Also, remove the double initialization
of another variable.
to settle udev before calling deactivate_lv.
This is an intra-release regression (no WHATS_NEW entry required). It is
part of the fix for the current WHATS_NEW entry:
Work around resume_lv causing error LV scanning during splitmirror operation.
Changing lv_mirror_count to only count the AREA_LVs made the function
stop working for PVMOVE mirrors. A conditional has been added to fix
that problem. Additionally, when counting the images in a mirror stack,
we don't need to subtract 1 from the count we get back from the
lv_mirror_count call on the temporary mirror layer. (This is because we
are no falsely counting the top layer of the temporary mirror.)
lv_mirror_count was not able to handle mirrors of stripes properly. When a
failed device is removed, the MIRRORED status flag is removed from the LV
conditionally based on the results of lv_mirror_count. However, lv_mirror_count
trusted the MIRRORED flag - thinking any such LV must be mirrored. It would
happily assign first_seg(lv)->area_count as the number of mirrors, but when
a mirrored striped LV was reduced to a simple striped LV area_count would be
the number of /stripes/ not the number of /mirrors/. A result higher than 1
would be returned from lv_mirror_count, the MIRRORED flag would not be cleared,
and the LV would fail to be up-converted properly in lvconvert_mirrors_aux
because of it.
The operation of deactivating the residual error target LV after removing a
mirror layer can cause a "device in-use" conflict with udev. Giving udev a
poke before calling deactivate_lv eliminates the conflict. The stick used
to poke udev is 'sync_local_dev_names'.
WHATS_NEW entry:
Fix log size calculation when only a log is being added to a mirror.
The original fix pass the mirror LV to allocate_extents (rather than
passing NULL) so that _alloc_init could correctly determine the necessary
size of the mirror log. In the previous check-in, I noted:
In order to get a decent value computed, we need to pass in the 'lv' argument
to allocate_extents. This would normally imply a desire for cling/contiguous
allocation to the given LV, but since we are not allocating any parallel
extents and only log extents, it works fine.
However, passing in the LV did have unintended consequences on the placement of
the log. The better solution is to pass in the number of extext that are in
the mirror LV instead of the LV itself. This will not cause the allocator to
reserve that number of extents, because 'stripes' and 'mirrors' are specified
as 0. Thus, 'extents' is used to calculate the size of the log, but won't
affect how much is allocated.
_alloc_init calculates the number of necessary log extents via
'mirror_log_extents'. 'mirror_log_extents' takes 3 arguments: region_size,
pe_size, and size of the mirror LV. Unfortunately, _alloc_init is guessing at
the mirror size by using 'ah->new_extents / ah->area_multiple' - the number of
extents that the mirror images have. However, this is /always/ wrong when
allocating the log separately. Further, the log is always allocated separately
unless we are up-converting the mirror at the same time. It was by luck alone
that a default value of '1' reflects what we want in most cases.
In order to get a decent value computed, we need to pass in the 'lv' argument
to allocate_extents. This would normally imply a desire for cling/contiguous
allocation to the given LV, but since we are not allocating any parallel
extents and only log extents, it works fine.
When an image is split from a 2-way mirror, the original mirror is converted to
a linear device. To do this, the top "layer" must be removed. The segments
are transferred from the sub-lv to the top-level LV and the link is severed.
The former sub-lv - having its segments transferred - now contains a temporary
error target.
When the original LV is resumed, the old sub-lv that now contains an error
segment is activated and scanned. This is what causes the I/O error messages.
There are three ways to fix this problem:
1) Do not set the sub-lv which contains the error target as "visible" before
suspending the original LV. This way, when the original is resumed, the sub-lv
device node is not created and it is not scanned - avoiding the error messages.
The problem with this approach is that if the machine crashes after the
resume, it leaves the *hidden* LV in place and the user has a more difficult
time noticing that it needs to be cleaned up. Thus, this type of processing is
frowned upon.
2) Do like _remove_mirror_images does and suspend the original, then suspend
the sub-lv (the error target), then resume the sub-lv, and finally resume the
original LV. This seems like extra pointless operations to me, but it does not
produce the error message (although, I'm not sure why) and it allows us to
leave the visible flag in place.
3) Flag the sub-lv (error target) with a "do not scan" flag. This seems like
the cleanest approach, but I have been unable to find the method for doing
this. LVs get tagged in such a way by _get_udev_flags, but in this case the
resume of the original LV also resumes the error target LV without running it
through _get_udev_flags (likely because they are no longer linked). Could
there be something wrong in resume_lv?
Option #2 was chosen to fix this bug, but it seems like more of a workaround
for now.