IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
A previous commit (b6bfddcd0a) which
was designed to prevent segfaults during lvextend when trying to
extend striped logical volumes forgot to include calculations for
RAID4/5/6 parity devices. This was causing the 'contiguous' and
'cling_by_tags' allocation policies to fail for RAID 4/5/6.
The solution is to remember that while we can compare
ah->area_count == prev_lvseg->area_count
for non-RAID, we should compare
(ah->area_count + ah->parity_count) == prev_lvseg->area_count
for a general solution.
Older gcc is giving misleading warning:
metadata/lv_manip.c:4018: warning: ‘seg’ may be used uninitialized in
this function
But warning free compilation is better.
Creation, deletion, [de]activation, repair, conversion, scrubbing
and changing operations are all now available for RAID LVs in a
cluster - provided that they are activated exclusively.
The code has been changed to ensure that no LV or sub-LV activation
is attempted cluster-wide. This includes the often overlooked
operations of activating metadata areas for the brief time it takes
to clear them. Additionally, some 'resume_lv' operations were
replaced with 'activate_lv_excl_local' when sub-LVs were promoted
to top-level LVs for removal, clearing or extraction. This was
necessary because it forces the appropriate renaming actions the
occur via resume in the single-machine case, but won't happen in
a cluster due to the necessity of acquiring a lock first.
The *raid* tests have been updated to allow testing in a cluster.
For the most part, this meant creating devices with '-aey' if they
were to be converted to RAID. (RAID requires the converting LV to
be EX because it is a condition of activation for the RAID LV in
a cluster.)
When images and their associated metadata are removed from a RAID1 LV,
the remaining sub-LVs are "shifted" down to fill the gaps. For
example, if there is a 3-way mirror:
[0][1][2]
and we remove device#0, the devices will be shifted down
[1][2]
and renamed.
[0][1]
This can create a problem for resume_lv (specifically,
dm_tree_activate_children) during the renaming process though. This
is because it will attempt to rename the higher indexed sub-LVs first
and find that it cannot because there are currently other sub-LVs with
that name. The solution is to check for a conflicting name before
attempting to rename. If a conflict is found and that conflicting
sub-LV is also in the process of renaming, we can defer the current
rename until the conflicting sub-LV has renamed and cleared the
conflict.
Now that resume_lv can handle these types of rename conflicts, we can
remove the workaround in RAID that was attempting to resume a RAID1
LV from the bottom-up in order to force a proper rename in assending
order before attempting a resume on the top-level LV. This "hack"
only worked for single machine use-cases of LVM. Clearing this up
paves the way for exclusive activation of RAID LVs in a cluster.
When the pool is created from non-linear target the more complex rules
have to be used and stacking needs to properly decode args for _tdata
LV. Also proper allocation policies are being used according to those
set in lvm2 metadata for data and metadata LVs.
Also properly check for active pool and extra code to active it
temporarily.
With this fix it's now possible to use:
lvcreate -L20 -m2 -n pool vg --alloc anywhere
lvcreate -L10 -m2 -n poolm vg --alloc anywhere
lvconvert --thinpool vg/pool --poolmetadata vg/poolm
lvresize -L+10 vg/pool
The pool metadata LV must be accounted for when determining what PVs
are in a thin-pool. The pool LV must also be accounted for when
checking thin volumes.
This is a prerequisite for pvmove working with thin types.
The function 'get_pv_list_for_lv' will assemble all the PVs that are
used by the specified LV. It uses 'for_each_sub_lv' to traverse all
of the sub-lvs which may compose it.
When creating a new thin pool and there's no profile requested
via "lvcreate --profile ...", inherit any VG profile if it's attached.
Currently this applies to these settings:
allocation/thin_pool_chunk_size
allocation/thin_pool_discards
allocation/thin_pool_zero
The PREFERRED allocation mechanism requires the number of areas in the
previous LV segment to match the number in the new segment being
allocated. If they do not match, the code may crash.
E.g. https://bugzilla.redhat.com/989347
Introduce A_AREA_COUNT_MATCHES and when not set avoid referring
to the previous segment with the contiguous and cling policies.
commit d00d45a8b6 introduced changes
that are causing cluster mirror tests to fail. Ultimately, I think
the change was right, but a proper clean-up will have to wait.
The portion of the commit we are reverting correlates to the
following commit comment:
2) lib/metadata/mirror.c:_delete_lv() - should have been calling
_activate_lv_like_model() with 'mirror_lv'. This is because
'mirror_lv' is the LV that the overall operation is being
performed on. We need to use this LV as the basis for
determining whether to activate locally, or across the
cluster, etc.
It appears that when legs or logs are removed from a mirror, they
are being activated before they are deleted in order to make them
top-level LVs that can be acted upon. When doing this, it appears
they are not activated based on the characteristics of the mirror
from which they came. IOW, if the mirror was exclusively active,
the sub-LVs are activated globally. This is a no-no. This then
made it impossible to activate_lv_like_model if the model was
"mirror_lv" instead of "lv" in _delete_lv(). Thus, at some point
this change should probably be put back and those location where
the sub-LVs are being improperly activated "shared" instead of
EX should be corrected.
Three fixme's addressed in this commit:
1) lib/metadata/lv_manip.c:_calc_area_multiple() - this could be
safely changed to a comment explaining that currently because
RAID10 can only have a 2-way mirror, we don't need to know the
number of stripes. However, we will need to know that in the
future if RAID10 is to support more than 2-way mirroring.
2) lib/metadata/mirror.c:_delete_lv() - should have been calling
_activate_lv_like_model() with 'mirror_lv'. This is because
'mirror_lv' is the LV that the overall operation is being
performed on. We need to use this LV as the basis for
determining whether to activate locally, or across the
cluster, etc.
3) tools/lvcreate.c:_lvcreate_params() - Minor clean-up. If
'-m 0' is given, treat it as though the mirroring argument
was not given (i.e. as though the requested segment type
was 'stripe' and not mirror).
Activation is needed only for clustered VG.
For non-clustered VG skip activation, since deactivate_lv()
is called without problems (no testing for lock presence).
(updates f6ded62291)
When the merging of snapshot is finished, we need to clean dm table
intries for snapshot and -cow device. So for merging snapshot
we have to activate_lv plain 'cow' LV and let the table
resolver to its work - shortly deactivation_lv() request
will follow - in cluster this needs LV lock to be held by clvmd.
Also update a test - add small wait - if lvremove is not 'fast enough'
and merging process has not been stopped and $lv1 removed in background.
Ortherwise the following lvcreate occasionally finds name $lv1 still in use.
(in release fix)
Add --poolmetadataspare option and creates and handles
pool metadata spare lv when thin pool is created.
With default setting 'y' it tries to ensure, spare has
at least the size of created LV.
Pool creation involves clearing of metadata device
which triggers udev watch rule we cannot udev synchronize with
in current code.
This metadata devices needs to be activated localy,
so in cluster mode deactivation and reactivation
is always needed.
However for non-clustered mode we may reload table
via suspend/resume path which avoids collision with
udev watch rule which was occasionaly triggering
retry deactivation loop.
Code has been also split into 2 separate code paths
for thin pools and thin volumes which improved readability
of the code as well.
Deactivation has been moved out of extend_pool() and
decision is now in _lv_create_an_lv() which knows
the change mode.
Since we vg_write&commit metadata LV inside lv_extend() call,
proper restore is needed in case something fails.
So add bad: section which deactivates activated LV
and removes it from VG.
Also check early for metadata LV name lengh fail.
Remove backup() call from update_pool_lv() as it's been there
duplicated and preperly order backup() call after lvresize,
so there is just one such call.
If the thin pool is known to be active, messages can be passed
to the pool even when the created thin volume is not going to be
activated.
So we do not need to stack large list of message and validate
and catch creation errors earlier in this case.
Replace the test for valid activation combination with simpler list of
deactivation combinations.
The activation/auto_set_activation_skip enables/disables automatic
adding of the ACTIVATION_SKIP LV flag. By default thin snapshots
are flagged to be skipped during activation.
And by default, the auto_set_activation_skip is enabled.
Also add -k/--setactivationskip y/n and -K/--ignoreactivationskip
options to lvcreate.
The --setactivationskip y sets the flag in metadata for an LV to
skip the LV during activation. Also, the newly created LV is not
activated.
Thin snapsots have this flag set automatically if not specified
directly by the --setactivationskip y/n option.
The --ignoreactivationskip overrides the activation skip flag set
in metadata for an LV (just for the run of the command - the flag
is not changed in metadata!)
A few examples for the lvcreate with the new options:
(non-thin snap LV => skip flag not set in MDA + LV activated)
raw/~ $ lvcreate -l1 vg
Logical volume "lvol0" created
raw/~ $ lvs -o lv_name,attr vg/lvol0
LV Attr
lvol0 -wi-a----
(non-thin snap LV + -ky => skip flag set in MDA + LV not activated)
raw/~ $ lvcreate -l1 -ky vg
Logical volume "lvol1" created
raw/~ $ lvs -o lv_name,attr vg/lvol1
LV Attr
lvol1 -wi------
(non-thin snap LV + -ky + -K => skip flag set in MDA + LV activated)
raw/~ $ lvcreate -l1 -ky -K vg
Logical volume "lvol2" created
raw/~ $ lvs -o lv_name,attr vg/lvol2
LV Attr
lvol2 -wi-a----
(thin snap LV => skip flag set in MDA (default behaviour) + LV not activated)
raw/~ $ lvcreate -L100M -T vg/pool -V 1T -n thin_lv
Logical volume "thin_lv" created
raw/~ $ lvcreate -s vg/thin_lv -n thin_snap
Logical volume "thin_snap" created
raw/~ $ lvs -o name,attr vg
LV Attr
pool twi-a-tz-
thin_lv Vwi-a-tz-
thin_snap Vwi---tz-
(thin snap LV + -K => skip flag set in MDA (default behaviour) + LV activated)
raw/~ $ lvcreate -s vg/thin_lv -n thin_snap -K
Logical volume "thin_snap" created
raw/~ $ lvs -o name,attr vg/thin_lv
LV Attr
thin_lv Vwi-a-tz-
(thins snap LV + -kn => no skip flag in MDA (default behaviour overridden) + LV activated)
[0] raw/~ # lvcreate -s vg/thin_lv -n thin_snap -kn
Logical volume "thin_snap" created
[0] raw/~ # lvs -o name,attr vg/thin_snap
LV Attr
thin_snap Vwi-a-tz-
Start separating the validation from the action in the basic lvresize
code moved to the library.
Remove incorrect use of command line error codes from lvresize library
functions. Move errors.h to tools directory to reinforce this,
exporting public versions of the error codes in lvm2cmd.h for dmeventd
plugins to use.
Fix and improve handling on sigint.
Always check for signal presence *before* calling of command,
so it will not call the command when break was hit.
If the command has been finished succesfully there is
no problem to mark the command ok and not report interrupt at all.
Fix cuple related stack; reports and assignments.
The pv resize code required that a lvm_vg_write be done
to commit the change. When the method to add the ability
to list all PVs, including ones that are not assocated with
a VG we had no way for the user to make the change persistent.
Thus additional resize code was move and now liblvm calls into
a resize function that does indeed write the changes out, thus
not requiring the user to explicitly write out he changes.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Code move and changes to support calling code from
command line and from library interface.
V2 Change lock_vol call
Signed-off-by: Tony Asleson <tasleson@redhat.com>
As locks are held, you need to call the included function
to release the memory and locks when done transversing the
list of physical volumes.
V2: Rebase fix
V3: Prevent VGs from getting cached and then write protected.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Simplified version of lv resize.
v3: Rebase changes to make work. Needed to set sizeargs = 1
to indicate to resize that we are asking for a size based
resize.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
These settins are customizable by profiles:
allocation/thin_pool_zero
allocation/thin_pool_discards
allocation/thin_pool_chunk_size
activation/thin_pool_autoextend_threshold
activation/thin_pool_autoextend_percent
If "vgcreate/lvcreate --profile <profile_name>" is used, the profile
name is automatically stored in metadata for making it possible to
load it automatically next time the VG/LV is used.
This is per VG/LV profile loading on demand. The profile itself is saved
in struct volume_group/logical_volume as "profile" field so we can
reference it whenever needed.
Do not keep multiple archives for the executed command.
Reuse the ALLOCATABLE_PV from pv status for
ARCHIVED_VG vg status. Mark VG with the bit with the
first archivation.
Since reduce the message has informational character and doesn't lead
to exit of the command - reduce the log level to info print as we
use for other similar types.
Reindent next print message.
Changes:
- move device type registration out of "type filter" (filter.c)
to a separate and new dev-type.[ch] for common use throughout the code
- the structure for keeping the major numbers detected for available
device types and available partitioning available is stored in
"dev_types" structure now
- move common partitioning detection code to dev-type.[ch] as well
together with other device-related functions bound to dev_types
(see dev-type.h for the interface)
The dev-type interface contains all common functions used to detect
subsystems/device types, signature/superblock recognition code,
type-specific device properties and other common device properties
(bound to dev_types), including partitioning support.
- add dev_types instance to cmd context as cmd->dev_types for common use
- use cmd->dev_types throughout as a central point for providing
information about device types
Giving volume type information about being 'metadata' type of volume
has higher priority then i.e. 'mirror' or 'thin' flag - for those
type we have 'target attr' (7th. field).
The special suspend/resume code in lv_remove for LVM1 snapshots was interpsersed
with a vg_commit call. However, while with LVM1 metadata, vg_commit is
technically a no-op, the activation code relied on the ondisk and incore
metadata being the same, since on LVM1, a "commit" happens in vg_write
already. Since the "ondisk" metadata was previously not available with format1
(and incore was silently used instead, via lvmcache), the problem was masked.
Previously, we have relied on UUIDs alone, and on lvmcache to make getting a
"new copy" of VG metadata fast. If the code which triggers the activation has
the correct VG metadata at hand (the version which is currently on disk), it can
now hand it to the activation code directly.
This allows us to get the current on-disk version of the metadata whenever we
have the current in-flight version, without a recourse to scanning or lvmcache.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs. The options are available for 'lvcreate' and
'lvchange' and are as follows:
--minrecoveryrate <Rate> [bBsSkKmMgG]
--maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device. If a suffix is not given,
kiB/sec/device is assumed. Setting the rate to 0 removes the preference.
There is no point in creation of 2chunks snapshot,
since the snapshot is invalidated immeditelly with the first write
as there is no free chunk for COW blocks
(2 chunks are used by the snap header and the 1st. metadata chunk).
Enhance error message about the lowest usable size.
Avoid hitting memory corruption (double free) in code path,
where PV FID has been already destroyed and the released pointer
was left in PV structure and could have been tried to be released
from there 2nd. time with final context destruction.