IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Instead of clearing multiple rmeta device with sequential activation
process and waiting for udev for every _rmeta device separately,
activate all _rmeta devices first and then clear them and deactivate
afterwards.
Also update some tracing messages.
When anyhing goes wrong during clearing process, always try to
deactivate as much _rmeta devices as possible before fail.
Unconditionally guard there is at least 1/4 of metadata volume
free (<16Mib) or 4MiB - whichever value is smaller.
In case there is not enough free space do not let operation proceed and
recommend thin-pool metadata resize (in case user has not
enabled autoresize, manual 'lvextend --poolmetadatasize' is needed).
The existing code doesn't understand that mirror logs should cling to
parallel LVs (like extending them) instead of avoiding them.
As a quick workaround to avoid lvcreate failures, hard-code
--alloc normal for mirror logs even if the rest of the allocation
used a stricter policy.
https://bugzilla.redhat.com/show_bug.cgi?id=1376532
General RAID and RAID segment type specific checks are added
to merge.c. New static _check_raid_seg() is called on each segment
of a RaidLV (which have just one) from check_lv_segments().
New checks caught some unititialized segment members
which are addressed here as well:
- initialize seg->region_size to 0 in lvcreate.c for raid0/raid0_meta
- initialize list seg->origin_list in lv_manip.c
When volume was lvconvert-ed to a thin-volume with external origin,
then in case thin-pool was in non-zeroing mode
it's been printing WARNING about not zeroing thin volume - but
this is wanted and expected - so nothing to warn about.
So in this particular use case WARNING needs to be suppressed.
Adding parameter support for lvcreate_params.
So now lvconvert creates 'normal thin LV' in read-only mode
(so any read will 'return 0' for a moment)
then deactivate regular thin LV and reacreate in 'final R/RW' mode
thin LV with external origin and activate again.
Commit ca878a3426 changed behavior
or resize operation. Later the code has been futher changed
to skip fs resize completely when size of LV is already matching
and finaly at the most recent resize changeset for resize the
check for matching size has been eliminated as well so we ended
with a request call to resize fs to 0 size in some cases.
This commit reoders some test so the prompt happens just once before
resize of possibly 2 related volumes.
Also extra test for having LV already given size is added, and
whole metadata update is skipped for this case as the only
result would be an increment of seqno.
However the filesystem is still resized when requested,
so if the LV has some size and the resize is resolved to
the same size, the filesystem resize is called so in case FS
would not match, the resize will happen.
A livelock occurs on extension in lv_manip when adjusting the region size,
which doesn't apply to any raid0/raid0_meta LVs (these don't have a bitmap).
Fix by prohibiting the region size adjustment on any such LVs.
- resolves rhbz1354604
Add code to support more LVs to be resized through a same code path
using a single lvresize_params struct.
(Now it's used for thin-pool metadata resize,
next user will be snapshot virtual resize).
Update code to adjust percent amount resize for use_policies.
Properly activate inactive thin-pool in case of any pool resize
as the command should not 'deffer' this operation to next activation.
Use common API design and pass just LV pointer to lv_manip.c functions.
Read cmd struct via lv->vg->cmd when needed.
Also do not try to return EINVALID_CMD_LINE error when we
have already openned VG - this error code can only be returned before
locking VG.
lv_refresh_suspend_resume() has escaped with fail ret code
after failing suspend and could have left many volumes in suspend state.
So always unconditionally call resume also when suspend has failed.
When mirror/raid called copy_percent function to return,
when 100% was supposed to be returned, wrong float 100.0 value
could have been reported back instead of dm_percent_t DM_PERCENT_100.
There is broken API somewhere, since the function here rely on
actively being modifid VG content even when doing 'lvs' operation.
(extents_copies)
Support parsing --chunksize option also when converting.
Now user can use cache pool created with i.e. 32K chunksize,
while in caching user can select 512K blocks.
Tool is supposed to validate cache metadata size is big enough
to support such chunk size. Otherwise error is shown.
When creating LV - in some case we change created segment type
(ATM for cache and snapshot) and we then manipulate with
lv segment according to 'lp' segtype.
Fix this by checking for proper type before accessing segment members.
This makes command like:
lvcreate --type cache-pool -L10 vg/cpool
lvcreate -H -L10 --cachesettings migtation_threshold=10000 vg/cpool
to pass since now tool correctly selects default cache policy.
If there's an activation volume_filter, it might not be possible
to activate the rmeta LVs to wipe them. At least inherit any
LV tags from the parent LV while attempting this.
Commit b64703401d cause regression
when handling stacked resize of pool metadata volume that would
be a raid LV.
Fix it by properly setting up size also for layer extension.
Report proper values for historical LVs in lv_layout and lv_role fields.
Any historical LV doesn't have any layout anymore and the role is "history".
For example:
$ lvs -H -o name,lv_attr,lv_layout,lv_role vg/-lvol1
LV Attr Layout Role
-lvol1 ----h----- none public,history
Add support for making an interconnection between thin LV segment and
its indirect origin (which may be historical or live LV) - add a new
"indirect_origin" argument to attach_pool_lv function.
The add_glv_to_indirect_glvs is a helper function that registers a
volume represented by struct generic_logical_volume instance ("glv")
as an indirect user of another volume ("origin_glv") and vice versa -
it also registers the other volume ("origin_glv") as indirect_origin
of user volume ("glv").
The remove_glv_from_indirect_glvs does the opposite.
The get_or_create_glv is helper function that retrieves any existing
generic_logical_volume wrapper for the LV. If the wrapper does not exist
yet, it's created.
The get_org_create_glvl is the same as get_or_create_glv but it creates
the glv_list wrapper in addition so it can be added to a list.
Add new structures and new fields in existing structures to support
tracking history of LVs (the LVs which don't exist - the have been
removed already):
- new "struct historical_logical_volume"
This structure keeps information specific to historical LVs
(historical LV is very reduced form of struct logical_volume +
it contains a few specific fields to track historical LV
properties like removal time and connections among other LVs).
- new "struct generic_logical_volume"
Wrapper for "struct historical_logical_volume" and
"struct logical_volume" to make it possible to handle volumes
in uniform way, no matter if it's live or historical one.
- new "struct glv_list"
Wrapper for "struct generic_logical_volume" so it can be
added to a list.
- new "indirect_glvs" field in "struct logical_volume"
List that stores references to all indirect users of this LV - this
interconnects live LV with historical descendant LVs or even live
descendant LVs.
- new "indirect_origin" field in "struct lv_segment"
Reference to indirect origin of this segment - this interconnects
live LV (segment) with historical ancestor.
- new "this_glv" field in "struct logical_volume"
This references an existing generic_logical_volume wrapper for this
LV, if used. It can be NULL if not needed - which means we're not
handling historical LVs at all.
- new "historical_lvs" field in "struct volume group
List of all historical LVs read from VG metadata.
Simplify calculation of extents rounding needed for
segment size.
Segment size has to divisible by 'extent count' needed to contain
whole stripe. LVM currently does not support stripes across segment.
In case the stripe size is bigger then extent size,
require bigger rounding.
Internal _alloc_init() is only called from allocate_extents(),
which already does prevent usage of virtual segments.
So mark as internal error early and do not process it any further.
Since we mark cache-pool as 'hidden/private' while it is in-use,
we may still allow user to change it's name.
It should not cause any harm and user may prefer better naming
for a cache-pool in use.
Currently the code creates the log separately after allocating space for
the data and as no data allocation is needed this second time,
total_extents ends up holding zero so use new_extents directly instead.
This option could never have been printed in lvm2 metadata, so it could
be safely removed as it could have been set only as 0.
These configurable setting is supported via metadata profile.
Use single code to evaluate if the percentage value has
crossed threshold.
Recalculate amount value to always fit bellow
threshold so there are not need any extra reiterations
to reach this state in case policy amount is too small.
Before this patch:
$ lvs -a -o name,layout,role test/lvmlock
LV Layout Role
[lvmlock] linear public
With this patch applied:
$ lvs -a -o name,layout,role test/lvmlock
LV Layout Role
[lvmlock] linear private,lockd,sanlock
Certain stacks of cached LVs may have unexpected consequences.
So add a warning function called when LV is cached to detect
such caces and WARN user about them - the best we could do ATM.
When we insert layer we also move status flag-bits for certain LV types,
so internal volume_group structure remains consistent.
(Perhaps it's misuse of 'insert_layer' function and we should have
another similar function for this.)
Basically we aim to maintain the same state as after reading fresh
metadata out of volume group.
Currently we when i.e. cache 'raid' LV - this should transfer 'raidLV' flag
to _corigin LV and cache is no longer a raid.
TODO: bits for stacked devices needs more exact rules.
Change logic and naming of some internal API functions.
cache_set_mode() and cache_set_policy() both take segment.
cache mode is now correctly 'masked-in'.
If the passed segment is 'cache' segment - it will automatically
try to find 'defaults' according to profiles if the are NOT
specified on command line or they are NOT already set for cache-pool.
These defaults are never set for cache-pool.
lvrename should not be done if the LV is active on another host.
This check was mistakenly removed when the code was changed to
use LV uuids in locks rather than LV names.
A segfault was reported when extending an LV with a smaller number of
stripes than originally used. Under unusual circumstances, the cling
detection code could successfully find a match against the excess
stripe positions and think it had finished prematurely leading to an
allocation being pursued with a length of zero.
Rename ix_offset to num_positional_areas and move it to struct
alloc_state so that _is_condition() can obtain access to it.
In _is_condition(), areas_size can no longer be assumed to match the
number of positional slots being filled so check this newly-exposed
num_positional_areas directly instead. If the slot is outside the
range we are trying to fill, just ignore the match for now.
(Also note that the code still only performs cling detection against
the first segment of the LV.)
Keep policy name separate from policy settings and avoid
to mangling and demangling this string from same config tree.
Ensure policy_name is always defined.
Existing messaging intarface for thin-pool has a few 'weak' points:
* Message were posted with each 'resume' operation, thus not allowing
activation of thin-pool with the existing state.
* Acceleration skipped suspend step has not worked in cluster,
since clvmd resumes only nodes which are suspended (have proper lock
state).
* Resume may fail and code is not really designed to 'fail' in this
phase (generic rule here is resume DOES NOT fail unless something serious
is wrong and lvm2 tool usually doesn't handle recovery path in this case.)
* Full thin-pool suspend happened, when taken a thin-volume snapshot.
With this patch the new method relocates message passing into suspend
state.
This has a few drawbacks with current API, but overal it performs
better and gives are more posibilities to deal with errors.
Patch introduces a new logic for 'origin-only' suspend of thin-pool and
this also relates to thin-volume when taking snapshot.
When suspend_origin_only operation is invoked on a pool with
queued messages then only those messages are posted to thin-pool and
actual suspend of thin pool and data and metadata volume is skipped.
This makes taking a snapshot of thin-volume lighter operation and
avoids blocking of other unrelated active thin volumes.
Also fail now happens in 'suspend' state where the 'Fail' is more expected
and it is better handled through error paths.
Activation of thin-pool is now not sending any message and leaves upto a tool
to decided later how to finish unfinished double-commit transaction.
Problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add target table line
into the tree, but only a device is inserted into a tree.
Current mechanism to attach messages for thin-pool requires the libdm
to know about thin-pool target, so lvm2 currently takes assumption, node
is really a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it's a misuse of internal API)
we would possibly need to be able to attach message to 'any' node.
Other thing to notice - current messaging interface in thin-pool
target requires to suspend thin volume origin first and then send
a create message, but this could not have any 'nice' solution on lvm2
side and IMHO we should introduce something like 'create_after_resume'
message.
Patch also changes the moment, where lvm2 transaction id is increased.
Now it happens only after successful finish of kernel transaction id
change. This change was needed to handle properly activation of pool,
which is in the middle of unfinished transaction, and also this corrects
usage of thin-pool by external apps like Docker.
Synchronize with udev logic before reusing device as snapshot.
This patch tries to fix the problem with udev, where we manage
to 'active' LV for clearing, then we deactivate such device and
active again as member of 'origin&snapshot' tree all in 1 step.
There needs to be a sync point where udev has time to remove all links,
otherwise we race with scans and we may end-up with mysterious 'free'
links in the system pointing to wrong dm names.
This patch tries to fix failing topology cluster tests..
When performing initial allocation (so there is nothing yet to
cling to), use the list of tags in allocation/cling_tag_list to
partition the PVs. We implement this by maintaining a list of
tags that have been "used up" as we proceed and ignoring further
devices that have a tag on the list.
https://bugzilla.redhat.com/983600
Add A_PARTITION_BY_TAGS set when allocated areas should not share tags
with each other and allow _match_pv_tags to accept an alternative list
of tags. (Not used yet.)
Do not keep dangling LVs if they're removed from the vg->lvs list and
move them to vg->removed_lvs instead (this is actually similar to already
existing vg->removed_pvs list, just it's for LVs now).
Once we have this vg->removed_lvs list indexed so it's possible to
do lookups for LVs quickly, we can remove the LV_REMOVED flag as
that one won't be needed anymore - instead of checking the flag,
we can directly check the vg->removed_lvs list if the LV is present
there or not and to say if the LV is removed or not then. For now,
we don't have this index, but it may be implemented in the future.
This avoids a problem in which we're using selection on LV list - we
need to do the selection on initial state and not on any intermediary
state as we process LVs one by one - some of the relations among LVs
can be gone during this processing.
For example, processing one LV can cause the other LVs to lose the
relation to this LV and hence they're not selectable anymore with
the original selection criteria as it would be if we did selection
on inital state. A perfect example is with thin snapshots:
$ lvs -o lv_name,origin,layout,role vg
LV Origin Layout Role
lvol1 thin,sparse public,origin,thinorigin,multithinorigin
lvol2 lvol1 thin,sparse public,snapshot,thinsnapshot
lvol3 lvol1 thin,sparse public,snapshot,thinsnapshot
pool thin,pool private
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
The lvremove command above was supposed to remove lvol1 as well as
all its snapshots which have origin=lvol1. It failed to do so, because
once we removed the origin lvol1, the lvol2 and lvol3 which were
snapshots before are not snapshots anymore - the relations change
as we're processing these LVs one by one.
If we do the selection first and then execute any concrete actions on
these LVs (which is what this patch does), the behaviour is correct
then - the selection is done on the *initial state*:
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
Logical volume "lvol2" successfully removed
Logical volume "lvol3" successfully removed
Similarly for all the other situations in which relations among
LVs are being changed by processing the LVs one by one.
This patch also introduces LV_REMOVED internal LV status flag
to mark removed LVs so they're not processed further when we
iterate over collected list of LVs to be processed.
Previously, when we iterated directly over vg->lvs list to
process the LVs, we relied on the fact that once the LV is removed,
it is also removed from the vg->lvs list we're iterating over.
But that was incorrect as we shouldn't remove LVs from the list
during one iteration while we're iterating over that exact list
(dm_list_iterate_items safe can handle only one removal at
one iteration anyway, so it can't be used here).
cmirror uses the CPG library to pass messages around the cluster and maintain
its bitmaps. When a cluster mirror starts-up, it must send the current state
to any joining members - a checkpoint. When mirrors are large (or the region
size is small), the bitmap size can exceed the message limit of the CPG
library. When this happens, the CPG library returns CPG_ERR_TRY_AGAIN.
(This is also a bug in CPG, since the message will never be successfully sent.)
There is an outstanding bug (bug 682771) that is meant to lift this message
length restriction in CPG, but for now we work around the issue by increasing
the mirror region size. This limits the size of the bitmap and avoids any
issues we would otherwise have around checkpointing.
Since this issue only affects cluster mirrors, the region size adjustments
are only made on cluster mirrors. This patch handles cluster mirror issues
involving pvmove, lvconvert (from linear to mirror), and lvcreate. It also
ensures that when users convert a VG from single-machine to clustered, any
mirrors with too many regions (i.e. a bitmap that would be too large to
properly checkpoint) are trapped.
Dop unused value assignments.
Unknown is detected via other combination
(!linear && !striped).
Also change the log_error() message into a warning,
since the function is not really returning error,
but still keep the INTERNAL_ERROR.
Ret value is always set later.
Before, we refreshed filters and we did full rescan of devices if
we passed through wiping (wipe_known_signatures fn call). However,
this fn returns success even if no signatures were found and so
nothing was wiped. In this case, it's not necessary to do the
filter refresh/rescan of devices as nothing changed clearly.
This patch exports number of wiped signatures from all the
wiping functions below. The caller (_pvcreate_check) then checks
whether any wiping was done at all and if not, no refresh/rescan
is done, saving some time and resources.
for_each_sub_lv() now scans in depth also pools, however for
rename we actually do want to skip pools.
So add a new for_each_sub_lv_except_pools() to be used by rename,
every other user of for_each_sub_lv() scans every sub LV with pools
included.
This is i.e. necessary for properly working preload of pools
that are using raid arrays.
Support error_if_no_space feature for thin pools.
Report more info about thinpool status:
(out_of_data (D), metadata_read_only (M), failed (F) also as health
attribute.)
When creating cluster mirrors while they're not supposed to be activated
immediately after creation, we don't need to check for cmirrord availability.
We can just create these mirrors and let the check to be done on activation
later on. This is addendum for commit cba6186325.
When creating/activating clustered mirrors, we should have cmirrord
available and running. If it's not, we ended up with rather cryptic
errors like:
$ lvcreate -l1 -m1 --type mirror vg
Error locking on node 1: device-mapper: reload ioctl on failed: Invalid argument
Failed to activate new LV.
$ vgchange -ay vg
Error locking on node node 1: device-mapper: reload ioctl on failed: Invalid argument
This patch adds check for cmirror availability and it errors out
properly, also giving a more precise error messge so users are able
to identify the source of the problem easily:
$ lvcreate -l1 -m1 --type mirror vg
Shared cluster mirrors are not available.
$ vgchange -ay vg
Error locking on node 1: Shared cluster mirrors are not available.
Exclusively activated cluster mirror LVs are OK even without cmirrord:
$ vgchange -aey vg
1 logical volume(s) in volume group "vg" now active
Since we support device stack of pools over pool
(thin-pool with cache data volume) the existing code
is no longer able to detect orphan _pmspare.
So instead do a _pmspare check after volume removal,
and remove spare afterwards.
More efficient spare volume creation. Save 1 extra commit
and properly activate this volume according to our cluster
activation rules (using lv_active_change() for this).
Since we 'layer' for cache origin which and we support dropping
cache layer - we need to restore origin name in case
the origin LV is more complex target - i.e. raid.
Drop _corig from name
Cleanup and rename parent -> parent_lv.
Revert part of commit 51a29e6056,
it's probably bad idea to continue with any recovery, when
vg_write() or vg_commit() fail - so it's better to leave it as it is.
Let's use this function for more activations in the code.
'needs_exlusive' will enforce exlusive type for any given LV.
We may want to activate LV in exlusive mode, even when we know
the LV (as is) supports non-exlusive activation as well.
lvcreate -ay -> exclusive & local
lvcreate -aay -> exclusive & local
lvcreate -aly -> exclusive & local
lvcreate -aey -> exclusive (might be on any node).
Call check_new_thin_pool() to detect in-use thin-pool.
Save extra reactivation of thin-pool when thin pool is not active.
(it's now a bit more expensive to invoke thin_check for new pools.)
For new pools:
We now active locally exclusively thin-pool as 'public' LV.
Validate transaction_id is till 0.
Deactive.
Prepare create message for thin-pool and exclusively active pool.
Active new thin LV.
And deactivate thin pool if it used to be inactive.
Unlike with thin-pool - with cache we support all args also
directly when create cache volume.
So the result of 'separate' cache-pool creation and setting its
options should give same result as specifying those args
during cache creation.
Cache-pool values are used as defaults if the params are
not specified with cache creation.
Move code for creation of thin volume into a single place
out of lv_extend(). This allows to drop extra pool arg
for alloc_lv_segment() && lv_extend() and makes code
more easier to read and follow.
When we create volumes with chunk size bigger then extent size
we try to round up to some nearest chunk boundary.
Until now we did this for thins - use same logic for
cache volumes.
Refactor lvcreate code.
Prefer to use arg_outside_list_is_set() so we get automatic 'white-list'
validation of supported options with different segment types.
Drop used lp->cache, lp->cache and use seg_is_cache(), seg_is_thin()
Draw clear border where is the last moment we could change create
segment type.
When segment type is given with --type - do not allow it to be changed
later.
Put together tests related to individual segment types.
Finish cache conversion at proper part of lv_manip code after
the vg_metadata are written - so we could correcly clean-up created
stripe LV for cache volume.
We want to print smarter warning message only when
the zeroing was not provided on the first zeroable segment
of newly created LV.
Put warning within _should_wipe_lv function to avoid reevaluation
of same conditions twice.
Hide creation of temporary LVs and print them only in verbose mode.
e.g. hides confusing message about creation of _pmspare
device during creation of pool.
Instead of segtype->ops->name() introduce lvseg_name().
This also allows us to leave name() function 'empty' for default
return of segtype->name.
TODO: add functions for rest of ops->
When we are given an existing LV name - it needs to be allowed
to pass in even restricted name as the LV could have existed
long before we introduced some new restriction on prefix/suffix.i
Fix the regression on name limits and drop restriction to be applied
on any existing LVs - only the new created LV names have to be
complient with current name restrictions.
FIXME: we are currently using restricted names incorrectly in few
other places - device_is_usable() skips restricted names,
and udev flags are also incorrectly set for restricted names
so these LVs are not getting links properly.
Move code to better locations.
Improve test and remove invalid ones
(i.e. no reason to require cache size to be >= then origin).
Correctly comment where the code is doing actual conversion
of other existing volume - we do already a similar thing with
external origins.
Lots of new command line options and combinations is now supported.
Hopefully older syntax still works as well.
lvcreate --cache --cachepool vg/pool -l1
lvcreate --type cache --cachepool vg/pool -l1
lvcreate --type cache-pool vg/pool -l1
lvcreate --type cache-pool --name pool vg -l1
... and many many more ...