IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Better than previous patch which changed log_warn to log_error -
we can have multiple MDAs and if one of them fails to be written,
we can still continue with other MDAs if we're in a mode where
we can handle missing PVs - so keep the log_warn for single
failed MDA write as it was before.
However, add log_error with "Failed to write VG <vg_name>." in
case we're not handling missing PVs or no MDA was written at all
during VG write process. This also prevents an internal error in
which the vg_write fails and we're not issuing any other log_error
in vg_write caller or above, so we end up with:
"Internal error: Failed command did not use log_error".
At first, all snapshot-origins where marked as unusable unconditionally
here, but we can't cut off whole snapshot-origin use in a stack just
because of this possible mirror state. This whole "device_is_usable"
check was even incorrectly part of persistent filter before commit
a843d0d97c66aae1872c05b0f6cf4bda176aae2 (where filter cleanup was
done).
The persistent filter is used only if obtain_device_list_from_udev=0,
which means that the former check for snapshot-origin here had not even
been hit with default configuration for a few years before commit
a843d0d97c66aae1872c05b0f6cf4bda176aae2 (the check for snapshot-origin and
skipping of this LV was introduced with commit a71d6051ed
back in 2010).
The obtain_device_list_from_udev=1 (and hence not using persistent
filter and hence not hitting this check for snapshot-origins and skipping) has been
in action since commit edcda01a1e (that is 2011).
So for 3 years this condition was not even checked with default configuration,
making it superfluous.
This all changed in 2014 with commit 8a843d0d97
where "filter-usable" is introduced and since then all snapshot-origins
have been marked as unusable more often than before and making snapshot-origins
practically unusable in a stack.
This patch removes this incorrect check from commit a71d6051ed
which caused snapshot-origins to be unusable more often recently.
If we want to fix this eventually in a correct way, we need to look
down the stack and if snapshot-origin is hit and there's a blocked
mirror underneath, only then mark the device as unusable. But mirrors
in stack are not supported anymore so it's questionable whether it's
worth spending more time on this at all...
$ lvcreate -l1 -m1 --type mirror vg
Logical volume "lvol0" created.
$ lvconvert --type raid1 vg/lvol0
Before:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_mimage_0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_mimage_1_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Incorrect name: lvol0_mimage_0_rimage_0
With this patch applied:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Proper name: lvol0_rimage_0
When mirror has missing PVs and there are mirror images on those missing
PVs, we delete the images and during this delete operation, we also
reactivate the LV. But if we're trying to reactivate the LV in cluster
which is not active and at the same time cmirrord is not running (which
is OK since we may have created the mirror LV as inactive), we end up
with:
"Error locking on node <node_name>: Shared cluster mirrors are not available."
That is because we're trying to activate the mirror LV without cmirrord.
However, there's no need to do this reactivation if the mirror LV (and
hence it's sub LVs) were not activated before.
This issue caused failure in mirror-vgreduce-removemissing.sh test
recently with this sequence (excerpt from the test script):
prepare_lvs_
lvcreate -an -Zn -l2 --type mirror -m1 --nosync -n $lv1 $vg "$dev1" $dev2" "$dev3":$BLOCKS
mimages_are_on_ $lv1 "$dev1" "$dev2"
mirrorlog_is_on_ $lv1 "$dev3"
aux disable_dev "$dev2"
vgreduce --removemissing --force $vg
The important thing about that test is that we're not running cmirrord,
we're activating the mirror with "-an" so it's inactive and then
vgreduce --removemissing tries to reactivate the mirror images
as part of the _delete_lv function call inside and since cmirrord
is not running, we end up with the "Shared cluster mirrors are not
available." error.
When creating cluster mirrors while they're not supposed to be activated
immediately after creation, we don't need to check for cmirrord availability.
We can just create these mirrors and let the check to be done on activation
later on. This is addendum for commit cba6186325.
When creating/activating clustered mirrors, we should have cmirrord
available and running. If it's not, we ended up with rather cryptic
errors like:
$ lvcreate -l1 -m1 --type mirror vg
Error locking on node 1: device-mapper: reload ioctl on failed: Invalid argument
Failed to activate new LV.
$ vgchange -ay vg
Error locking on node node 1: device-mapper: reload ioctl on failed: Invalid argument
This patch adds check for cmirror availability and it errors out
properly, also giving a more precise error messge so users are able
to identify the source of the problem easily:
$ lvcreate -l1 -m1 --type mirror vg
Shared cluster mirrors are not available.
$ vgchange -ay vg
Error locking on node 1: Shared cluster mirrors are not available.
Exclusively activated cluster mirror LVs are OK even without cmirrord:
$ vgchange -aey vg
1 logical volume(s) in volume group "vg" now active
Since GET_FIELD_RESERVED_VALUE always returns a pointer, don't reference
it with "&" when used - we already have that pointer value (this is an
addendum to recent commit 028ff30947).
Only GET_TYPE_RESERVED_VALUE needs to be referenced with "&" as it
returns directly the value of that type.
We have to use empty list, not NULL if we want to denote that the list
has no items. Otherwise, the code further can segfault as it expects
there's always a sane value (= some list), including empty list,
but never NULL.
Use helper macros to handle reserved values and also define "undefined"
reserved value as:
FIELD_RESERVED_VALUE(cache_policy, cache_policy_undef, "", "", "undefined")
Which means:
- print "" if the cache_policy value is undefined (the first name for this reserved value is "")
- recognize "undefined" reserved name as synonym to ""
(so statements like "lvs -S cache_policy=undefined" are still recognized)
Avoid making a copy of the keyword which is already registered in
values.h for "unmanaged" (vg_mda_copies field) and "auto" reserved
value (lv_read_ahead field). Also use helper macros to handle these
reserved - this is the correct approach - just do not copy the same
thing again and do not mix it! The GET_FIELD_RESERVED_VALUE and
GET_FIRST_RESERVED_NAME macros guarantees this - use it!
In addition to that, rename reserved values:
vg_mda_copies --> vg_mda_copies_unmanaged
lv_read_ahead --> lv_read_ahead_auto
So the field reserved values follows this scheme:
"<field_name>_<reserved_value_name>".
The same applies for type reserved values with this scheme:
"<report type name in lowercase>_<reserved_value_name>"
Add a comment about this scheme for others to follow as well
when adding new fields and their reserved values. This makes
it a bit easier to read the code then.
RESERVED(id) --> GET_TYPE_RESERVED_VALUE(id)
FIRST_NAME(id) --> GET_FIRST_RESERVED_NAME(id)
Also add GET_FIELD_RESERVED_VALUE(id) macro to get per-field reserved value.
This makes it much more readable and hopefully it'll make it
easier to use these helper macros when adding new reporting
fields with reserved values if needed.
The cache policy name taken as LV segment property must be duped
for report as the VG/LV/seg structure is destroyed after processing,
reporting happens later:
$ valgrind lvs -o+cache_policy
...
==16589== Invalid read of size 1
==16589== at 0x54ABCC3: dm_report_compact_fields
(libdm-report.c:1739)
==16589== by 0x153FC7: _report (reporter.c:619)
==16589== by 0x1540A6: lvs (reporter.c:641)
==16589== by 0x148021: lvm_run_command (lvmcmdline.c:1452)
==16589== by 0x1495CB: lvm2_main (lvmcmdline.c:1907)
==16589== by 0x164712: main (lvm.c:21)
==16589== Address 0x7d465f2 is 8,338 bytes inside a block of size
16,384 free'd
==16589== at 0x4C2ACE9: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==16589== by 0x54B8C85: _free_chunk (pool-fast.c:318)
==16589== by 0x54B84FB: dm_pool_destroy (pool-fast.c:78)
==16589== by 0x1E59C7: _free_vg (vg.c:78)
==16589== by 0x1E5A6D: release_vg (vg.c:95)
==16589== by 0x159B6E: _process_lv_vgnameid_list (toollib.c:1967)
==16589== by 0x159DD7: process_each_lv (toollib.c:2030)
==16589== by 0x153ED8: _report (reporter.c:598)
==16589== by 0x1540A6: lvs (reporter.c:641)
==16589== by 0x148021: lvm_run_command (lvmcmdline.c:1452)
==16589== by 0x1495CB: lvm2_main (lvmcmdline.c:1907)
==16589== by 0x164712: main (lvm.c:21)
When we split leg from raid - we take a proper new lock for a new LV.
However for now activation checks only 'existince' of device UUID,
but it's not validating device has a proper name.
As a quick fix call suspend()/resume() to rename after split mirror.
Just call return 0 directly on error path, without using
"goto" - the code is short, no need to use it this way
(the dead code appeared as part of further changes in this
function).
When chunk size needs to be estimated, the code missed to round
to proper 64kb boundaries (or power of 2 for older thin pool driver).
So for some data and metadata size (i.e. 10GB and 4MB) it resulted
in incorrect chunk size (not being a multiple of 64KB)
Fix it by adding proper rounding and also use 1 routine for 2 places
where the same calculation is made.
Fix also incorrect printed warning that has used 'ffs()'
(which returns first 'least significant' bit in word)
and it was not really giving any useful size info and replace it
with properly estimated chunk size.
Fix regression introduced with a2c1024f6a
_setup_task(mknodes ? name : NULL...
has been replaced with:
_setup_task(type != MKNODES ? name : NULL....
Use '=='
Use log_warn when we are effectively not creating an error -
we 'allowed' inconsistent read for a reason - so it's just warning
level we process inconsistent VG - it's upto caller later to decide
error level of command return value and in case of error it needs
to use log_error then.
Failed recovery provides different (NULL) VG then FAILED_INCONSISTENT.
Mark it with different failure bit - since FAILED_INCONSISTENT is
supposed to contain something 'usable' (thought inconsistent).
- Add separate lv_status fn (if we're interested only in seg status,
but not lv info at the same time as it is with existing
lv_info_with_seg_status fn). So we 3 fns:
- lv_info (existing one, runs only info ioctl, fills in struct lvinfo only)
- lv_status (new one, runs status ioctl, fills in struct lv_seg_status only)
- lv_info_with_seg_status (existing one, runs status ioctl, fills
in struct lvinfo as well as lv_seg_status)
- Add more comments in the code explaining the difference between lv_info,
lv_status and lv_info_with_seg_status and their return values.
- Move decision whether lv_info_with_seg_status needs to call only
status ioctl (in case the segment for which we require status is from
the LV for which we require info) or separate status and info ioctl
(in case the segment for which we require status is from different
LV that the one for which we require info) into
lv_info_with_seg_status fn so caller doesn't need to bother about
this at all.
- Cleanup internal interface for this seg status so it's more readable.
Since we support device stack of pools over pool
(thin-pool with cache data volume) the existing code
is no longer able to detect orphan _pmspare.
So instead do a _pmspare check after volume removal,
and remove spare afterwards.
We need to stop guessing deleted names - so rather collect
deleted UUID into a string list - and then remove them properly
in _clean_tree. Restore origin _clean_tree behaviour them for
currently unconverted removal of snapshots.
Pending delete feature now properly tracks whole subtree of cache
(so i.e. data or metadata as raid volumes).
It properly replaces all related volumes with 'errors' in suspend
preload, then resume them as error and remove collected UUIDs
from root - since they are not longer part of any volume deps.
This would be in case the pool segment was not found.
LVM2.2.02.112/lib/metadata/pool_manip.c:238:36: warning: Access to field 'segtype' results in a dereference of a null pointer (loaded from variable 'pool_seg')
LVM2.2.02.112/lib/metadata/cache_manip.c:73: overflow_before_widen: Potentially overflowing expression "*pool_metadata_extents *vg->extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:217: overflow_before_widen: Potentially overflowing expression "seg_status->seg->len * extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:217: overflow_before_widen: Potentially overflowing expression "seg_status->seg->le * extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:196:5: warning: 'dmtask' may be used uninitialized in this function [-Wmaybe-uninitialized]
In _info_run fn:
switch (type) {
case INFO:
...
case STATUS:
...
case MKNODES:
...
}
The "type" is enum and currently only those three types are supported,
but if we added a new type in the future, this would end up with a bug
(if we forgot to add the new "case" in that "switch"). So let's make
sure proper internal error is printed:
default:
log_error(INTERNAL_ERROR "_info_run: unhandled info type");
return 0;
LVM2.2.02.112/tools/toollib.c:1991: leaked_storage: Variable "iter" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/filters/filter-usable.c:89: leaked_storage: Variable "f" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/activate/dev_manager.c:1874: leaked_handle: Handle variable "fd" going out of scope leaks the handle.
When getting status for LV segment types, we need to be sure
that proper segment is selected for the status ioctl.
When reporting fields that require status ioctl,
the "_choose_lv_segment_for_status_report" fn in tools/reporter.c
must be completed properly to choose the proper segment for all
the LV types (at the moment, it just takes the first LV segment
by default).
This works fine with cache LVs surely. The other segment types
need more auditing. We use this status ioctl only for cache status
fields at the moment only, so restrict it to the cache only.
Once the _choose_lv_segment_for_status_report is completed
properly, release the restriction in _get_segment_status_from_target_params.
Similar to LVSINFO type which gathers LV + its DM_DEVICE_INFO, the
new LVSSTATUS/SEGSSTATUS report type will gather LV/segment + its
DM_DEVICE_STATUS.
Since we can report status only for certain segment, in case
of LVSSTATUS we need to choose which segment related to the LV
should be processed that represents the "LV status". In case of
SEGSSTATUS type it's clear - the status is reported for the
segment just processed.
The former struct lv_with_info is renamed to lv_with_info_and_seg_status as it can
hold more than just "info", there's lv's segment status now in addition:
struct lv_with_info_and_seg_status {
struct logical_volume *lv;
struct lvinfo *info;
struct lv_seg_status *seg_status;
}
Where struct lv_seg_status is:
struct lv_seg_status {
struct dm_pool *mem;
struct lv_segment lv_seg;
lv_seg_status_type_t type;
void *status; /* struct dm_status_* */
}
Where lv_seg points to lv's segment that is being reported or
processed in general.
New struct lv_seg_status keeps the information about segment status -
the status retrieved via DM_DEVICE_STATUS ioctl. This information will
be used for reporting dm device target status for the LV segment
specified.
So this patch introduces third level of LV information that is
kept for reuse while reporting fields within one reporting line,
causing only one DM_DEVICE_STATUS ioctl call per LV segment line
reported (otherwise we'd need to call the DM_DEVICE_STATUS for each
segment status field in one LV segment/reporting line which is not
efficient).
This is following exactly the same principle as already introduced
by commit ecb2be5d16.
So currently we have three levels of information that can be used
to report an LV/LV segment:
- LV metadata itself (struct logical_volume *lv)
- LV's DM_DEVICE_INFO ioctl result (struct lvinfo *info)
- LV's segment DM_DEVICE_STATUS ioctl result (this status must be
bound to a segment, not the whole LV as the whole LV may be
composed of several segments of course)
(this is the new struct lv_seg_status *seg_status)
Do not use 'any' policy name as a value in config tree - so we stick
with 'policy_settings' and extra 'policy_name' for libdm params.
Update lvm2 API as well.
Example of supported metadata:
policy = "mq"
policy_settings {
migration_threshold = 2048
sequential_threshold = 512
random_threshold = 4
read_promote_adjustment = 10
}
Support new PASSTHROUGH 'feature' flag.
Add dm_config_node to pass in policy args.
Really use origin_uuid instead of using extra call
to pass seg_areas.
Switch to 64bit feature flag bit set so there is
enough space in future for new bits...
More efficient spare volume creation. Save 1 extra commit
and properly activate this volume according to our cluster
activation rules (using lv_active_change() for this).
Since we 'layer' for cache origin which and we support dropping
cache layer - we need to restore origin name in case
the origin LV is more complex target - i.e. raid.
Drop _corig from name
Cleanup and rename parent -> parent_lv.
Revert part of commit 51a29e6056,
it's probably bad idea to continue with any recovery, when
vg_write() or vg_commit() fail - so it's better to leave it as it is.
When deactivating origin, we may have possibly left table in broken state,
where origin is not active, but snapshot volume is still present.
Let's ensure deactivation of origin detects also all associated
snapshots are inactive - otherwise do not skip deactivation.
(so i.e. 'vgchange -an' would detect errors)
Let's use this function for more activations in the code.
'needs_exlusive' will enforce exlusive type for any given LV.
We may want to activate LV in exlusive mode, even when we know
the LV (as is) supports non-exlusive activation as well.
lvcreate -ay -> exclusive & local
lvcreate -aay -> exclusive & local
lvcreate -aly -> exclusive & local
lvcreate -aey -> exclusive (might be on any node).
Activate of new/unused/empty thin pool volume skips
the 'overlay' part and directly provides 'visible' thin-pool LV to the user.
Such thin pool still gets 'private' -tpool UUID suffix for easier
udev detection of protected lvm2 devices, and also gets udev flags to
avoid any scan.
Such pool device is 'public' LV with regular /dev/vgname/poolname link,
but it's still 'udev' hidden device for any other use.
To display proper active state we need to do few explicit tests
for this condition.
Before it's used for any lvm2 thin volume, deactivation is
now needed to avoid any 'race' with external usage.
Call check_new_thin_pool() to detect in-use thin-pool.
Save extra reactivation of thin-pool when thin pool is not active.
(it's now a bit more expensive to invoke thin_check for new pools.)
For new pools:
We now active locally exclusively thin-pool as 'public' LV.
Validate transaction_id is till 0.
Deactive.
Prepare create message for thin-pool and exclusively active pool.
Active new thin LV.
And deactivate thin pool if it used to be inactive.
Allowing 'external' use of thin-pools requires to validate even
so far 'unused' new thin pools.
Later we may have 'smarter' way to resolve which thin-pools are
owned by lvm2 and which are external.
Show some stats with 'lvs'
Display same info for active cache volume and cache-pool.
data% - #used cache blocks/#total cache blocks
meta% - #used metadata blocks/#total metadata blocks
copy% - #dirty/#used cache blocks
TODO: maybe there is a better mapping
- should be seen as first-try-and-see.
When the cache pool is unused, lvm2 code will internally
allow to activate such cache-pool.
Cache-pool is activate as metadata LV, so lvm2 could easily
wipe such volume before cache-pool is reused.
Replace lv_cache_block_info() and lv_cache_policy_info()
with lv_cache_status() which directly returns
dm_status_cache structure together with some calculated
values.
After use mem pool stored inside lv_status_cache structure
needs to be destroyed.
Add init of no_open_count into _setup_task().
Report problem as warning (cannot happen anyway).
Also drop some duplicated debug messages - we have already
printed the info about operation so make log a bit shorter.