IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This avoids a problem in which we're using selection on LV list - we
need to do the selection on initial state and not on any intermediary
state as we process LVs one by one - some of the relations among LVs
can be gone during this processing.
For example, processing one LV can cause the other LVs to lose the
relation to this LV and hence they're not selectable anymore with
the original selection criteria as it would be if we did selection
on inital state. A perfect example is with thin snapshots:
$ lvs -o lv_name,origin,layout,role vg
LV Origin Layout Role
lvol1 thin,sparse public,origin,thinorigin,multithinorigin
lvol2 lvol1 thin,sparse public,snapshot,thinsnapshot
lvol3 lvol1 thin,sparse public,snapshot,thinsnapshot
pool thin,pool private
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
The lvremove command above was supposed to remove lvol1 as well as
all its snapshots which have origin=lvol1. It failed to do so, because
once we removed the origin lvol1, the lvol2 and lvol3 which were
snapshots before are not snapshots anymore - the relations change
as we're processing these LVs one by one.
If we do the selection first and then execute any concrete actions on
these LVs (which is what this patch does), the behaviour is correct
then - the selection is done on the *initial state*:
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
Logical volume "lvol2" successfully removed
Logical volume "lvol3" successfully removed
Similarly for all the other situations in which relations among
LVs are being changed by processing the LVs one by one.
This patch also introduces LV_REMOVED internal LV status flag
to mark removed LVs so they're not processed further when we
iterate over collected list of LVs to be processed.
Previously, when we iterated directly over vg->lvs list to
process the LVs, we relied on the fact that once the LV is removed,
it is also removed from the vg->lvs list we're iterating over.
But that was incorrect as we shouldn't remove LVs from the list
during one iteration while we're iterating over that exact list
(dm_list_iterate_items safe can handle only one removal at
one iteration anyway, so it can't be used here).
Refactor the recent metadata-reading optimisation patches.
Remove the recently-added cache fields from struct labeller
and struct format_instance.
Instead, introduce struct lvmcache_vgsummary to wrap the VG information
that lvmcache holds and add the metadata size and checksum to it.
Allow this VG summary information to be looked up by metadata size +
checksum. Adjust the debug log messages to make it clear when this
shortcut has been successful.
(This changes the optimisation slightly, and might be extendable
further.)
Add struct cached_vg_fmtdata to format-specific vg_read calls to
preserve state alongside the VG across separate calls and indicate
if the details supplied match, avoiding the need to read and
process the VG metadata again.
Since we take a lock inside vg_lock_newname() and we do a full
detection of presence of vgname inside all scanned labels,
there is no point to do this for second time to be sure
there is no such vg.
The only side-effect of such call would be a full validation of
some already exising VG metadata - but that's not the task for
vgcreate when create a new VG.
This call noticable reduces number of scans during 'vgcreate'.
When reading VG mda from multiple PVs - do all the validation only
when mda is seen for the first time and when mda checksum and length
is same just return already existing VG pointer.
(i.e. using 300PVs for a VG would lead to create and destroy 300 config trees....)
Previous versions of lvm will not obey the restrictions
imposed by the new system_id, and would allow such a VG
to be written. So, a VG with a new system_id is further
changed to force previous lvm versions to treat it as
read-only. This is done by removing the WRITE flag from
the metadata status line of these VGs, and putting a new
WRITE_LOCKED flag in the flags line of the metadata.
Versions of lvm that recognize WRITE_LOCKED, also obey the
new system_id. For these lvm versions, WRITE_LOCKED is
identical to WRITE, and the rules associated with matching
system_id's are imposed.
A new VG lock_type field is also added that causes the same
WRITE/WRITE_LOCKED transformation when set. A previous
version of lvm will also see a VG with lock_type as read-only.
Versions of lvm that recognize WRITE_LOCKED, must also obey
the lock_type setting. Until the lock_type feature is added,
lvm will fail to read any VG with lock_type set and report an
error about an unsupported lock_type. Once the lock_type
feature is added, lvm will allow VGs with lock_type to be
used according to the rules imposed by the lock_type.
When both system_id and lock_type settings are removed, a VG
is written with the old WRITE status flag, and without the
new WRITE_LOCKED flag. This allows old versions of lvm to
use the VG as before.
The seg_monitor did not display monitored status for thick snapshots
and mirrors (with mirror log *not* mirrored). The seg monitor did work
correctly even before for other segtypes - thins and raids.
Before (mirrors and snapshots, only mirrors with mirrored log properly displayed monitoring status):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot
With this patch applied (monitoring status displayed for all mirrors and snapshots):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public monitored
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot monitored
Set ACCESS_NEEDS_SYSTEM_ID VG status flag whenever there is
a non-lvm1 system_id set. Prevents concurrent access from
older LVM2 versions.
Not set on VGs that bear a system_id only due to conversion
from lvm1 metadata.
format_text processes both lvm2 on-disk metadata and metadata read
from other sources such as backup files. Add original_fmt field
to retain the format type of the original metadata.
Before this patch, /etc/lvm/archives would contain backups of
lvm1 metadata with format = "lvm2" unless the source was lvm1 on-disk
metadata.
The vg->lvm1_systemd_id needs to be initialized as all the code around
counts with that. Just like we initialize lvm1_system_id in vg_create
(no matter if it's actually LVM1 or LVM2 format), this patch adds this
init in alloc_vg as well so the rest of the code does not segfaul
when trying to access vg->lvm1_system_id.
In log messages refer to it as system ID (not System ID).
Do not put quotes around the system_id string when printing.
On the command line use systemid.
In code, metadata, and config files use system_id.
In lvmsystemid refer to the concept/entity as system_id.
The only realistic way for a host to have active LVs in a
foreign VG is if the host's system_id (or system_id_source)
is changed while LVs are active.
In this case, the active LVs produce an warning, and access
to the VG is implicitly allowed (without requiring --foreign.)
This allows the active LVs to be deactivated.
In this case, rescanning PVs for the VG offers no benefit.
It is not possible that rescanning would reveal an LV that
is active but wasn't previously in the VG metadata.
cmirror uses the CPG library to pass messages around the cluster and maintain
its bitmaps. When a cluster mirror starts-up, it must send the current state
to any joining members - a checkpoint. When mirrors are large (or the region
size is small), the bitmap size can exceed the message limit of the CPG
library. When this happens, the CPG library returns CPG_ERR_TRY_AGAIN.
(This is also a bug in CPG, since the message will never be successfully sent.)
There is an outstanding bug (bug 682771) that is meant to lift this message
length restriction in CPG, but for now we work around the issue by increasing
the mirror region size. This limits the size of the bitmap and avoids any
issues we would otherwise have around checkpointing.
Since this issue only affects cluster mirrors, the region size adjustments
are only made on cluster mirrors. This patch handles cluster mirror issues
involving pvmove, lvconvert (from linear to mirror), and lvcreate. It also
ensures that when users convert a VG from single-machine to clustered, any
mirrors with too many regions (i.e. a bitmap that would be too large to
properly checkpoint) are trapped.
A foreign VG should be silently ignored by a reporting/display
command like 'vgs'. If the reporting/display command specifies
a foreign VG by name on the command line, it should produce an
error message.
Scanning commands pvscan/vgscan/lvscan are always allowed to
read and update caches from all PVs, including those that belong
to foreign VGs.
Other non-report/display/scan commands always ignore a foreign
VG, or report an error if they attempt to use a foreign VG.
vgimport should always invalidate the lvmetad cache because
lvmetad likely holds a pre-vgexported copy of the VG.
(This is unrelated to using foreign VGs; the pre-vgexported
VG may have had no system_id at all.)
When checking whether the system ID permits access to a VG, check for
each permitted situation first, and only then issue the appropriate
error message. Always issue a message for now. (We'll try to
suppress some of those later when the VG concerned wasn't explicitly
requested.)
Add more messages to try to ensure every return code is checked and
every error path (and only an error path) contains a log_error().
Add self-correction to vgchange -c to deal with situations where
the cluster state and system ID state are out-of-sync (e.g. if
old tools were used).
Move the lvm1 sys ID into vg->lvm1_system_id and reenable the #if 0
LVM1 code. Still display the new-style system ID in the same
reporting field, though, as only one can be set.
Add a format feature flag FMT_SYSTEM_ON_PVS for LVM1 and disallow
access to LVM1 VGs if a new-style system ID has been set.
Treat the new vg->system_id as const.
Dop unused value assignments.
Unknown is detected via other combination
(!linear && !striped).
Also change the log_error() message into a warning,
since the function is not really returning error,
but still keep the INTERNAL_ERROR.
Ret value is always set later.
The dev ext source must be reset for the dev_cache_get call
(which evaluates filters), not lvmcache_label_scan - so fix
original commit 727c7ff85d.
Also, add comments in _pvcreate_check fn explaining why
refresh filter and rescan is needed and exactly in which
situations.
Before, we refreshed filters and we did full rescan of devices if
we passed through wiping (wipe_known_signatures fn call). However,
this fn returns success even if no signatures were found and so
nothing was wiped. In this case, it's not necessary to do the
filter refresh/rescan of devices as nothing changed clearly.
This patch exports number of wiped signatures from all the
wiping functions below. The caller (_pvcreate_check) then checks
whether any wiping was done at all and if not, no refresh/rescan
is done, saving some time and resources.
pvcreate code path executes signature wiping if there are any signatures
found on device to prepare the device for PV. When the signature is wiped,
the WATCH udev rule triggers the event which then updates udev database
with fresh info, clearing the old record about previous signature.
However, when we're using udev db as dev-ext source, we'd need to wait
for this WATCH-triggered event. But we can't synchronize against such
events (at least not at this moment). Without this sync, if the code
continues, the device could still be marked as containing the old
signature if reading udev db. This may end up even with the device
to be still filtered, though the signature is already wiped.
This problem is then exposed as (an example with md components):
$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb --run
$ mdadm -S /dev/md0
$ pvcreate -y /dev/sda
Wiping linux_raid_member signature on /dev/sda.
/dev/sda: Couldn't find device. Check your filters?
$ echo $?
5
So we need to temporarily switch off "udev" dev-ext source here
in this part of pvcreate code until we find a way how to sync
with WATCH events.
(This problem does not occur with signature wiping which we do
on newly created LVs since we already handle this properly with
our udev flags - the LV_NOSCAN/LV_TEMPORARY flag. But we can't use
this technique for non-dm devices to keep WATCH rule under control.)
for_each_sub_lv() now scans in depth also pools, however for
rename we actually do want to skip pools.
So add a new for_each_sub_lv_except_pools() to be used by rename,
every other user of for_each_sub_lv() scans every sub LV with pools
included.
This is i.e. necessary for properly working preload of pools
that are using raid arrays.
This is a regression from v115 where some of the fields/properties
were converted to using the common "struct lvinfo" and
"struct lv_seg_status" so we don't need to issue info and status
ioctl several times per one reported line. Not all fields are
converted yet, but one that *is* converted is the lv_attr field
with the lv_attr_dup counterpart used in lvm_lv_get_attr lvm2app fn.
These changes were introduced with e34b004422
and later - this patch introduced the "info_ok" field in the
lv_with_info_and_seg_status structure which encapsulates the lvinfo
and lv_seg_status struct.
For the lv_attr_dup, the lv_attr_dup code missed the
assignment for the "info_ok" flag which saves the result of the
lv_info_with_seg_status call. Hence such info was marked
as unusable - unknown and it was returned as such via lvm_lv_get_attr
lvm2app fn.
When raid leg is extracted, now the preload code handles this state
correctly and put proper new table entry into dm tree,
so the activation of extracted leg and removed metadata works
after commit.
Rename original lv_error_when_full field to lv_when_full and also
convert it from binary field to string field displaying three
possible values: "error", "queueu" or "" (blank for undefined).
$ lvs vg/pool vg/pool1 vg/linear_lv -o+lv_when_full
LV VG Attr LSize Data% Meta% WhenFull
linear_lv vg -wi-a----- 4.00m
pool vg twi-aotz-- 4.00m 0.00 0.98 queue
pool1 vg twi-a-tz-- 4.00m 0.00 0.88 error
For -S|--select these synonyms are recognized:
"error" -> "error when full", "error if no space"
"queue" -> "queue when full", "queue if no space"
"" -> "undefined"
Recently the single 'status' code has been used for number of cache
features.
Extend the API a little bit to allow usage also for lv_attr_dup.
As the function itself is used in lvm2api - add a new function:
lv_attr_dup_with_info_and_seg_status() that is able to use
grabbed info & status information.
report_init() is now using directly passed lvdm struct pointer
which holds the infomation whether lv_info() was correctly obtained or
there was some error when trying to read it.
Move 'healt' attribute to status.
TODO convert raid function to use the already known status.
The previous patch felt short WRT disabling allocation on PVs holding other
legs of the RAID LV persistently; this patch introduces an internal,
transient PV flag PV_ALLOCATION_PROHIBITED to address this very problem.
General problem description for completeness:
An 'lvconvert --repair $RAID_LV" to replace a failed leg of a multi-segment
RAID10/4/5/6 logical volume can lead to allocation of (parts of) the replacement
image component pair on the physical volume of another image component
(e.g. image 0 allocated on the same PV as image 1 silently impeding resilience).
Patch fixes this severe resilince issue by prohibiting allocation on PVs
already holding other legs of the RAID set. It allows to allocate free space
on any operational PV already holding parts of the image component pair.
Support error_if_no_space feature for thin pools.
Report more info about thinpool status:
(out_of_data (D), metadata_read_only (M), failed (F) also as health
attribute.)
An 'lvconvert --repair $RAID_LV" to replace a failed leg of a multi-segment
RAID10/4/5/6 logical volume can lead to allocation of (parts of) the replacement
image component pair on the physical volume of another image component
(e.g. image 0 allocated on the same PV as image 1 silently impeding resilience).
Patch fixes this severe resilince issue by prohibiting allocation on PVs
already holding other legs of the RAID set. It allows to allocate free space
on any operational PV already holding parts of the image component pair.
Better than previous patch which changed log_warn to log_error -
we can have multiple MDAs and if one of them fails to be written,
we can still continue with other MDAs if we're in a mode where
we can handle missing PVs - so keep the log_warn for single
failed MDA write as it was before.
However, add log_error with "Failed to write VG <vg_name>." in
case we're not handling missing PVs or no MDA was written at all
during VG write process. This also prevents an internal error in
which the vg_write fails and we're not issuing any other log_error
in vg_write caller or above, so we end up with:
"Internal error: Failed command did not use log_error".
$ lvcreate -l1 -m1 --type mirror vg
Logical volume "lvol0" created.
$ lvconvert --type raid1 vg/lvol0
Before:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_mimage_0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_mimage_1_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Incorrect name: lvol0_mimage_0_rimage_0
With this patch applied:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Proper name: lvol0_rimage_0