IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Commit b00711e312 improperly
convert _area_missing() replacment and moved check for
AREA_PV seg_type() into same if() section.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
If lvmetad is not used, we generate lvm2-activation{-early,-net}.service
systemd services to activate any VGs found on the system. So far we used
--sysinit during this activation as polling was still forked off of the
lvm activation command.
This has changed with lvmpolld - we have proper lvmpolld systemd
service now (activated via its socket unit). As such, we don't need
to use --sysinit anymore during activation in systemd environment
as polling was the only barrier to remove the need for --sysinit.
as of now lvmpolld works as client utility for
querying running instance of lvmpolld server
on metadata, state, etc.
Currently the only request implemented is the '--dump'.
It prints out full lvmpolld state (mimics lvmdump -p command).
Configure provides proper settings for
use_lvmetad and use_lvmpolld conf setttings.
When the build of polld & lvmetad, these settings
are enabled by default unless explicitelly disabled
with --disable-use-lvmetad/--disable-use-lvmpolld.
Use 64bit arithmentic for PV size calculation (Coverity).
Also remove sector shift for compared PV size, since all
values are already held in sectors.
This fixes validatio of PV size when restoring PV
from vg metadata backup file.
Reinstate config settings matching the last release until every
case where the generator produces different output has been reviewed
and fresh decisions made about which defaults to expose as protection
against changes in newer releases. We should be trying to reduce, not
increase, this number.
This patch adds supporting code for handling deprecated settings.
Deprecated settings are not displayed by default in lvmconfig output
(except for --type current and --type diff). There's a new
"--showdeprecated" lvmconfig option to display them if needed.
Also, when using lvmconfig --withcomments, the comments with info
about deprecation are displayed for deprecated settings and with
lvmconfig --withversions, the version in which the setting was
deprecated is displayed in addition to the version of introduction.
If using --atversion with a version that is lower than the one
in which the setting was deprecated, the setting is then considered
as not deprecated (simply because at that version it was not
deprecated).
For example:
$ lvmconfig --type default activation
activation {
...
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated
activation {
...
mirror_region_size=512
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withversions
activation {
...
# Available since version 1.0.0.
# Deprecated since version 2.2.99.
mirror_region_size=512
# Available since version 2.2.99.
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withcomments
activation {
...
# Configuration option activation/mirror_region_size.
# This has been replaced by the activation/raid_region_size
# setting.
# Size (in KB) of each copy operation when mirroring.
# This configuration option is deprecated.
mirror_region_size=512
# Configuration option activation/raid_region_size.
# Size in KiB of each raid or mirror synchronization region.
# For raid or mirror segment types, this is the amount of
# data that is copied at once when initializing, or moved
# at once by pvmove.
raid_region_size=512
...
}
$ lvmconfig --type default activation --withcomments --atversion 2.2.98
activation {
...
# Configuration option activation/mirror_region_size.
# Size (in KB) of each copy operation when mirroring.
mirror_region_size=512
...
}
These settings are in the "unsupported" group:
devices/loopfiles
log/activate_file
metadata/disk_areas (section)
metadata/disk_areas/<disk_area> (section)
metadata/disk_areas/<disk_area>/size
metadata/disk_areas/<disk_area>/id
These settings are in the "advanced" group:
devices/dir
devices/scan
devices/types
global/proc
activation/missing_stripe_filler
activation/mlock_filter
metadata/pvmetadatacopies
metadata/pvmetadataignore
metadata/stripesize
metadata/dirs
Also, this patch causes the --ignoreunsupported and --ignoreadvanced
switches to be honoured for all config types (lvmconfig --type).
By default, the --type current and --type diff display unsupported
settings, the other types ignore them - this patch also introduces
--showunsupported switch for all these other types to display even
unsupported settings in their output if needed.
lvmconfig --type list displays plain list of configuration settings.
Some of the existing decorations can be used (--withsummary and
--withversions) as well as existing options/switches (--ignoreadvanced,
--ignoreunsupported, --ignorelocal, --atversion).
For example (displaying only "config" section so the list is not long):
$lvmconfig --type list config
config/checks
config/abort_on_errors
config/profile_dir
$ lvmconfig --type list --withsummary config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig -l config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig --type list --withsummary --withversions config
config/checks - If enabled, any LVM configuration mismatch is reported. [2.2.99]
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found. [2.2.99]
config/profile_dir - Directory where LVM looks for configuration profiles. [2.2.99]
Example with --atversion (displaying global section):
$ lvmconfig --type list global
global/umask
global/test
global/units
global/si_unit_consistency
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/etc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/prioritise_write_locks
global/library_dir
global/locking_library
global/abort_on_internal_errors
global/detect_internal_vg_cache_corruption
global/metadata_read_only
global/mirror_segtype_default
global/raid10_segtype_default
global/sparse_segtype_default
global/lvdisplay_shows_full_device_path
global/use_lvmetad
global/thin_check_executable
global/thin_dump_executable
global/thin_repair_executable
global/thin_check_options
global/thin_repair_options
global/thin_disabled_features
global/cache_check_executable
global/cache_dump_executable
global/cache_repair_executable
global/cache_check_options
global/cache_repair_options
global/system_id_source
global/system_id_file
$ lvmconfig --type list global --atversion 2.2.50
global/umask
global/test
global/units
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/library_dir
global/locking_library
'lvm dumpconfig' now does a lot more than just dumping configuration
information and is no longer only a support tool. Users now need
to run it to find out about configuration information that has been
removed from the lvm.conf man page so we need to promote this to full
command line status as 'lvmconfig'. Also accept 'lvm config' and mention
it in the usage information of lvmconf (which should also get merged in
eventually).
Show full chain of ancestors and descendants for snapshots
(both thick and thin - in case of thick, the "ancestor" field
is actually equal to "origin" field as snapshots can't be
chained for thick snapshots).
These fields display current state as it is, they do not
display any history! If the snapshot chain is broken in
the middle, we don't report the historical origin (this
is going to be a part of another patch and a different
set of fields or just a switch for existing fields to
show ancestors and descendants with history included).
For example:
(origin --> snapshot)
lvol1 --> lvol2 --> lvol3 --> lvol4
\
--> lvol5 --> lvol6 --> lvol7 --> lvol8
$ lvs -o name,pool_lv,origin,ancestors,descendants vg
LV Pool Origin Ancestors Descendants
lvol1 pool lvol2,lvol3,lvol4,lvol5,lvol6,lvol7,lvol8
lvol2 pool lvol1 lvol1 lvol3,lvol4,lvol5,lvol6,lvol7,lvol8
lvol3 pool lvol2 lvol2,lvol1 lvol4
lvol4 pool lvol3 lvol3,lvol2,lvol1
lvol5 pool lvol2 lvol2,lvol1 lvol6,lvol7,lvol8
lvol6 pool lvol5 lvol5,lvol2,lvol1 lvol7,lvol8
lvol7 pool lvol6 lvol6,lvol5,lvol2,lvol1 lvol8
lvol8 pool lvol7 lvol7,lvol6,lvol5,lvol2,lvol1
add info for various commits, most significant were:
- toollib: close connection to lvmetad after fork
(fe30658a4d)
- toollib: do not spawn polling in lv_change_activate
(c26d81d6e6)
- pvmove: split pvmove_update_metadata function
(65623b63a2)
- lvconvert: move poll code in before refactoring
(5190f56605)
- pvmove: move poll code in before refactoring
(a098aa419f)
--withfullcomments prints all comment lines for each config option.
--withcomments prints only the first comment line, which should be
a short one-line summary of the option.
This removes dependency on lvm binary - if it's not present, all LVM
processing is skipped (shouldn't normally happen because if lvm binary
is missing then there's obviously nothing that would activate it, but
let's make sure).
Without this tight dependency on lvm, the blkdeactivate script can
be packaged with libdevmapper/dmsetup (in contrast to lvm as it was
before) and as such the script can still be used to handle other DM
devices.
This patch adds new options to lvmconf:
--enable-halvm (just like --enable-cluster, but configure LVM
for use in HA LVM - meaning disabling lvmetad and
making sure we have locking_type=1)
--disable-halvm (just like --disable-cluster, but configure LVM
back from HA LVM - meaning enabling lvmetad if
it's enabled by default and making sure we have
default locking type set)
--services (causes clvmd and lvmetad services to be enabled or
disabled appropriately and conforming to the changes
in lvm configuration we've just made with lvmconf)
--mirrorservice (in addition to clvmd and lvmetad services, also
enable or disable cmirrord service appropriately;
this is a separate option because cmirrord is
optional and it doesn't need to be always enabled
when clvmd is enabled)
--startstopservices (in addition to enabling or disabling services,
start and stop these services immediately)
These options are supposed to help users to make their system ready
for cluster with clvmd (active-active) or HA LVM (active-passive) use
while lvmconf script can handle services as well so users don't need
to bother about setting them manually.
Also, before this patch, we hardcoded global/use_lvmetad=0 as default
value in lvmconf script. Howeverm this default may change by just
flipping the value in config_settings.h and we may forget to edit
the lvmconf. It's better to use lvm dumpconfig --type default global/use_lvmetad
to get the actual default value and use this one instead of hardcoded one.
When performing initial allocation (so there is nothing yet to
cling to), use the list of tags in allocation/cling_tag_list to
partition the PVs. We implement this by maintaining a list of
tags that have been "used up" as we proceed and ignoring further
devices that have a tag on the list.
https://bugzilla.redhat.com/983600
Add A_PARTITION_BY_TAGS set when allocated areas should not share tags
with each other and allow _match_pv_tags to accept an alternative list
of tags. (Not used yet.)
pv_write is called both to write orphans and to rewrite PV headers
of PVs in VGs. It needs to select the correct VG id so that the
internal cache state gets updated correctly.
It only affected commands that involved further steps after
the pv_write and was often masked because the metadata would
be re-read off disk and correct itself.
"Incorrect metadata area header checksum" warnings appeared.
Example:
Create vg1 containing dev1, dev2 and dev3.
Hide dev1 and dev2 from the system.
Fix up vg1 with vgreduce --removemissing.
Bring back dev1 and dev2.
In a single operation reinstate dev1 and dev2 into vg1 (vgextend).
Done as separate operations (automatically fix-up dev1 and dev2 as orphans,
then vgextend) it worked, but done all in one go the internal cache got
corrupted and warnings about checksum errors appeared.
There is no benefit in waking-up all the waiters
when there is no actual change in lock state.
This avoid some unnecessarily ping-pong effects like:
Resource V_LVMTEST15724vg retrying lock in mode:WRITE...
Resource V_LVMTEST15724vg already locked lockid=40, mode:WRITE
Resource V_LVMTEST15724vg retrying lock in mode:WRITE...
Resource V_LVMTEST15724vg already locked lockid=40, mode:WRITE
Commit 80f4b4b803
introduced undesirable side-effects for lvm2app user
which happens to be our own python binding.
It appear obtaing pvs list keeps global lock.
So restricting this to VG_GLOBAL READ locks and skip
the drop skip if WRITE lock is held.
This avoids a problem in which we're using selection on LV list - we
need to do the selection on initial state and not on any intermediary
state as we process LVs one by one - some of the relations among LVs
can be gone during this processing.
For example, processing one LV can cause the other LVs to lose the
relation to this LV and hence they're not selectable anymore with
the original selection criteria as it would be if we did selection
on inital state. A perfect example is with thin snapshots:
$ lvs -o lv_name,origin,layout,role vg
LV Origin Layout Role
lvol1 thin,sparse public,origin,thinorigin,multithinorigin
lvol2 lvol1 thin,sparse public,snapshot,thinsnapshot
lvol3 lvol1 thin,sparse public,snapshot,thinsnapshot
pool thin,pool private
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
The lvremove command above was supposed to remove lvol1 as well as
all its snapshots which have origin=lvol1. It failed to do so, because
once we removed the origin lvol1, the lvol2 and lvol3 which were
snapshots before are not snapshots anymore - the relations change
as we're processing these LVs one by one.
If we do the selection first and then execute any concrete actions on
these LVs (which is what this patch does), the behaviour is correct
then - the selection is done on the *initial state*:
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
Logical volume "lvol2" successfully removed
Logical volume "lvol3" successfully removed
Similarly for all the other situations in which relations among
LVs are being changed by processing the LVs one by one.
This patch also introduces LV_REMOVED internal LV status flag
to mark removed LVs so they're not processed further when we
iterate over collected list of LVs to be processed.
Previously, when we iterated directly over vg->lvs list to
process the LVs, we relied on the fact that once the LV is removed,
it is also removed from the vg->lvs list we're iterating over.
But that was incorrect as we shouldn't remove LVs from the list
during one iteration while we're iterating over that exact list
(dm_list_iterate_items safe can handle only one removal at
one iteration anyway, so it can't be used here).
Refactor the recent metadata-reading optimisation patches.
Remove the recently-added cache fields from struct labeller
and struct format_instance.
Instead, introduce struct lvmcache_vgsummary to wrap the VG information
that lvmcache holds and add the metadata size and checksum to it.
Allow this VG summary information to be looked up by metadata size +
checksum. Adjust the debug log messages to make it clear when this
shortcut has been successful.
(This changes the optimisation slightly, and might be extendable
further.)
Add struct cached_vg_fmtdata to format-specific vg_read calls to
preserve state alongside the VG across separate calls and indicate
if the details supplied match, avoiding the need to read and
process the VG metadata again.
Fixes segfault when 'pvs' encounters two different PVs sharing the same
uuid but one an orphan, the other in a VG.
If VG_GLOBAL is held, there seems no point in doing a full scan more
than once.
If undesirable side-effects show up, we can try restricting this to
VG_GLOBAL READ locks. The original code dates back to 2.02.40.
When pvscan --cache --major --minor command is issued from
udev REMOVE event, it basically resulted into a whole device
scan since the device was missing. So avoid such scan
and first check via /sysfs (when available) if such device actually
exists.
There is no reason to support persistent major/minor numbers
for pool volumes - it's only meant to be supported for filesystems
(since i.e. nfs may need to keep volume on a persistent device node.)
Support for pools is now explicitely disabled and documented.
Metadata areas which are marked as ignored should not be scanned
and read during pvscan --cache. Otherwise, this can cause lvmetad
to cache out-of-date metadata in case other PVs with fresh metadata
are missing by chance.
Make this to work like in non-lvmetad case where the behaviour would
be the same as if the PV was orphan (in case we have no other PVs
with valid non-ignored metadata areas).
The iscsi-shutdown.service is the one responsible for logging out
iscsi sessions so blk-availability.service (running the blkdeactivate
script) should be run before that on shutdown (so we need to use
After=iscsi-shutdown.service because "After" relates to starting
the service and the opposite order is automatically applied on
stopping the service at shutdown).
Since we take a lock inside vg_lock_newname() and we do a full
detection of presence of vgname inside all scanned labels,
there is no point to do this for second time to be sure
there is no such vg.
The only side-effect of such call would be a full validation of
some already exising VG metadata - but that's not the task for
vgcreate when create a new VG.
This call noticable reduces number of scans during 'vgcreate'.
Use similar logic as with text_vg_import_fd() and avoid repeated
parsing of same mda and its config tree for vgname_from_mda().
Remember last parsed vgname, vgid and creation_host in labeller
structure and if the metadata have the same size and checksum,
return this stored info.
TODO: The reuse of labeller struct is not ideal, some lvmcache API for
this functionality would be nicer.
When reading VG mda from multiple PVs - do all the validation only
when mda is seen for the first time and when mda checksum and length
is same just return already existing VG pointer.
(i.e. using 300PVs for a VG would lead to create and destroy 300 config trees....)
The seg_monitor did not display monitored status for thick snapshots
and mirrors (with mirror log *not* mirrored). The seg monitor did work
correctly even before for other segtypes - thins and raids.
Before (mirrors and snapshots, only mirrors with mirrored log properly displayed monitoring status):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot
With this patch applied (monitoring status displayed for all mirrors and snapshots):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public monitored
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot monitored
Set ACCESS_NEEDS_SYSTEM_ID VG status flag whenever there is
a non-lvm1 system_id set. Prevents concurrent access from
older LVM2 versions.
Not set on VGs that bear a system_id only due to conversion
from lvm1 metadata.
format_text processes both lvm2 on-disk metadata and metadata read
from other sources such as backup files. Add original_fmt field
to retain the format type of the original metadata.
Before this patch, /etc/lvm/archives would contain backups of
lvm1 metadata with format = "lvm2" unless the source was lvm1 on-disk
metadata.
Two new functions added in the init script: rh_status and rh_status_q.
First one to be used in status() and second one to be used in start(),
stop(), force_stop(). Check for 'dmeventd' added and print list of
lvs being monitored in status().
Two problems fixed by this patch:
- PV tags were not recognized at all when using them with pvs
report that has only label fields (regression since 2.02.105)
- incorrect persistent .cache file to be generated after pvs
report that has only label fields (regression since 2.02.106)
These bugs come from the transition from process_each_pv to
process_each_label introduced by commit
67a7b7a87d and commit
490226fc47 and related.
cmirror uses the CPG library to pass messages around the cluster and maintain
its bitmaps. When a cluster mirror starts-up, it must send the current state
to any joining members - a checkpoint. When mirrors are large (or the region
size is small), the bitmap size can exceed the message limit of the CPG
library. When this happens, the CPG library returns CPG_ERR_TRY_AGAIN.
(This is also a bug in CPG, since the message will never be successfully sent.)
There is an outstanding bug (bug 682771) that is meant to lift this message
length restriction in CPG, but for now we work around the issue by increasing
the mirror region size. This limits the size of the bitmap and avoids any
issues we would otherwise have around checkpointing.
Since this issue only affects cluster mirrors, the region size adjustments
are only made on cluster mirrors. This patch handles cluster mirror issues
involving pvmove, lvconvert (from linear to mirror), and lvcreate. It also
ensures that when users convert a VG from single-machine to clustered, any
mirrors with too many regions (i.e. a bitmap that would be too large to
properly checkpoint) are trapped.
Add --foreign to the remaining reporting and display commands plus
vgcfgbackup.
Add a NEEDS_FOREIGN_VGS flag for vgimport to always set --foreign.
If lvmetad is being used with --foreign, scan foreign VGs (currently
implemented as a full PV scan).
Handle these things centrally in lvmcmdline.c.
Also allow lvchange and vgchange -an/-aln to deactivate any foreign
LVs that happen to be active if something went wrong.
Remember to set the system ID when creating a new VG in vgsplit.
Move the lvm1 sys ID into vg->lvm1_system_id and reenable the #if 0
LVM1 code. Still display the new-style system ID in the same
reporting field, though, as only one can be set.
Add a format feature flag FMT_SYSTEM_ON_PVS for LVM1 and disallow
access to LVM1 VGs if a new-style system ID has been set.
Treat the new vg->system_id as const.
In 2.02.99, _init_tags() inadvertently began to ignore the
dm_config_tree struct passed to it. "tags" sections are not
merged together, so the "tags" section in the main config file was
being processed repeatedly and other "tags" sections were ignored.
Before, we refreshed filters and we did full rescan of devices if
we passed through wiping (wipe_known_signatures fn call). However,
this fn returns success even if no signatures were found and so
nothing was wiped. In this case, it's not necessary to do the
filter refresh/rescan of devices as nothing changed clearly.
This patch exports number of wiped signatures from all the
wiping functions below. The caller (_pvcreate_check) then checks
whether any wiping was done at all and if not, no refresh/rescan
is done, saving some time and resources.
pvcreate code path executes signature wiping if there are any signatures
found on device to prepare the device for PV. When the signature is wiped,
the WATCH udev rule triggers the event which then updates udev database
with fresh info, clearing the old record about previous signature.
However, when we're using udev db as dev-ext source, we'd need to wait
for this WATCH-triggered event. But we can't synchronize against such
events (at least not at this moment). Without this sync, if the code
continues, the device could still be marked as containing the old
signature if reading udev db. This may end up even with the device
to be still filtered, though the signature is already wiped.
This problem is then exposed as (an example with md components):
$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb --run
$ mdadm -S /dev/md0
$ pvcreate -y /dev/sda
Wiping linux_raid_member signature on /dev/sda.
/dev/sda: Couldn't find device. Check your filters?
$ echo $?
5
So we need to temporarily switch off "udev" dev-ext source here
in this part of pvcreate code until we find a way how to sync
with WATCH events.
(This problem does not occur with signature wiping which we do
on newly created LVs since we already handle this properly with
our udev flags - the LV_NOSCAN/LV_TEMPORARY flag. But we can't use
this technique for non-dm devices to keep WATCH rule under control.)
Invalid devices no longer included in the counters printed at the end.
May now need to use --ignoreskippedcluster if relying upon exit status.
If more than one change is requested per-PV, attempt to perform them
all. Note that different arguments still handle exit status
differently.
When lvmetad is used and at the same time we're getting list of all
PV-capable devices, we can't use cmd->filter (which is used to filter
out lvmetad responses - so we're sure that the devices are PVs already).
To get the list of PV-capable devices, we're bypassing lvmetad (since
lvmetad only caches PVs, not all the other devices which are not PVs).
For this reason, we have to use the "full_filter" filter chain (just
like we do when we're running without lvmetad).
Example scenario:
- sdo and sdp components of MD device md0
- sdq, sdr and sds components of mpatha multipath device
- mpatha multipath device partitioned
- vda device partitioned
=> sdo,sdp,sdr,sds, mpatha and vda should be filtered!
$ lsblk -o NAME,TYPE
NAME TYPE
sdn disk
sdo disk
`-md0 raid0
sdp disk
`-md0 raid0
sdq disk
`-mpatha mpath
`-mpatha1 part
sdr disk
`-mpatha mpath
`-mpatha1 part
sds disk
`-mpatha mpath
`-mpatha1 part
vda disk
|-vda1 part
`-vda2 part
|-fedora-swap lvm
`-fedora-root lvm
Before this patch:
==================
use_lvmetad=0 (correct behaviour!)
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
use_lvmetad=1 (incorrect behaviour - sdo,sdp,sdq,sdr,sds and mpatha not filtered!)
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/sdo --- 0 0
/dev/sdp --- 0 0
/dev/sdq --- 0 0
/dev/sdr --- 0 0
/dev/sds --- 0 0
/dev/vda --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
With this patch applied:
========================
use_lvmetad=1
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
This makes a difference when using selection criteria based on
these fields - if those fields are defined as DM_REPORT_FIELD_TYPE_SIZE
(in contrast to DM_REPORT_FIELD_TYPE_NUMBER), units are also
recognize in selection clause.
For example:
$ lvs -o+seg_start vg1/lv2
LV VG Attr LSize Start
lv2 vg1 -wi-a----- 12.00m 0
lv2 vg1 -wi-a----- 12.00m 8.00m
Before this patch:
$ lvs -o+seg_start --select 'seg_start=8m'
Found size unit specifier but numeric value expected for selection field seg_start.
Selection syntax error at 'seg_start=8m'.
Use 'help' for selection to get more help.
With this patch applied:
$lvs -o+seg_start --select 'seg_start=8m'
LV VG Attr LSize Start
lv2 vg1 -wi-a----- 12.00m 8.00m
(the same applies for ba_start and vg_free fields)
We already allowed -S|--select with {vg,lv,pv}display -C (which
was then equal to {vg,lv,pv}s command. Since we support selection
in toolib now, we can support -S also without using -C in *display
commands now.
We have 3 input report types:
- LVS (representing "_select_match_lv")
- VGS (representing "_select_match_vg")
- PVS (representing "_select_match_pv")
The input report type is saved in struct selection_handle's "orig_report_type"
variable.
However, users can use any combination of fields of different report types in
selection criteria - the resulting report type can thus differ. The struct
selection_handle's "report_type" variable stores this resulting report type.
The resulting report_type can end up as one of:
- LVS
- VGS
- PVS
- SEGS
- PVSEGS
This patch adds logic to report_for_selection based on (sensible) combination
of orig_report_type and report_type and calls appropriate reporting functions
or iterates over multiple items that need reporting to determine the selection
result.
Once LVM_COMMAND_PROFILE environment variable is specified, the profile
referenced is used just like it was specified using "<lvm command> --commandprofile".
If both --commandprofile cmd line option and LVM_COMMAND_PROFILE env
var is used, the --commandprofile cmd line option gets preference.
all sockets opened by a daemon or handed over by systemd
have to have CLOEXEC flag set. Otherwise we get nasty
warnings about leaking descriptors in processes spawned by
daemon.
for_each_sub_lv() now scans in depth also pools, however for
rename we actually do want to skip pools.
So add a new for_each_sub_lv_except_pools() to be used by rename,
every other user of for_each_sub_lv() scans every sub LV with pools
included.
This is i.e. necessary for properly working preload of pools
that are using raid arrays.
This is a regression from v115 where some of the fields/properties
were converted to using the common "struct lvinfo" and
"struct lv_seg_status" so we don't need to issue info and status
ioctl several times per one reported line. Not all fields are
converted yet, but one that *is* converted is the lv_attr field
with the lv_attr_dup counterpart used in lvm_lv_get_attr lvm2app fn.
These changes were introduced with e34b004422
and later - this patch introduced the "info_ok" field in the
lv_with_info_and_seg_status structure which encapsulates the lvinfo
and lv_seg_status struct.
For the lv_attr_dup, the lv_attr_dup code missed the
assignment for the "info_ok" flag which saves the result of the
lv_info_with_seg_status call. Hence such info was marked
as unusable - unknown and it was returned as such via lvm_lv_get_attr
lvm2app fn.
When cache_mode is undefined, the read of metadata will miss to
set a bit with mode and fails to process metadata on internal
error:
Internal error: LV vg/lvol1 has uknown feature flags 0.
Fix it by setting it to writethrough mode.
When repairing thin pool or swapping thin pool metadata,
preserve chunk_size property and avoid to be automatically changed
later in the code to better match thin pool metadata size.
When raid leg is extracted, now the preload code handles this state
correctly and put proper new table entry into dm tree,
so the activation of extracted leg and removed metadata works
after commit.
When raid is being splitted, extracted leg & metadata
is still floating in the table - and thus we need to
detect this case and properly preload their matching
table so consequent activation of extracted LVs properly
renames (and FREES) existing raid images, so ongoing
image name shifting will work.
For example, with dmeventd/executable set to "" which is not allowed for
this setting, the config validation now ends up with:
$ lvm dumpconfig --validate
Configuration setting "dmeventd/executable" invalid. It cannot be set to an empty value.
LVM configuration invalid.
This check for empty values for string config settings was not
done before (we only checked empty arrays, but not scalar strings).
Rename original lv_error_when_full field to lv_when_full and also
convert it from binary field to string field displaying three
possible values: "error", "queueu" or "" (blank for undefined).
$ lvs vg/pool vg/pool1 vg/linear_lv -o+lv_when_full
LV VG Attr LSize Data% Meta% WhenFull
linear_lv vg -wi-a----- 4.00m
pool vg twi-aotz-- 4.00m 0.00 0.98 queue
pool1 vg twi-a-tz-- 4.00m 0.00 0.88 error
For -S|--select these synonyms are recognized:
"error" -> "error when full", "error if no space"
"queue" -> "queue when full", "queue if no space"
"" -> "undefined"
Support error_if_no_space feature for thin pools.
Report more info about thinpool status:
(out_of_data (D), metadata_read_only (M), failed (F) also as health
attribute.)
An 'lvconvert --repair $RAID_LV" to replace a failed leg of a multi-segment
RAID10/4/5/6 logical volume can lead to allocation of (parts of) the replacement
image component pair on the physical volume of another image component
(e.g. image 0 allocated on the same PV as image 1 silently impeding resilience).
Patch fixes this severe resilince issue by prohibiting allocation on PVs
already holding other legs of the RAID set. It allows to allocate free space
on any operational PV already holding parts of the image component pair.
Normally, if there are partitions defined on top of device-mapper
device, there should be a device-mapper device created for each
partiton on top of the old one and once the underlying DM device
is used by another devices (partition mappings in this case),
it can't be used as a PV anymore.
However, sometimes, it may happen the partition mappings are
missing - either the partitioning tool is not creating them if
it does not contain full support for device-mapper devices or
the mappings were removed.
Better safe than sorry - check for partition header on DM devs
and filter them out as unsuitable for PVs in case the check is
positive. Whatever the user is doing, let's do our best to prevent
unwanted corruption (...by running pvcreate on top of such device
that would corrupt the partition header).
If pvscan is run with device path instead of major:minor pair and this
device still exists in the system and the device is not visible anymore
(due to a filter that is applied), notify lvmetad properly about this.
This makes it more consistent with respect to existing pvscan with
major:minor which already notifies lvmetad about device that is gone
due to filters.
However, if the device is not in the system anymore, we're not able
to translate the original device path into major:minor pair which
lvmetad needs for its action (lvmetad_pv_gone fn). So in this case,
we still need to use major:minor pair only, not device path. But at
least make "pvscan --cache DevicePath" as near as possible to "pvscan
--cahce <major>:<minor>" functionality.
Also add a note to pvscan man page about this difference when using
pvscan --cache with DevicePath and major:minor pair.
No need to use awk now to get appropriate VGs/LVs, use LVM's
own --select - it's quicker, it removes a need for external
dependency on awk and it's also more readable.
When creating/activating clustered mirrors, we should have cmirrord
available and running. If it's not, we ended up with rather cryptic
errors like:
$ lvcreate -l1 -m1 --type mirror vg
Error locking on node 1: device-mapper: reload ioctl on failed: Invalid argument
Failed to activate new LV.
$ vgchange -ay vg
Error locking on node node 1: device-mapper: reload ioctl on failed: Invalid argument
This patch adds check for cmirror availability and it errors out
properly, also giving a more precise error messge so users are able
to identify the source of the problem easily:
$ lvcreate -l1 -m1 --type mirror vg
Shared cluster mirrors are not available.
$ vgchange -ay vg
Error locking on node 1: Shared cluster mirrors are not available.
Exclusively activated cluster mirror LVs are OK even without cmirrord:
$ vgchange -aey vg
1 logical volume(s) in volume group "vg" now active
The {pv,vg,lv}display *do* use reporting in case "-C|--columns" is used.
The man page was correct, the recognition for the --binary was missing
in the code though!
All the LVM commands are run in mode without lvmetad use (since lvmetad
can't handle duplicates). When we're finished with vgimportclone, we
need to notify lvmetad about changes.
Before this patch (/dev/sda and /dev/sdb contains a copy VG called "vg"):
$ vgimportclone --basevgname vg_snap /dev/sdb
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: Activation disabled. No device-mapper interaction will be attempted.
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Physical volume "/tmp/snap.zcJ8LCmj/vgimport0" changed 1 physical volume changed / 0 physical volumes not changed
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: Activation disabled. No device-mapper interaction will be attempted.
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully changed
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully renamed to "vg_snap"
Reading all physical volumes. This may take a while...
Found volume group "vg" using metadata type lvm2
Found volume group "fedora" using metadata type lvm2
$ vgs
VG #PV #LV #SN Attr VSize VFree
fedora 1 2 0 wz--n- 9.50g 0
vg 1 1 0 wz--n- 124.00m 120.00m
(...lvmetad doesn't see the new "vg_snap"!)
With this patch applied:
$ vgimportclone --basevgname vg_snap /dev/sdb
...
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully renamed to "vg_snap"
Notifying lvmetad about changes since it was disabled temporarily.
Reading all physical volumes. This may take a while...
Found volume group "vg_snap" using metadata type lvm2
Found volume group "fedora" using metadata type lvm2
Found volume group "vg" using metadata type lvm2
$ vgs
VG #PV #LV #SN Attr VSize VFree
fedora 1 2 0 wz--n- 9.50g 0
vg 1 1 0 wz--n- 124.00m 120.00m
vg_snap 1 1 0 wz--n- 124.00m 120.00m
The "restart lvmetad before enabling it" message is a bit misleading
here - we should probably suppress this one, but we can't suppress
warning messages selectively at the moment and we don't want to lose
other warning/error messages printed...
With current dumpconfig, we can generate lvm.conf easily - we can merge
current lvm.conf with the config given on cmd line:
lvm dumpconfig --mergedconfig --config "..."
This is a bit simpler than using awk and it also avoids problems when some of
the configuration is missing in existing lvm.conf file and hardcoded defaults
are used instead. The dumpconfig handles this transparently.
Under certain circumstances, the selection code can segfault:
$ vgs --select 'pv_name=~/dev/sda' --unbuffered vg0
VG #PV #LV #SN Attr VSize VFree
vg0 6 3 0 wz--n- 744.00m 588.00m
Segmentation fault (core dumped)
The problem here is the use of --ubuffered together with regex used in
selection criteria. If the report output is not buffered, each row is
discarded as soon as it is reported. The bug is in the use of report
handle's memory - in the example above, what happens is:
1) report handle is initialized together with its memory pool
2) selection tree is initialized from selection criteria string
(using the report handle's memory pool!)
2a) this also means the regex is initialized from report handle's mem pool
3) the object (row) is reported
3a) any memory needed for output is intialized out of report handle's mem pool
3b) selection criteria matching is executed - if the regex is checked the
very first time (for the very first row reported), some more memory
allocation happens as regex allocates internal structures "on-demand",
it's allocating from report handle's mem pool (see also step 2a)
4) the report output is executed
5) the object (row) is discarded, meaning discarding all the mem pool
memory used since step 3.
Now, with step 5) we have discarded the regex internal structures from step 3b.
When we execute reporting for another object (row), we're using the same
selection criteria (step 3b), but tihs is second time we're using the regex
and as such, it's already initialized completely. But the regex is missing the
internal structures now as they got discarded in step 5) from previous
object (row) reporting (because we're using "unbuffered" reporting).
To resolve this issue and to prevent any similar future issues where each
object/row memory is discarded after output (the unbuffered reporting) while
selection tree is global for all the object/rows, use separate memory pool
for report's selection.
This patch replaces "struct selection_node *selection_root" in struct
dm_report with new struct selection which contains both "selection_root"
and "mem" for separate mem pool used for selection.
We can change struct dm_report this way as it is not exposed via libdevmapper.
(This patch will have even more meaning for upcoming patches where selection
is used even for non-reporting commands where "internal" reporting and
selection criteria matching happens and where the internal reporting is
not buffered.)
Fix incorrect test in configure which sets --enable-udev-systemd-background-jobs
automatically if proper systemd version is available.
The UDEV_SYSTEMD_BACKGROUND_JOBS variable was not properly set to "yes" in
case systemd is available and we had "maybe" for this variable before.
When we split leg from raid - we take a proper new lock for a new LV.
However for now activation checks only 'existince' of device UUID,
but it's not validating device has a proper name.
As a quick fix call suspend()/resume() to rename after split mirror.
Free (and clear) h.protocol string on daemon_open() error paths
so it's OK for caller to skip calling daemon_close() if returned
h.socket_fd is -1.
Close h.socket_fd in daemon_close() to avoid possible leak.
https://bugzilla.redhat.com/1164234
When chunk size needs to be estimated, the code missed to round
to proper 64kb boundaries (or power of 2 for older thin pool driver).
So for some data and metadata size (i.e. 10GB and 4MB) it resulted
in incorrect chunk size (not being a multiple of 64KB)
Fix it by adding proper rounding and also use 1 routine for 2 places
where the same calculation is made.
Fix also incorrect printed warning that has used 'ffs()'
(which returns first 'least significant' bit in word)
and it was not really giving any useful size info and replace it
with properly estimated chunk size.
Fix regression introduced with a2c1024f6a
_setup_task(mknodes ? name : NULL...
has been replaced with:
_setup_task(type != MKNODES ? name : NULL....
Use '=='
Commit d2c116058e introduced regression
with CLVMD_PATH.
+ CLVMD_PATH="$clvmd_prefix/sbin/clvmd"
test "$prefix" != NONE && clvmd_prefix=$prefix
It has set CLVMD_PATH before clvmd_prefix got its final value.
Move it one line below.
systemd-run is available in systemd>=205. Also, this fix prevents
systemd-specific udev rules in 69-dm-lvm-metad.rules to appear in
case systemd environment is not available - make configure to check
this automatically and use these systemd specific rules only if it
is applicable.
Rework ignore_vg() API so it properly handles
multiple kind of vg_read_error() states.
Skip processing only otherwise valid VG.
Always return ECMD_FAILED when break is detected.
Check sigint_caught() in front of dm iterator loop.
Add stack for _process failing ret codes.
Failed recovery provides different (NULL) VG then FAILED_INCONSISTENT.
Mark it with different failure bit - since FAILED_INCONSISTENT is
supposed to contain something 'usable' (thought inconsistent).