IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Pair kernel_cache_settings with kernel_cache_policy.
There is very small chance in error case that the value in table
might be differnet from the value stored in metadata
so make it 'checkable'.
Showing 'u' in the pv_attr reporting field is mostly unnecessary because
most PVs are allocatable, and being allocatable implies it is (u)sed,
and this is already obvious from other fields in the default 'pvs'
output like the VG name.
So move the new (u)sed pv_attr from character position 4 to 1, and only
show it in those rare cases when the PV is not (a)llocatable or the
relevant metadata is missing.
(Scripts should not be using pv_attr, but rather pv_allocatable,
pv_exported, pv_missing, pv_in_use etc.)
Make the data_alignment variable 64 bits so it
can hold the invalid command line arg used in
pvreate-usage.sh pvcreate --dataalignment 1e.
On 32 bit arches, the smaller variable wouldn't
hold the invalid value so the error would not
trigger as expected by the test.
"pvcreate_each_params" was a temporary name used
to transition from the old "pvcreate_params".
Remove the old pvcreate_params struct and rename the
new pvcreate_each_params struct to pvcreate_params.
Rename various pvcreate_each_params terms to simply
pvcreate_params.
Use the new pvcreate_each_device() function from
toollib, previously added for pvcreate, in place
of the old pvcreate_vol().
This also requires shifting the location where the
lock is acquired for the new VG name. The lock for
the new VG is supposed to be acquired before pvcreate.
This means splitting the vg_lock_newname() out of
vg_create(), and calling vg_lock_newname() directly
before pvcreate, and then calling the remainder of
vg_create() after pvcreate.
The new function vg_lock_and_create() now does
vg_lock_newname() + vg_create(), like the previous
version of vg_create().
The lock on the new VG name is released before the
pvcreate and reacquired after the pvcreate because
pvcreate needs to reset lvmcache, which doesn't work
when locks are held. An exception could likely be
made for the new VG name lock, which would allow
vgcreate to hold the new VG name lock across the
pvcreate step.
This is common code for handling PV create/remove
that can be shared by pvcreate/vgcreate/vgextend/pvremove.
This does not change any commands to use the new code.
- Pull out the hidden equivalent of process_each_pv
into an actual top level process_each_pv.
- Pull the prompts to the top level, and do not
run any prompts while locks are held.
The orphan lock is reacquired after any prompts are
done, and the devices being created are checked for
any change made while the lock was not held.
Previously, pvcreate_vol() was the shared function for
creating a PV for pvcreate, vgcreate, vgextend.
Now, it will be toollib function pvcreate_each_device().
pvcreate_vol() was called effectively as a helper, from
within vgcreate and vgextend code paths.
pvcreate_each_device() will be called at the same level
as other process_each functions.
One of the main problems with pvcreate_vol() is that
it included a hidden equivalent of process_each_pv for
each device being created:
pvcreate_vol() -> _pvcreate_check() ->
find_pv_by_name() -> get_pvs() ->
get_pvs_internal() -> _get_pvs() -> get_vgids() ->
/* equivalent to process_each_pv */
dm_list_iterate_items(vgids)
vg = vg_read_internal()
dm_list_iterate_items(&vg->pvs)
pvcreate_each_device() reorganizes the code so that
each-VG-each-PV loop is done once, and uses the standard
process_each_pv function at the top level of the function.
This uses the vg->pv_write_list in place of the
vg->pvs_to_write list, and eliminates the use of
pvcreate_params. The label remove and zeroing
steps are shifted out of vg_write() to the higher
level like pvcreate will do.
The vg->pv_write_list contains pv_list structs for which
vg_write() should call pv_write().
The new list will replace vg->pvs_to_write that contains
vg_to_create structs which are used to perform higher-level
pvcreate-related operations. The higher level pvcreate
operations will be moved out of vg_write() to higher levels.
Since we already check in few other places 'info' is not NULL,
do the same for others - however when info would be NULL
it more or less looks like internal error.
Reshuffle messages during pvremove.
Always print WARNING: when PV is in use so using options
--force --force doesn't make this important user
notification go away.
Simplify variable 'used' usage (so older gcc doesn't warn
about the use of unitilizied variable).
Add some '.' into messages.
Currently it's been checked for 'zero' header for thin-pool,
but lets use it always for cache as well - since it's relatively 'cheap'
detection of read 'error' problems as thin/cache tools
currently do not work fast enough in this case.
When update fails in suspend() (sending of messages
fails because metadata space is full) call resume(),
so the locking sequence works properly for clustering.
Also failing deactivation should unlock memory.
Fix reporting of Fail thin-pool target status
as attr[8] letter 'F'.
Report 'needs_check' status from thin-pool target via
attr field [4] (letter 'c'/'C'), and also via CheckNeeded field.
TODO: think about better name here?
TODO: lots of prop_not_implemented_set
Ask for confirmation when using pvcreate/pvremove on a PV which is
marked as belonging to a VG, just like we do in case of a PV which
belongs to known VG:
$ pvcreate -ff /dev/sda
Really INITIALIZE physical volume "/dev/sda" that is marked as belonging to a VG [y/n]? n
/dev/sda: physical volume not initialized
$ pvremove -ff /dev/sda
Really WIPE LABELS from physical volume "/dev/sda" that is marked as belonging to a VG [y/n]? n
/dev/sda: physical volume label not removed
The host that owns foreign VGs is responsible for fixing up PV_EXT_USED
flag - the same already applies to repairing any inconsistent VG.
This patch also moves the iteration over vg->pvs inside
_check_or_repair_pv_ext fn - it's cleaner this way.
pv->vg is not set yet during pvcreate processing. Use pv->fmt instead to
check for these fake PVs (all normal PVs have format defined, devices
which are not PVs don't have this set).
This fixes commit 0000db7f98.
Some of the PVs are not even orphan PVs - they're fake PVs - this can
happen if we're listing all devices with "pvs -a". Such PV must not
be marked as used.
The backup_restore_vg is used directly for restoring the VG from backup.
It's also used to do the VG conversions from one metadata format to
another which means vgconvert calls backup_restore_vg too.
When restoring VG from backup, we need to rewrite/write PV headers as
PVs may have been orphans before and now they're becoming part of some
VG - we need to write the PV_EXT_USED flag at least.
When using the backup_restore_vg for vgconvert, we need to write
completely new PV header in different format.
Avoid the special "pv_write" call and handling that was used before
this patch in vgconvert (vgconvert_single function to be more precise)
and reuse existing internal interface to register PV header for writing
(or rewriting) via vg->pvs_to_write list instead like we do it elsewhere
in the code.
This patch also resolves a problem in which PV headers with target
format were written in the vgconvert_single fn as orphans and VG
metadata were added later on - this was a tiny hack actually.
We can't do this now - we need to write the PV as belonging
to a VG because otherwise the PV_EXT_USED flag won't be written
properly (if the PV header is written as orphan, the PV_EXT_USED
is set to 0, of course, even though metadata are attached later).
So this patch removes this tiny inconsistency which was passing
just fine before because we didn't have any relation to the VG
in PV header before. Now we have the PV_EXT_USED flag which says
the "PV is used in some VG".
The same check as we already do for orphan PVs, just the other way
round now: if the PV is surely part of some VG and any PV the VG
contains does not have the PV_EXT_USED flag set, repair it.
For example - /dev/sda here is in VG vg and it's incorrectly not
marked as used by PV_EXT_USED flag:
pvs --binary -o pv_ext_vsn,pv_in_use
WARNING: Volume Group vg is not consistent.
WARNING: Repairing Physical Volume /dev/sda that is in Volume Group vg but not marked as used.
PV VG Fmt Attr PSize PFree ExtVsn PInUse
/dev/sda vg lvm2 a-- 124.00m 124.00m 2 1
PV header extension versions:
0 - the original PV without any extensions
1 - bootloader area support added
2 - PV_EXT_USED flag support added
So do the associated checks related to PV_EXT_USED flag only if
PV header extension found is of version 2 and higher.
If we know that the PV is orphan, meaning there's at least one MDA on
that PV which does not reference any VG and at the same time there's
PV_EXT_USED flag set, we're certainly in an inconsistent state and we
need to fix this.
For example, such situation can happen during vgremove/vgreduce if we
removed/reduced the VG, but we haven't written PV headers yet because
vgremove stopped abruptly for whatever reason just before writing new
PV headers with updated state, including PV extension flags (and so the
PV_EXT_USED flag).
However, in case the PV has no MDAs at all, we can't double-check
whether the PV_EXT_USED is correct or not - if that PV is marked
as used, it's either:
- really used (but other disks with MDAs are missing)
- or the error state as described above is hit
User needs to overwrite the PV header directly if it's really clear
the PV having no MDAs does not belong to any VG and at the same time
it's still marked as being in use (pvcreate -ff <dev_name> will fix this).
For example - /dev/sda here has 1 MDA, orphan and is incorrectly marked
with PV_EXT_USED flag:
$ pvs --binary -o+pv_in_use
WARNING: Found inconsistent standalone Physical Volumes.
WARNING: Repairing flag incorrectly marking Physical Volume /dev/sda as used.
PV VG Fmt Attr PSize PFree InUse
/dev/sda lvm2 --- 128.00m 128.00m 0
For example:
$ pvs -o pv_name,vg_name,pv_in_use
PV VG InUse
/dev/sda vg used
/dev/sdb
/dev/sdc used
(sda is part of vg - it's used
sdb is not part of vg - it's not used
sdc is part of vg, but MDAs missing - it's used)
Scenario:
$ pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
We're adding the PV to a VG.
Before this patch:
$ vgcreate vg /dev/sda
Physical volume "/dev/sda" successfully created
Volume group "vg" successfully created
With this path applied:
$ vgcreate vg /dev/sda
Volume group "vg" successfully created
...and verbose log containing: "Physical volume "/dev/sda" successfully written"
Make sure we won't use a PV that is already marked as used. Normally,
VG metadata would stop us from doing that, but we can run into a
situation where such metadata is missing because PVs with MDAs
are missing and the PVs left are the ones with 0 MDAs.
(/dev/sda in this example has 0 MDAs and it belongs to a VG,
but other PVs with MDA are missing)
$ pvs -o pv_name,pv_mda_count /dev/sda
PV #PMda
/dev/sda 0
$ pvcreate /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Can't initialize PV '/dev/sda' without -ff.
$ pvchange -u /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Can't change PV '/dev/sda' without -ff.
Physical volume /dev/sda not changed
0 physical volumes changed / 1 physical volume not changed
$ pvremove /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
(If you are certain you need pvremove, then confirm by using --force twice.)
$ vgcreate vg /dev/sda
Physical volume '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Unable to add physical volume '/dev/sda' to volume group 'vg'.
We'll use this struct in subsequent patches for PVs which should
be rewritten, not just created. So rename struct pv_to_create to
struct pv_to_write for clarity.
Address this gcc warning:
metadata/lv.c:243: warning: initialized field overwritten
metadata/lv.c:243: warning: (near initialization for 'status.seg_status')
Present with e.g.: gcc version 4.3.2 (Debian 4.3.2-1.1)
Simplify calculation of extents rounding needed for
segment size.
Segment size has to divisible by 'extent count' needed to contain
whole stripe. LVM currently does not support stripes across segment.
In case the stripe size is bigger then extent size,
require bigger rounding.
'verbose' was marked as a boolean option while it
takes integer args - so it has limited usage to 0 or 1,
but we supported 0-4 at least.
Fix it by switching to corrent int type.
(Hopefully noone was trying to use this variable as true/yes/false/no
way - as the would be unsupported/undocumented).
Reporter noticed lvm2 incorrectly translated
lvm2 threshold value to water mark in commit:
99237f0908
Fix it by properly translating size to number of
blocks in thin-pool and then calc for free blocks
matching configured lvm2 threshold value.
Reported-by: Ming-Hung Tsai <mingnus@gmail.com>
Normally, we generate and provide lvm.conf file where use_blkid_wiping
is set based on whether support for this is compiled in or not. This was
generated properly based on configure.
However, if lvm.conf is not used at all (someone deletes it) or the value
in lvm.conf is commented out (user edited it), we still need to use
proper default value that is based on DEFAULT_USE_BLKID_WIPING taken
from configure script - we used hardcoded value of "1" in this case
by mistake.
We already do check for suspended devs within udev rules where
the pvscan is to update lvmetad. So the check for suspended devs
in "pre-lvmetad" chain is not useful here - remove it - it may
be a source of hardly to detect races anyway (if udev rule detects
the device is not suspended and then the pvscan instance sees the
dev as suspended, we may end up not reacting to the event properly).
lvm1 and pool format do not support bootloader areas and we need to
remove any existing associated bootloader areas when we read lvm1 and
pool labels.
This has its importance if we're converting from one format to another
and we're reusing lvmcache in long-running commands (e.g. clvmd or lvm
shell) and we need to make lvmcache consistent and valid for current format.
Non-dm devices have ID_PART_TABLE_TYPE variable exported in
udev db from blkid scan for *both* whole devices and partitions.
We used ID_PART_ENTRY_DISK in addition to decide whether this
is the whole device or partition and then we filtered out only
whole devices where the partition table really is.
However, ID_PART_ENTRY_DISK was added in blkid 2.20 so we need
to use a different set of variables to decide on whole devices
and partitions on systems where older blkid is still used.
Now, we use ID_PART_TABLE_TYPE to detect that there's something
related to partitioning with this device and we use DEVTYPE variable
instead to decide between whole device (DEVTYPE="disk") and partition
(DEVTYPE="partition").
For dm devices it's simpler, we have ID_PART_TABLE_TYPE variable\
set in udev db for whole devices. It's not set for partitions,
hence we don't need more variable in addition to make the decision
on whole device vs. partition (dm devices do not have regular
partitions, hence DEVTYPE can't be used anyway, it's always set
to "disk" for whole disks and partitions).
Add "size" and "size_seqno" to struct device to cache device's size
and also to control its lifetime - the cached value is valid as long
as the global _dev_size_seqno is equal to the device's size_seqno,
otherwise we need to get the size again and cache the new value.
This patch also adds new dev_size_seqno_inc() fn for the appropriate
parts of the code to increment current global value of _dev_size_seqno
and hence to cause all currently cached values for device sizes to
be invalidated.
The device size is now cached because we're planning to reuse this
information for further checks and we want to avoid checking it more
than necessary to save resources.
The extent size must fits all blocks in 4294967295 sectors
(in 512b units) this is 1/2 KiB less then 2TiB.
So while previous statement 'suggested' 2TiB is still acceptable value,
make it clear it's not.
As now we support any multiples of 128KB as extent size -
values like 2047G will still 'flow-in' otherwise the largest power-of-2
supported value is 1TiB.
With 1TiB user needs 8388608 extents for 8EiB device.
(FYI such device is already unusable with todays glibc-2.22.90-27)
4GiB extent size is currently the smallest extent size which allows
a user to create 8EiB devices (with 2GiB it's less then 8EiB).
TODO: lvm2 may possibly print amount of 'lost/unused space' on a PV,
since using such ridiculously sized extent size may result in huge
space being left unaccessible.
There are two basic groups of fields for LV segment device reporting:
- related to LV segment's devices: devices and seg_pe_ranges
- related to LV segment's metadata devices: metadata_devices and seg_metadata_le_ranges
The devices and metadata_devices report devices in this format:
"device_name(extent_start)"
The seg_pe_ranges and seg_metadata_le_ranges report devices in
this format:
"device_name:extent_start-extent_end"
This patch reverts partly what commit 7f74a99502
(v 2.02.140) introduced in this area - it added [] for
hidden devices to mark them for all four fields mentioned above.
We won't be marking hidden devices in devices and metadata_devices
fields.
The seg_metadata_le_ranges field will have hidden devices marked -
it's new enough that we don't need to care about compatibility much
yet.
The seg_pe_ranges is old enough that we shouldn't be changing this
one - so we're reverting to not marking hidden devices here.
Instead, there's going to be a new field "seg_le_ranges" which
is going to replace the seg_pe_ranges and it will mark hidden devices -
this is going to be introduced in a patch later.
So in the end we'll end up with:
(LV segment's devices)
devices field with "device_name(extent_start)" format, not marking hidden devices
seg_pe_ranges field with "device_name:extent_start-extent_end" format, not marking hidden devices (deprecated, new seg_le_ranges should be used instead for standardized format)
seg_le_ranges field with "device_name:extent_start-extent_end" format, marking hidden devices
(LV segment's metadata devices)
metadata_devices field with "device_name:extent_start-extent_end" format, not marking hidden devices
seg_metadata_le_ranges field with "device_name:extent_start-extent_end" format, marking hidden devices
Also, both seg_le_ranges and seg_metadata_le_ranges will honour the
report/list_item_separator setting which can be used to configure
the delimiter used for list items.
So, to sum it up, we will recommend using the new seg_le_ranges and
seg_metadata_le_ranges fields because they display devices with
standard extent range format, they can mark hidden devices and they
honour the report/list_item_separator setting.
We'll be keeping devices,seg_pe_ranges and metadata_devices fields
for compatibility.
The associated devices,metadata_devices,seg_pe_ranges and
seg_metadata_le_ranges are reported as genuine string lists now.
This allows for using the items separately in -S|--select
(so searching for subsets etc.) and also it allows for
configuring the separator using report/list_item_separator
which may be useful in scripts (however, we'll enable this
only for seg_le_metadata_ranges and not for devices,seg_pe_ranges
and seg_metadata_devices for compatibility reasons - see following
patch).
When reporting on LVs, take the end of the range from the size of the
underlying (hidden) LV rather than the logical size of the current
segment (that PVs use).
Existing cache_settings field displays the settings which are
saved in metadata. Add new kernel_cache_settings fields to display
the settings which are currently used by kernel, including fields
for which default values are used.
This way users have complete view of the set of cache settings
supported (and which they can set) and their values which are used
at the moment by kernel.
For example:
$ lvs -o name,cache_policy,cache_settings,kernel_cache_settings vg
LV Cache Policy Cache Settings KCache Settings
cached1 mq migration_threshold=1024,write_promote_adjustment=2 migration_threshold=1024,random_threshold=4,sequential_threshold=512,discard_promote_adjustment=1,read_promote_adjustment=4,write_promote_adjustment=2
cached2 smq migration_threshold=1024 migration_threshold=1024
cached3 smq migration_threshold=2048
Fix lvm2app to return either 0 or 1 for lvm_vg_is_{clustered,exported},
including internal functions pvseg_is_allocated and vg_is_resizeable
which are not yet exposed in lvm2app but make them consistent with the
rest.
This reverts e28e22b9e1
The problem that that commit was fixing (pytest failure)
no longer appears with the current code, so the commit is
not needed.
That commit is a problem for pvchange, because it prevents
lvmcache from retaining VG metadata even while the global
lock is held. pvchange holds the global lock to ensure
that VG metadata is kept in lvmcache throughout processing.
If the cache is not kept, a PV with zero MDAs will appear
first in its actual VG and then appear again in the orphan VG.
It wrongly appears a second time in the orphan VG only if
the actual VG is dropped from lvmcache.
Thin pool discard mode set in metadata can be different from the one
actually used if any device underneath does not support that mode. Add
kernel_discard report field to make it possible to see this difference.
Internal _alloc_init() is only called from allocate_extents(),
which already does prevent usage of virtual segments.
So mark as internal error early and do not process it any further.
Add new test for lv_is_snapshot().
Also move few other bitchecks into same place as remaining bit tests.
TODO: drop lv_is_merging_origin() and keep using lv_is_merging().
Include brackets for the name if the dev is invisible.
This change applies to all callers of _format_pvsegs fn:
- lvseg_devices (the "lvs -o devices")
- lvseg_metadata_devices (the "lvs -o metadata_devices)
- lvseg_seg_pe_ranges (the "lvs -o seg_pe_ranges")
- lvseg_seg_metadata_le_ranges (the "lvs -o seg_metadata_le_ranges")
The common lv_pool_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_metadata_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_data_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_mirror_log_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_origin_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_convert_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
Use common _lvname_disp to report lv_parent. The _lvname_disp
takes care of properly marking LVs which are not visible - such
LVs are always enclosed in brackets when reported within any
other field.
For example, thin pool over RAID.
Before:
$ lvs -a -o name,lv_parent,data_lv,metadata_lv vg
LV Parent Data Meta
cache_pool [cache_pool_tdata] [cache_pool_tmeta]
[cache_pool_tdata] cache_pool
[cache_pool_tdata_rimage_0] cache_pool_tdata
[cache_pool_tdata_rimage_1] cache_pool_tdata
[cache_pool_tdata_rmeta_0] cache_pool_tdata
[cache_pool_tdata_rmeta_1] cache_pool_tdata
[cache_pool_tmeta] cache_pool
[cache_pool_tmeta_rimage_0] cache_pool_tmeta
[cache_pool_tmeta_rimage_1] cache_pool_tmeta
[cache_pool_tmeta_rmeta_0] cache_pool_tmeta
[cache_pool_tmeta_rmeta_1] cache_pool_tmeta
[lvol0_pmspare]
With this patch applied:
$ lvs -a -o name,lv_parent,data_lv,metadata_lv vg
LV Parent Data Meta
cache_pool [cache_pool_tdata] [cache_pool_tmeta]
[cache_pool_tdata] cache_pool
[cache_pool_tdata_rimage_0] [cache_pool_tdata]
[cache_pool_tdata_rimage_1] [cache_pool_tdata]
[cache_pool_tdata_rmeta_0] [cache_pool_tdata]
[cache_pool_tdata_rmeta_1] [cache_pool_tdata]
[cache_pool_tmeta] cache_pool
[cache_pool_tmeta_rimage_0] [cache_pool_tmeta]
[cache_pool_tmeta_rimage_1] [cache_pool_tmeta]
[cache_pool_tmeta_rmeta_0] [cache_pool_tmeta]
[cache_pool_tmeta_rmeta_1] [cache_pool_tmeta]
[lvol0_pmspare]
Do not mix dm_report_field_set_value and _field_set_value and
use single function call throughout for clarity. The same applies
for dm_report_field_string and _string_disp.
Fix regression caused by commit c2d4330f27
which removed the dm_pool_strdup for the cache policy name in
_cache_policy_disp report function.
This regression was hit with buffered reporting only (which is
used by default). The reason is that for buffered reporting, we're
iterating over LVs in VG (process_each_lv) while gathering
all the information that is needed for the report. In this case,
the LV's cache policy name has not been duped, but only the pointer
to the original VG buffer was stored. When the LV iteration finished,
the VG buffer was freed and any report to output called later
(dm_report_output call) accessed already freed VG data.
This didn't appear if unbuffered reporting was used (--unbuffered)
because in this case, the data were reported to output as
soon as they were processed, hence it was reported to output
before the VG data was freed.
Have commands send lvmlockd the update message
in vg_write instead of vg_commit, so that it's
not done while LVs are suspended. If the vg_write
is not committed, and the seqno sent to lvmlockd
is not used, then lvmlockd can detect this when
the next update uses the same seqno.
Use process_each_vg() to lock and read the old VG,
and then call the main vgrename code.
When real VG names are used (not a UUID in place of the
old name), the command still pre-locks the new name
(when strcmp wants it locked first), before calling
process_each_vg on the old name.
In the case where the old name is replaced with a UUID,
process_each_vg now translates that UUID into the real
VG name, which it locks and reads. In this case, we
cannot do pre-locking to maintain lock ordering because
the old name is unknown. So, in this case the strcmp
based lock ordering is suppressed and the old name is
always locked first. This opens a remote chance for
lock ordering conflict between racing vgrenames between
two names where one or both commands use the UUID.
Before commit c1f246fedf,
_get_all_devices() did a full device scan before
get_vgnameids() was called. The full scan in
_get_all_devices() is from calling dev_iter_create(f, 1).
The '1' arg forces a full scan.
By doing a full scan in _get_all_devices(), new devices
were added to dev-cache before get_vgnameids() began
scanning labels. So, labels would be read from new devices.
(e.g. by the first 'pvs' command after the new device appeared.)
After that commit, _get_all_devices() was called
after get_vgnameids() was finished scanning labels.
So, new devices would be missed while scanning labels.
When _get_all_devices() saw the new devices (after
labels were scanned), those devices were added to
the .cache file. This meant that the second 'pvs'
command would see the devices because they would be
in .cache.
Now, the full device scan is factored out of
_get_all_devices() and called by itself at the
start of the command so that new devices will
be known before get_vgnameids() scans labels.
Since we mark cache-pool as 'hidden/private' while it is in-use,
we may still allow user to change it's name.
It should not cause any harm and user may prefer better naming
for a cache-pool in use.
It's getting a bit more complex here.
Basic idea behind is - check_current_backup() should not
log error when a user is using a read-only filesystem,
so e.g. vgscan will not report any error when it tries
to take missing backup.
We still have cases when error could be reported though,
e.g. the backup this would be a symbolic link, but these
are rather misconfiguration and unexpected case.
We have to modes of 'archive()' usage -
1. compulsory - fail stops command and user may try '-An' option
to do a command.
2. non-compulsory - some fails in archiving are ignorable (i.e.
read-only filesystem where archive dir is located).
Those 2 cases needs to be properly handle - i.e. the non-compulsory
logging should not be tampering error logging message production.
So more work here is needed
Pass full buffer size to printf() function - no reason to make
buffer 1 char smaller.
Also rename locn buffer to message buffer directly since it's
not used for anything else.
TODO: we may use same buffer also for 'buf[]' since there is
no collision - so may safe 1K on stack usage.
When two different VGs with the same name exist,
they are both stored in lvmcache using the vginfo->next
list. Previously, the code would print warnings (sometimes)
when adding VGs to this list. Now the duplicate VG names
are handled by higher level code, so this list no longer
needs to print warnings about duplicate VG names being found.
After recent changes to process_each, vg_read() is usually
given both the vgname and vgid for the intended VG.
However, in some cases vg_read() is given a vgid with
no vgname, or is given a vgname with no vgid.
When given a vgid with no vgname, vg_read() uses lvmcache
to look up the vgname using the vgid. If the vgname is
not found, vg_read() fails.
When given a vgname with no vgid, vg_read() should also
use lvmcache to look up the vgid using the vgname.
If the vgid is not found, vg_read() fails.
If the lvmcache lookup finds multiple vgids for the
vgname, then the lookup fails, causing vg_read() to fail
because the intended VG is uncertain.
Usually, both vgname and vgid for the intended VG are passed
to vg_read(), which means the lvmcache translations
between vgname and vgid are not done.
When not using lvmetad, this uses the system_id field in
the cached vginfo structs that are populated during a scan.
When using lvmetad, this requests the VG from lvmetad, and
checks the system_id field in the returned metadata.
When the command already knows both the vgid and vgname,
it should send both to lvmetad for a more exact request,
and it can save lvmetad the work of a name lookup.
Remove long outstand unused code lines, which were already
been obsoleted by other code.
Statuses and snapshot tree creation is already handled differently.
Also drop some 'extra' log_error() and use only stack;
since error has already been reported.
Since we do not use dev_manager in a way we would have destroyed VG
content while in-use - we could safely keep just pointer.
So dropping strdup.
Also it seems we actually no longer use vg_name for anything
so it may possibly go away completely unless it would be useful
for debugging...
Just for convenience to display all new configuration settings
introduced since given version (before, there was only --atversion
to display settings introduced in concrete version).
For example:
$ lvmconfig --type new --sinceversion 2.2.120
allocation {
# cache_mode="writethrough"
# cache_settings {
# }
}
global {
use_lvmlockd=0
# lvmlockd_lock_retries=3
# sanlock_lv_extend=256
use_lvmpolld=1
}
activation {
}
# report {
# compact_output_cols=""
# time_format="%Y-%m-%d %T %z"
# }
local {
# host_id=0
}
Unifying terminology.
Since all the metadata in-use are ALWAYS on disk - switch
to terminology committed and precommitted.
Patch has no functional change inside.
lv preload for detached LVs started to be used also
for various other types which just happens to pass through
weak if() condition.
TODO: find here better solution to rather explicitly check
for types we really need to preload.
We do not won't to 'expose' internals of VG struct.
ATM we use lists to keep all LVs - we may want to switch
to better struct for quicker 'search'.
Since we do not need 'lists' but always actual LV,
switch find_lv_in_vg_by_lvid() to return LV,
and replaces some use case of find_lv_in_vg()
with 'better' working find_lv() which already
returns LV.
When 'lvextend -L+XX vg/thinpool' do not leave inactive table
loaded for 'wrapping' LV on top of resized thin-pool
(ATM we use linear LV for this with same size as thin-pool).
Udev recently start to 'link-in' major amount of useless libs.
(Seem to be faulty 'systemd' link-in all issue)
Anyway - avoid locking those libs in RAM.
Coverity here is a bit 'blind' here and cannot resolve which
code paths are actually able to hit this code path.
(It's using 'statistic' to resolve all possible paths,
and it's not scanning 'individual' code paths.)
This just cleans warns and add 'cheap' tests.
Use 'mda' instead of NULL to quite Coverity warn.
However this code seems to be actually not even possible to hit.
With proper analysis it may possibly be dropped from code to
simplify logic.
Skip testing target_pvs for NULL, we already
dereference it in many other places.
If check would ever be needed - it needs to be
in front of _raid_extract_images().
When reading older lvm2 metadata for cache-pool - we now handle more
extended syntax - basically we want to enter most setting when
actually creating cached LV.
For this new validation code has been added. However older metadata
without new settings set is now found as invalid.
Fix it by adding default settings for cache policy mq
and cache mode writethrough.
Here Coverity cannot see the pointer cannot be NULL in this
code path - opened coverity case #00531860.
We could make a model to avoid seeing related reports,
but then we loose coverage for modeled function.
So decided to add minor hint for this case.
The udev_device_get_is_initialized is available since libudev version
165. Older versions are still used somewhere (e.g. RHEL6). So better
check for this fn and use it only if it's available.
Udev db records are marked as not initialized (incomplete) on timeout.
Issue an error message whenever LVM finds such records so users are
aware that something's going wrong with udev db.
This is important in case we use devices/external_device_info_source="udev"
where udev database records are used to do various filtering decisions.
For example:
udev log of timed out worker:
Nov 11 13:02:25 raw.virt systemd-udevd[607]: seq 1997 '/devices/virtual/block/dm-2' is taking a long time
Nov 11 13:04:25 raw.virt systemd-udevd[607]: seq 1997 '/devices/virtual/block/dm-2' killed
Nov 11 13:04:25 raw.virt systemd-udevd[607]: worker [11221] terminated by signal 9 (Killed)
Nov 11 13:04:25 raw.virt systemd-udevd[607]: worker [11221] failed while handling '/devices/virtual/block/dm-2'
...
LVM also issues error message visibly if incomplete udev db record is found,
devices/external_device_info_source="udev" is set:
$ pvs
Udev database has incomplete information about device /dev/dm-2.
Failed to get external handle for device /dev/dm-2 [udev].
...
Coverity noticed this condition is always false and the error
path could never be visited.
So check for all mismatches of supported messages
and actually mark log_error as internal error.
Doing 'stat' checking first and later opening is racy.
And since we do not really care about any 'status' info
here and we read 'sysfs' here - just drop whole 'stat()'
call and directly handle error from failing 'fopen()'.
Currently the code creates the log separately after allocating space for
the data and as no data allocation is needed this second time,
total_extents ends up holding zero so use new_extents directly instead.
When reading a foreign VG we cannot write it, since
it belongs to another host. When reading a shared VG
we cannot write it because we may not have an ex lock.
(Or we may be reading the shared VG while not using
lvmlockd in which case it's like reading a foreign VG.)
Add the same checks for wiping outdated PVs. We may
read a foreign or shared VG, or see the PVs, while
another host is part way through writing a new version
of the VG to the PVs. This might cause us to think
some of the PVs are outdated. We do not want to
write another host's PVs, especially when we may
wrongly conclude they are outdated.
When the command gets a list of alternate devices
from lvmetad, log each one directly. This is not
the same as the warnings when adding lvmcache,
which are related to which duplicate is preferred.
The str_list_destroy function may be called to cleanup memory when
the list is not used anymore and the list itself was not allocated
from the memory pool.
When checking minimum mda size, make sure the mda_size after alignment
and calculation is more than 0 - if there's no place for an MDA at the
end of the disk, the _text_pv_add_metadata_area does not try to add it
there and it returns (because we already have the MDA at the start of
the disk at least).
Actually, we don't need extra condition as introduced in commit
00348c0a63. We should fix the last
condition:
(mdac->rlocn.size >= mdah->size)
...which should be:
(MDA_HEADER_SIZE + (rlocn ? rlocn->size : 0) + mdac->rlocn.size >= mdah->size))
Where the "mdac" is new metadata, the "rlocn" is old metadata.
So the main problem with the previous condition was that it
didn't count in MDA_HEADER_SIZE properly (and possible existing
metadata - the "rlocn"). This could have caused the error state
where metadata in ring buffer overlap to not be hit.
Replace the new condition introduced in 00348c0a63
with the improved one for the condition that existed there
already but it was just incomplete.
We're already checking whether old and new meta do not overlap in
ring buffer (as we need to keep both old and new meta during vg_write
up until vg_commit).
We also need to check whether the new metadata do not overlap
themselves in case we don't have old metadata yet (...because
we're in vgcreate). This could happen if we're creating a VG so
that the very first metadata written are long enough that it wraps
themselves in metadata ring buffer.
Although we limited the minimum metadata area size better with the
previous commit ccb8da404d which
makes the initial VG metadata overlap in ring buffer to be less
probable, the risk of hitting this overlap condition is still there
if we still manage to generate big enough metadata somehow.
For example, users can provide many and/or long VG tags during vgcreate
so that the VG metadata is long enough to start to wrap in the ring
buffer again...
This option could never have been printed in lvm2 metadata, so it could
be safely removed as it could have been set only as 0.
These configurable setting is supported via metadata profile.
Now with correctly functioning dmeventd enable usage of
low_water_mark for faster reaction on pool's threshold.
When user select e.g. 80% as a threshold value,
dmeventd doesn't need to wait 10 seconds till monitoring
timer expires, but nearly instantly resizes thin-pool
to fit bellow threshold.
The former patch(dab3ebce4c) is a little bit strict. For example, it is
OK to create PV on unpartitioned DASD devices with LDL formatted. So
after lvm version containing the patch, LVs created on those devices
could not be found.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Recognize the target only 'extends' and do not enforce
'flush' in this case. Only the size reduction
still requires flush (so disables usage of no_flush flag).
If some other targets do require flush before suspend,
they have to explicitly ask for it.
While the activation code tries to evaluate which target
really needs flush with suspend and which may go without flush,
it has stayed effectively disabled by original commit:
33f732c5e9 since here
it only allows to pass non-pvmoving 'mirrors'.
So remove check for mirror LV type and only disable
no_flush for 'pvmove'..
TODO: Looking into history - it also seemed like raid target
would have always required flushing but it's been later
removed without clean explanation.
If some more targets really do need 'no_flush' it should
been handle at their 'level' - since we now stack multiple
targets over itself.
Use single code to evaluate if the percentage value has
crossed threshold.
Recalculate amount value to always fit bellow
threshold so there are not need any extra reiterations
to reach this state in case policy amount is too small.
Since plugin's percentage compare has been fixed,
it's now revealed wrong compare here.
The logic for threshold is - to allow to go as high
as given value e.g. 80% - so if pool is exactlu 80%
full it's still allowed to use it (dmeventd will not
resize it).
Running "vgremove -f VG & pvs" results in the pvs
command reporting that the VG is not found or is
inconsistent. If the VG is gone or being removed,
the pvs command should just skip it and not print
errors about it.
"Not found" is because the pvs command created the
list of VGs to process, including VG, then vgremove
removed the VG, then the pvs command came to to read
the VG to process it and did not find it.
An "inconsistent" error could be reported if vgremove
had only partially completed removing VG when pvs did
vg_read on the VG to process it, causing pvs to find
the VG in a partially-removed state.
This fix adds a flag that pvs uses to ignore a VG
that can't be read or is inconsistent.
When lvmetad is used and lvmcache update function (lvmcache_update_vgname_and_id)
was called to update existing lvmcache records, a condition was met
which made to retun from the update function immediately, effectively
making it NOOP.
It seems there's no reason for such condition and lvmcache should be
update appropriately even when lvmetad used as lvmcache may be reused,
most notably in lvm shell.
It's possible this is a remnant of the lvmetad development code which
didn't get removed for some reason and the bug didn't get spotted
because lvm shell is not used often (the condition dates back to 2012
or so).
Example, lvmetad and lvm shell used:
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 124.00m
Before this patch:
==================
lvm> vgremove vg
Volume group "vg" successfully removed
lvm> pvs
With this patch applied:
========================
lvm> vgremove vg
Volume group "vg" successfully removed
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
The lvmcache info might be resued, most notably in lvm shell.
We need to be sure that even lvmcache_info marked as invalid
is removed from the lvmcache so it does not confuse any subsequent
code/commands executed later on.
Problematic example with the lvm shell:
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
Before this patch (/dev/sda still displayed in a way):
======================================================
lvm> pvremove /dev/sda
Labels on physical volume "/dev/sda" successfully wiped
(without lvmetad)
lvm> pvs
No physical volume label read from /dev/sda
(with lvmetad)
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
With this patch applied:
========================
lvm> pvremove /dev/sda
Labels on physical volume "/dev/sda" successfully wiped
(without lvmetad)
lvm> pvs
(with lvmetad)
lvm> pvs
Make lvm2_disable_dmeventd_monitoring() more explicit.
As memlock_inc_daemon() is also used by clvmd, which
does changes dmeventd and suspend ignore state at
some stages - make updates of these 2 variable
tied to the call of lvm2_disable_dmeventd_monitoring().
Once this call is made dmeventd monitoring
and suspended devices are ignored.
TODO: all lvm-global settings should really be moved
to command context.
The old code made two loops through the PVs: in the first
loop it found the max PV and VG name lengths, and in the
second loop it printed each PV using the name lengths as
field widths for aligning columns.
The new code uses process_each_pv() which makes one loop
through the PVs. In the *first* call to pvscan_single(),
the max name lengths are found by looping through the
lvmcache entries which have been populated by the generic
process_each code prior to calling any _single functions.
Subsequent calls to pvscan_single() reuse the max lengths
that were found by the first call.
The new report/compact_output_cols setting has exactly the same effect
as report/compact_output setting. The difference is that with the new
setting it's possible to define which cols should be compacted exactly
in contrast to all cols in case of report/compact_output.
In case both compact_output and compact_output_cols is enabled/set,
the compact_output prevails.
For example:
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols=""
$ lvs vg
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=1
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize
lvol0 vg -wi-a----- 4.00m
This reverts commit 1b1c01a27b.
This caused messages to get dropped instead of logged into the log file.
(The log file and log function are independent at the moment.)
Some signatures are spread around the disk in several copies, mainly for
backup. Make libblkid to detect these extra copies - there was missing
"blkid_probe_step_back" fn call after successful wipe of previous signature
copy.
An example with FAT table which has copies:
$ mkfs.vfat /dev/sda1
Before this patch:
$ pvcreate /dev/sda1
WARNING: vfat signature detected on /dev/sda1 at offset 54. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
With this patch applied:
$ pvcreate /dev/sda1
WARNING: vfat signature detected on /dev/sda1 at offset 54. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
WARNING: vfat signature detected on /dev/sda1 at offset 0. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
WARNING: vfat signature detected on /dev/sda1 at offset 510. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
If lvmlockd is running, lvmetad is configured (use_lvmetad=1),
but lvmetad is not running, then commands will seg fault
when trying to send a message to lvmetad.
The difference is lvmetad being "active", not just "used".
We already have pv_count to report number of PVs that a VG has based
on metadata.
This patch exposes the information about how many of these PVs are
missing which is also useful information for a VG. Wwe could count
the sum of pv_missing reporting fields for each PV in the VG before,
but the new field is practical when reporting VG as a whole and there's
no need to process each PV from VG alone.
If 'vgcreate --shared' finds both sanlock and dlm are running,
print a more accurate error message:
"Found multiple lock managers, select one with --lock-type."
When neither is running, we still print:
"Failed to detect a running lock manager to select lock type."
Also, leave out the note about "circular buffer" which is
an internal imeplementation detail anyway and not quite
informational for users:
Before this patch:
$ vgcreate vg1 /dev/sda
VG vg1 metadata too large for circular buffer
Failed to write VG vg1.
With this patch applied:
$ vgcreate vg1 /dev/sda
VG vg1 metadata too large: size of metadata to write is 691 bytes while PV metadata area size on /dev/sda is 512 bytes.
Failed to write VG vg1.
Before this patch:
$ lvs -a -o name,layout,role test/lvmlock
LV Layout Role
[lvmlock] linear public
With this patch applied:
$ lvs -a -o name,layout,role test/lvmlock
LV Layout Role
[lvmlock] linear private,lockd,sanlock
Add metadata_devices and seg_metadata_le_ranges report fields.
Currently only defined for raid, but should probably be extended
to all other segment types that don't report all their device
usage in the 'devices' field.
When lvmetad_pvscan_vg() reads VG metadata from each PV,
it compares it to the last one to verify it matches.
If the VG metadata does not match on the PVs, an error
is printed and it fails to read the VG. In this error
case, use log_debug to show the differences between
the two unmatching copies of the metadata.
One host changes a VG, making the cached VG on another
host invalid. The other host then rereads the VG from
disk to get the latest copy. If the first host removed
a PV from the VG, the second host attempts to reread the
VG from old PV when rescanning. Reading the VG from the
removed PV fails, causing vg_read to return "VG not found".
The fix is to simply not fail when a VG is not found while
rereading a PV and continue without it.
(This doesn't happen if the second host happens to first
run a command like 'vgs' that triggers a global revalidation
of metadata.)
vgchange --lock-type iterates through LVs to ensure
no LVs are active before changing the lock type of
the VG, but the loop was not checking that an LV
actually has a lock before trying it, so it would
fail if the VG had any LVs that don't use locks,
e.g it would fail on a tmeta LV from a pool.
When using lvm shell, some structures which are cached in memory may be
reused. This happens for the struct label (a part of lvmcache_info
structure) when lvmetad is used in which case the PV scan is not
done that would normally overwrite these label structures in memory
and making them up-to-date.
This is all consequence of the fact that struct lvmcache_info and
struct label are not always assigned in the same part of the code.
For example, if lvmetad *is not* used, parts of the struct label are
reassigned in label_read fn while struct lvmcache_info is created
elsewhere. No part of the code reused struct label (and its "dev"
field) before calling label_read fn. That's why the real bug is
hidden when using lvm shell without lvmetad.
However, with lvmetad and lvm shell, the situation is a bit different.
The label_read fn is not called if lvmetad *is* used, hence the
struct label may have ended up not initialized properly.
There was missing assignment for the dev field in struct label
in _text_pv_write fn which caused this problem to appear in
lvm shell with lvmetad, for example:
Before this patch:
lvm> pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
lvm> pvs /dev/sda
PV VG Fmt Attr PSize PFree
unknown device lvm2 --- 128.00m 128.00m
With this patch applied:
lvm> pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
lvm> pvs /dev/sda
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
Also, this problem had not appeared before changes introduced
by commits e1a63905d1 through
3a6f91d713 which, among other
things, added proper label field type reporting. Before, label
reporting was the same as using struct physical_volume which
has its own dev field assigned and so this problem was not exposed.
When a command does a sequence of
vg_write + vg_commit + vg_write + vg_commit,
initialization of non-PV devices happens during the
first vg_write, and does not need to be repeated by
the second vg_write.
When creating a lockd VG, this sequence occurs because
the VG is first created, then the lockd data is created,
then the lockd data is then written to the VG metadata.
This applies the same rule/logic to dlm VGs that has always
existed for sanlock VGs. Allowing a dlm VG to be removed
while its lockspace was still running on other hosts largely
worked, but there were difficult problems if another VG with
the same name was recreated. Forcing the VG lockspace to
be stopped, gives both sanlock and dlm VGs the same behavior.
The code was expecting the wrong return value from
compare_config, which returns 0 when equal.
This is a problem for a lockd VG using multiple PVs
when the VG needs to be rescanned.
Certain stacks of cached LVs may have unexpected consequences.
So add a warning function called when LV is cached to detect
such caces and WARN user about them - the best we could do ATM.
When we insert layer we also move status flag-bits for certain LV types,
so internal volume_group structure remains consistent.
(Perhaps it's misuse of 'insert_layer' function and we should have
another similar function for this.)
Basically we aim to maintain the same state as after reading fresh
metadata out of volume group.
Currently we when i.e. cache 'raid' LV - this should transfer 'raidLV' flag
to _corigin LV and cache is no longer a raid.
TODO: bits for stacked devices needs more exact rules.
Add a new arg to lockd_start_vg() that indicates
it is being called for a new lockd VG, so that
lvmlockd knows the lockspace being started is new.
(Will be used by a following commit.)
Ensure make clean cleans any left-over file from their previous
location so they are not in conflict with new ones.
Also hide error message when .commands file is not present.
The regex filter (controlled by devices/filter lvm.conf setting) was
evaluated as the very last filter. However, this is not optimal when
it comes to restricting disk access - users define devices/filter
as well as devices/global_filter to avoid this.
The devices/global_filter is already positioned at the beginning of the
filter chain. We need to do the same for devices/filter.
Filter chains before this patch:
A: when lvmetad is not used:
persistent_filter -> sysfs_filter -> global_regex_filter ->
type_filter -> usable->filter -> mpath_component_filter ->
partition_filter -> md_component_filter -> fw_raid_filter ->
regex_filter
B: when lvmetad is used:
B1: to update lvmetad:
sysfs_filter -> global_regex_filter -> type_filter ->
usable_filter -> mpath_component_filter -> partition_filter ->
md_component_filter -> fw_raid_filter
B2: to retrieve info from lvmetad:
persistent_filter -> usable_filter -> regex_filter
From the chain list above we can see that particularly in case when
lvmetad is not used, the regex filter is the very last one that is
processed. If lvmetad is used, it doesn't matter much as there's
the global_regex_filter which is used instead when updating lvmetad
and when retrieving info from lvmetad, putting regex_filter in front
of usable_filter wouldn't change much since usabled_filter is not
reading disks directly.
This patch puts the regex filter to the front even in case lvmetad
is not used, hence reinstating the state as it was before commit
a7be3b12df (which moved the regex_filter
position in the chain). Still, the arguments for the commit
a7be3b12df still apply and they're
still satisfied since component filters (MD, mpath...) are evaluated
first just before updating lvmetad.
So with this patch, we end up with:
A: when lvmetad is not used:
persistent_filter -> sysfs_filter -> global_regex_filter ->
regex_filter -> type_filter -> usable->filter ->
mpath_component_filter -> partition_filter ->
md_component_filter -> fw_raid_filter
B: when lvmetad is used:
B1: to update lvmetad:
sysfs_filter -> global_regex_filter -> type_filter ->
usable_filter -> mpath_component_filter -> partition_filter ->
md_component_filter -> fw_raid_filter
B2: to retrieve info from lvmetad:
persistent_filter -> regex_filter -> usable_filter
This way, specifying the regex_filter in non-lvmetad case causes
the devices to be filtered based on regex first before processing
any other filters which can access disks (like md_component_filter).
This patch also streamlines the code for better readability.
Relocate generated configure.h and lvm-version.h outside
of compilable .c source tree.
The reason is behind - when compiling in builddir != srcdir
the generated file in lib/misc/configure.h was used for all compiled
source file except ones located in lib/misc dir - those would have used
configure.h file located in this dir - if there have existed one (i.e.
from some other build)
This problem was only visible, when srcdir == buildir was used before
trying to use srcdri != builddir (as configure.h appeared then in
srcdir).
As part of fix that came with cf700151eb,
I forgot to add the check whether the result of stat was successful or
not. This bug caused uninitialized buffer to be used for entries
from .cache file which are no longer valid.
This bug may have caused these uninitialized values to be used further,
for example (see the unreal (2567,590944) representing major:minor
pair):
$ pvs
/dev/abc: stat failed: No such file or directory
Path /dev/abc no longer valid for device(2567,590944)
PV VG Fmt Attr PSize PFree
/dev/mapper/test lvm2 --- 104.00m 104.00m
/dev/vda2 rhel lvm2 a-- 9.51g 0
Since we may easily get blocked when checking for percentage
of thin-pool - do not flush and just show current values.
This avoids holding VG locked when pool is overfilled.
Try to detect thin-pool which my block lvm2 command from furher
processing (i.e. lvextend).
Check if pool is read-only or out-of-space and in this case thins
will skipped from being scanned (so user may miss some PVs located
on thin volumes).
Fix regression introduced with commit:
2fc126b00d
This commit has moved pv_min_size() test in front
of device_is_usable(). However pv_min_size needs to open device,
so it may have actually get blocked.
So restore the original order and first validate
dm device to be usable for open.
It's worth to note that such check is not 'race-free',
but it usually eliminates 99.99% of problems ;).
Print [source:handler] in filters' debug messages only if external
device info source other than "none" is used.
$ lvmconfig --type full devices/external_device_info_source
external_device_info_source="none
Before this patch (from the -vvvv log):
filters/filter-usable.c:47 /dev/mapper/test: Skipping: Too small to hold a PV [none:(nil)]
filters/filter-md.c:33 /dev/sdb: Skipping md component device [none:(nil)]
filters/filter-partitioned.c:25 /dev/vda: Skipping: Partition table signature found [none:(nil)]
With this patch applied:
filters/filter-usable.c:44 /dev/mapper/test: Skipping: Too small to hold a PV
filters/filter-md.c:35 /dev/sdb: Skipping md component device
filters/filter-partitioned.c:27 /dev/vda: Skipping: Partition table signature found
This was only used to return two flags indicating specific
reasons for a lock failure so that a more specific error
message could be printed by the command (lockspace had been
stopped, or lockspace had an error starting.)
Remove the list, given its limited usefulness, the fact it
would easily become inaccurate, and the fact it was causing
misleading error messages. The error conditions it was meant
to help could be reported differently.
Previously, a command would only rescan a lockd VG
when lvmetad returned the "vg_invalid" flag indicating
that the cached copy was invalid (which is done by
lvmlockd.) This is still the only usual reason for
rescanning a lockd VG, but two new special cases are
added where we also do the rescan:
. When the --shared option is used to display lockd VGs
from hosts not using lvmlockd. This is the same case
as using --foreign to display foreign VGs, but --shared
was missing the corresponding bits to rescan the VGs.
. When a lockd VG is allowed to be read for displaying
after failing to acquire the lock from lvmlockd. In
this case, the usual mechanism for validating the
cache is missed, so assume the cache would have been
invalidated. (This had been a previous todo item
that was lost during other cleanup.)
These were long-standing todos that were lost track of.
This makes lvmlockd removal steps for dlm VGs closely match
sanlock VGs. Because dlm lockspaces are not required to be
stopped on all hosts before vgremove, there is an extra bit
for dlm lockspaces, where a flag is set in the VG lock lvb
indicating that the VG was removed. If other hosts happen
to use the VG lock they will see this flag and stop their
lockspace.
Remove the existing lock type using the same functions
used to remove the lockd components during vgremove.
This results in a "clean" VG and lvmlockd state after
the vgchange, i.e. no bits left over from previous
lock type.
Since cache-pool actualy keeps info about caching,
display this info for cache-pool LV as well
(matches info for cache LV when cache-pool is asociated with it).
Fix the version export macros to make it possible to export two
different DM_* versions of a symbol: currently it is only possible for a
DM_* symbol to override a symbol in Base. Attempting to export two
symbols at different DM_* version levels (e.g. DM_1_02_104 and
DM_1_02_106) leads to a linker error due to a duplicate symbol
definition.
This is because the DM_EXPORTED_SYMBOL macro makes each exported symbol
the default (@@VERSION):
__asm__(".symver " #func "_v" #ver ", " #func "@@DM_" #ver )
Fix the macro to use a single '@' for a symbols exported in multiple
versions and rename the macros to DM_EXPORT_*:
DM_EXPORT_SYMBOL(func,ver)
DM_EXPORT_SYMBOL_BASE(func,ver)
For functions that have multiple implementations these macros control
symbol export and versioning.
Function definitions that exist in only one version never need to use
these macros.
Backwards compatible implementations must include a version tag of
the form "_v1_02_104" as a suffix to the function name and use the
macro DM_EXPORT_SYMBOL to export the function and bind it to the
specified version string.
Since versioning is only available when compiling with GCC the entire
compatibility version should be enclosed in '#if defined(__GNUC__)',
for example:
int dm_foo(int bar)
{
return bar;
}
#if defined(__GNUC__)
// Backward compatible dm_foo() version 1.02.104
int dm_foo_v1_02_104(void);
int dm_foo_v1_02_104(void)
{
return 0;
}
DM_EXPORT_SYMBOL(dm_foo,1_02_104)
#endif
A prototype for the compatibility version is required as these
functions must not be declared static.
The DM_EXPORT_SYMBOL_BASE macro is only used to export the base
versions of library symbols prior to the introduction of symbol
versioning: it must never be used for new symbols.
This mainly makes the description text use 80 columns.
There are a few minor adjustments to wording to help
the text layout, and a couple minor improvements to
descriptions.
This reverts commit 70db1d523d.
Since we use 'strncpy' even for case where it exactly matches
the buffer size and \0 is not expected to be added there.
Before printing a commented automatic config value,
print a line describing what it is. Otherwise, the
commented value can look like it's a part of an
example preceding it.
Move code which runtime detects settings for cache_policy
out of config dir to cache seg handling code.
Also mark cache_mode as command profilable setting.
Revert back to already existing behavior which has been slightly
modified by a900d150e4.
At the end however it seem to be equal to change TID right with first
metadata write.
Existing code missed handling for 'unused' thin-pool which would
require to also check empty message list for TID==0.
So with the fix we now again preserve 'active' thin-pool volume
when first thin volume is created - this property was lost and caused
problems in cluster, where the lock was hold, but volume was no longer
active on the node.
Another missing part was the proper support for already increased,
but unfinished TID change.
So going back here with existing logic -
TID is increased with first MDA update.
Code allows start with either same TID or (TID-1).
If there are messages, TID must be lower by 1 for sending,
otherwise messages were already posted.
As cache_policy is evaluated in runtime, we no longer should use
CFG_COMMENTED, but have to switch to CFG_UNDEFINED.
So as long as the value is undefined, it's runtime evaluated.
Once it's set - it's always respected (no runtime fallback).
Also fix version of introduced settings to 2.2.128.
If the Linux timerfd interface to POSIX timers is available at compile
time use it for all report interval timekeeping. This gives more
accurate interval timing when the per-interval processing time is less
than the configured interval and simplifies the timestamp bookkeeping
required to keep accurate time.
For systems without timerfd support fall back to the simple usleep based
timer.
Change logic and naming of some internal API functions.
cache_set_mode() and cache_set_policy() both take segment.
cache mode is now correctly 'masked-in'.
If the passed segment is 'cache' segment - it will automatically
try to find 'defaults' according to profiles if the are NOT
specified on command line or they are NOT already set for cache-pool.
These defaults are never set for cache-pool.
Add code to detect available cache features.
Support policy_mq & policy_smq features which might be disabled.
Introduce global_cache_disabled_features_CFG.
Add new profilable configurables:
allocation/cache_policy
allocation/cache_settings
and mark allocation/cache_pool_chunk_size as profilable as well.
Obsolete allocation/cache_pool_cachemode and
introduce new allocation/cache_mode instead.
Rename DEFAULT_CACHE_POOL_POLICY to DEFAULT_CACHE_POLICY.
lvrename should not be done if the LV is active on another host.
This check was mistakenly removed when the code was changed to
use LV uuids in locks rather than LV names.
When vgremove is used to remove multiple VGs in one command,
e.g. vgremove foo bar, the first VG (foo) that is removed
may have held the sanlock global lock. In this case,
do not continue removing further VGs (bar) without the
global lock.
This adds the infrastructure, code paths, error reporting,
etc. to handle storage errors, or storage loss, under the
sanlock leases in a VG that is being used. The loss of
storage means sanlock cannot renew its leases, which means
that the host needs to stop using the shared VG before its
leases expire.
This still requires manually shutting down a VG that has
lost lease storage, e.g. unmounting file systems,
deactivating LVs in the VG. The next step is to
automatically use a command like blkdeactivate to do that.
/lib/log/log.c:88: warning[invalidScanfArgType_int]: %llu in format string (no. 2) requires 'unsigned long long *' but the argument type is 'long long *'.
daemons/lvmlockd/lvmlockd-core.c:791: error[uninitstring]: Dangerous usage of 'version' (strncpy doesn't always null-terminate it).
Use refresh_filters instead of destroy_filters and init_filters
in refresh_toolcontext fn which deals with cmd->initialized.filters
correctly on refresh.
Just shuffle the items and put them into logical groups so it's
visible at first sight what each group contains - it makes it a bit
easier to make heads and tails of the whole cmd_context monster.
When a command is flagged with NO_METADATA_PROCESSING flag, it means
such command does not process any metadata and hence it doens't require
lvmetad, lvmpolld and it can get away with no locking too. These are
mostly simple commands (like lvmconfig/dumpconfig, version, types,
segtypes and other builtin commands that do not process metadata
in any way).
At first, when lvm command is executed, create toolcontext without
initializing connections (lvmetad,lvmpolld) and without initializing
filters (which depend on connections init). Instead, delay this
initialization until we know we need this. That is, until the
lvm_run_command fn is called in which we know what the actual
command to run is and hence we can avoid any connection, filter
or locking initiliazation for commands that would not make use
of it anyway.
For all the other create_toolcontext calls, we keep the original
behaviour - the filters and connections are initialized together
with the toolcontext.
Make it possible to decide whether we want to initialize connections and
filters together with toolcontext creation.
Add "filters" and "connections" fields to struct
cmd_context_initialized_parts and set these in cmd_context.initialized
instance accordingly.
(For now, all create_toolcontext calls do initialize connections and
filters, we'll change that in subsequent patch appropriately.)
Move original lvmetad and lvmpolld initialization code from
_process_config fn to their own functions _init_lvmetad and
_init_lvmpolld (both covered with single _init_connections fn).
Add struct cmd_context_initialized_parts to wrap up information
about which cmd context pieces are initialized and add variable
of this struct type into struct cmd_context.
Also, move existing "config_initialized" variable that was directly
part of cmd_context into the new cmd_context.initialized wrapper.
We'll be adding more items into the struct cmd_context_initialized_parts
with subsequent patches...
This tries harder to avoid creating duplicate global locks in
sanlock VGs by refusing to create a new sanlock VG with a
global lock if other sanlock VGs exist that may have a gl.
vgsummary information contains provisional VG information
that is obtained without holding the VG lock. This info
can be used to lock the VG, and then read it with vg_read().
After the VG is read properly, the vgsummary info should
be verified.
Add the VG lock_type to the vgsummary. It needs to be
known before the VG can be locked and read.
This is a regression introduced by commit
6c0e44d5a2 which changed
the way dev_cache_get fn works - before this patch, when a
device was not found, it fired a full rescan to correct the
cache. However, the change coming with that commit missed
this full_rescan call, causing the lvmcache to still contain
info about PVs which should be filtered now.
Such situation may have happened by coincidence of using
old persistent cache (/etc/lvm/cache/.cache) which does not
reflect the actual state anymore, a device name/symlink which
now points to a device which should be filtered and a fact we
keep info about usable DM devices in .cache no matter what
the filter setting is.
This bug could be hidden though by changes introduced in
commit f1a000a477 as it
calls full_rescan earlier before this problem is hit.
But we need to fix this anyway for the dev_cache_get
to be correct if we happen to use the same code path
again somewhere sometime.
For example, simple reproducer was (before commit
1a000a477558e157532d5f2cd2f9c9139d4f87c):
- /dev/sda contains a PV header with UUID y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M
- lvm.conf: filter = [ "r|.*|" ]
- rm -f .cache (to start with clean state)
- dmsetup create test --table "0 8388608 linear /dev/sda 0" (8388608 is
just the size of the /dev/sda device I use in the reproducer)
- pvs (this will create .cache file which contains
"/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M"
as well as "/dev/mapper/test" and the target node "/dev/dm-1" - all the
usable DM mappings (and their symlinks) get into the .cache file even
though the filter "is set to "ignore all" - we do this - so far it's OK)
- dmsetup remove test (so we end up with /dev/disk/by-id/lvm-pv-uuid-...
pointing to the /dev/sda now since it's the underlying device
containing the actual PV header)
- now calling "pvs" with such .cache file and we get:
$ pvs
PV VG Fmt Attr PSize PFree
/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M vg lvm2 a-- 4.00g 0
Even though we have set filter = [ "r|.*|" ] in the lvm.conf file!
Moved out from lib/display and a little documentation added.
It's tuned to LVM's requirements historically and its behaviour
might not always be what you would expect.
There is no longer an "enable" option for the global lock,
so remove the bit of code that was checking for it. It
was an optional variation anyway, and not one that was likely
to be used.
Also update the corresponding comment describing global lock
creation.
pvscan autoactivation does not work for lockd VGs because
lock start is needed on a lockd VG before locking can be
done for it. Add a check to skip the attempt at autoactivate
rather than calling it, knowing it will fail.
Add a comment explaining why pvscan --cache works fine for
lockd VGs without locks, and why autoactivate is not done.
The CFG_SECTION_NO_CHECK flag can be used to mark a section
and its whole subtree as containing settings where checks
won't be made (lvmconfig --validate).
These are setting where we don't know the names and and type
in advance and they're recognized in runtime. As we don't know
the type and name in advance, we can't do any checks here
of course.
Use this flag with great care as it disables config checks
for the whole config subtree found under such section.
This flag is going to be used by subsequent patches from
Zdenek to support some cache settings...
Comply with the rules we have for log_error and log_warn...
$ pvcreate /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
1 existing signature left on the device.
Aborting pvcreate on /dev/sda1.
$ pvcreate /dev/sda1 --force
WARNING: Failed to get offset of the xfs_external_log signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
libblkid may return the list of signatures found, but it may not
provide offset and size for each signature detected. This may
happen in case signatures are mixed up or there are more, possibly
overlapping, signatures found.
Make lvm commands pass if such situation happens and we're using
--force (or any stronger force method).
For example:
$ pvcreate /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
1 existing signature left on the device.
Aborting pvcreate on /dev/sda1.
$ pvcreate --force /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
The vgchange/lvchange activation commands read the VG, and
don't write it, so they acquire a shared VG lock from lvmlockd.
When other commands fail to acquire a shared VG lock from
lvmlockd, a warning is printed and they continue without it.
(Without it, the VG metadata they display from lvmetad may
not be up to date.)
vgchange/lvchange -a shouldn't continue without the shared
lock for a couple reasons:
. Usually they will just continue on and fail to acquire the
LV locks for activation, so continuing is pointless.
. More importantly, without the sh VG lock, the VG metadata
used by the command may be stale, and the LV locks shown
in the VG metadata may no longer be current. In the
case of sanlock, this would result in odd, unpredictable
errors when lvmlockd doesn't find the expected lock on
disk. In the case of dlm, the invalid LV lock could be
granted for the non-existing LV.
The solution is to not continue after the shared lock fails,
in the same way that a command fails if an exclusive lock fails.
When lvmlockd is compiled without support for one of the
lock managers (sanlock or dlm), and a command tries to use
one of them, explain that in the error message.
When lvm is built without lvmlockd support, vgcreate using a
shared lock type would succeed and create a local VG (the
--shared option was effectively ignored). Make it fail.
Fix the same issue when using vgchange to change a VG to a
shared lock type.
Make the error messages consistent.
A segfault was reported when extending an LV with a smaller number of
stripes than originally used. Under unusual circumstances, the cling
detection code could successfully find a match against the excess
stripe positions and think it had finished prematurely leading to an
allocation being pursued with a length of zero.
Rename ix_offset to num_positional_areas and move it to struct
alloc_state so that _is_condition() can obtain access to it.
In _is_condition(), areas_size can no longer be assumed to match the
number of positional slots being filled so check this newly-exposed
num_positional_areas directly instead. If the slot is outside the
range we are trying to fill, just ignore the match for now.
(Also note that the code still only performs cling detection against
the first segment of the LV.)
Replace misleading "not found" in the log message when
devices/preferred_names is set to empty array:
Really not found:
device/dev-cache.c:689 devices/preferred_names not found in config: using built-in preferences
Found, but empty:
config/config.c:1431 Setting devices/preferred_names to preferred_names = [ ]
device/dev-cache.c:689 devices/preferred_names is empty: using built-in preferences
Commit 7e728fe1a1 added a log call
directly in find_config_tree_array when defaults are used.
This patch also adds the log for the value which is found in
existing configuration and for which defaults are not used.
For example:
Defaults used:
config/config.c:1428 devices/scan not found in config: defaulting to scan = [ "/dev" ]
Value defined in configuration used:
config/config.c:1431 Setting devices/scan to scan = [ "/dev", "/mydev", "/abc" ]
This makes the logging consistent with the other find_config_tree_* functions.
Policy name has to be always defined.
Capture it as an internal error before write.
When reading metadata without defined policy name, use default defined policy.
TODO: Unsure, but it might have to be actually always 'mq' in this case.
Keep policy name separate from policy settings and avoid
to mangling and demangling this string from same config tree.
Ensure policy_name is always defined.
Use find_config_tree_array for all config arrays. Also, add
INTERNAL_ERROR in case there should have been at least default
value defined for a setting but it was not returned for some
reason (either config_settings.h misconfiguration or other config
tree error printed by functions called by find_config_tree_array).
There are two different failure conditions detected in
access_vg_lock_type() that should have different error
messages. This adds another failure flag so the two
cases can be distinguished to avoid printing a misleading
error message.
Require global/{thin,cache}_{check,repair}_options to be always defined.
If not defined directly by user in the configuration and if there's no
concrete default option to use, make "" (empty string) the default one -
it's then clearly visible in the "lvmconfig --type default" (and
generated lvm.conf) and also it makes its handling in the code more
straightforward so we don't need to handle undefined values.
This means, if there are no default values for these settings defined,
we end up with this generated now:
{thin,cache}_{check,repair}_options = [ "" ]
So the value is never undefined and if it is, it's an error.
(The cache_repair_options is actually not used in the code at the moment,
but once the code using this setting is in, it will follow the same logic
as used for thin_repair_options.)
When --nolocking is used (by vgs, lvs, pvs):
. don't use lvmlockd at all (set use_lvmlockd to 0)
. allow lockd VGs to be read
When --readonly is used (by vgs, lvs, pvs, vgdisplay, lvdisplay,
pvdisplay, lvmdiskscan, lvscan, pvscan, vgcfgbackup):
. skip actual lvmlockd locking calls
. allow lockd VGs to be read
. check that only shared gl/vg locks are being requested
(even though the actually locking is being skipped)
. check that no LV locks are requested, because no LVs
should be activated or used in readonly mode
. disable using lvmetad so VGs are read from disk
It is important to note the limited commands that accept
the --nolocking and --readonly options, i.e. no commands
that change/write a VG or change/activate LVs accept these
options, only commands that read VGs.
There are at least a couple instances where
the lock_args check does not work correctly,
(listed in the comment), so disable the
NULL check for lock_args until those are
resolved.
log_warn was added recently because no known code used
the given condition, but running pvcreate on an existing
PV uses this case, and should not produce a warning.
Put the change from commit #10d27998b3d2f6100e9e29e83d1d99948c55875f
back so we have working tree again for now. This code needs a bit of
a cleanup to return proper return value to check...
lib/format1/import-export.c:167: var_deref_op: Dereferencing null pointer "vg->lvm1_system_id"
lib/cache/lvmetad.c:1023: var_deref_op: Dereferencing null pointer "this"
daemons/lvmlockd/lvmlockd-core.c:2659: check_after_deref: Null-checking "act" suggests that it may be null, but it has already been dereferenced on all paths leading to the check
/daemons/lvmetad/lvmetad-core.c:1024: check_after_deref: Null-checking "pvmeta" suggests that it may be null, but it has already been dereferenced on all paths leading to the check
lib/log/log.c:115: leaked_storage: Variable "st" going out of scope leaks the storage it points to
daemons/lvmpolld/lvmpolld-core.c:573: leaked_storage: Variable "cmdargv" going out of scope leaks the storage it points to
daemons/lvmlockd/lvmlockd-core.c:5341: leaked_handle: Handle variable "fd" going out of scope leaks the handle
daemons/lvmlockd/lvmlockctl.c:575: overwrite_var: Overwriting "able_vg_name" in "able_vg_name = strdup(optarg)" leaks the storage that "able_vg_name" points to
daemons/lvmlockd/lvmlockctl.c:571: overwrite_var: Overwriting "able_vg_name" in "able_vg_name = strdup(optarg)" leaks the storage that "able_vg_name" points to
daemons/lvmlockd/lvmlockctl.c:385: leaked_handle: Handle variable "s" going out of scope leaks the handle
Before, we used general find_config_tree_node function to retrieve
array values. This had a downside where if the node was not found,
we had to insert default values directly in-situ after the
find_config_tree_node call. This way, we had two copies of default
values - one in config_settings.h and the other one directly in the
code where we found out that find_config_tree_node returned NULL and
hence we needed to fall back to defaults.
With separate find_config_tree_array used for array config values,
we keep all the defaults centrally in config_settings.h because
the new find_config_tree_array automatically returns these defaults
if it can't find any value set in the configuration.
This patch just makes the behaviour exactly the same for arrays as
for any other non-array type where we call find_config_tree_<type>
already, hence making the internal interface for handling array
values consistent with the rest of the config types.
including the allow_override_lock_modes setting.
It was not possible to override default lock modes any longer,
since the command line options had already been removed.
A mechanism will probably be required later that puts part of
this back.
Existing messaging intarface for thin-pool has a few 'weak' points:
* Message were posted with each 'resume' operation, thus not allowing
activation of thin-pool with the existing state.
* Acceleration skipped suspend step has not worked in cluster,
since clvmd resumes only nodes which are suspended (have proper lock
state).
* Resume may fail and code is not really designed to 'fail' in this
phase (generic rule here is resume DOES NOT fail unless something serious
is wrong and lvm2 tool usually doesn't handle recovery path in this case.)
* Full thin-pool suspend happened, when taken a thin-volume snapshot.
With this patch the new method relocates message passing into suspend
state.
This has a few drawbacks with current API, but overal it performs
better and gives are more posibilities to deal with errors.
Patch introduces a new logic for 'origin-only' suspend of thin-pool and
this also relates to thin-volume when taking snapshot.
When suspend_origin_only operation is invoked on a pool with
queued messages then only those messages are posted to thin-pool and
actual suspend of thin pool and data and metadata volume is skipped.
This makes taking a snapshot of thin-volume lighter operation and
avoids blocking of other unrelated active thin volumes.
Also fail now happens in 'suspend' state where the 'Fail' is more expected
and it is better handled through error paths.
Activation of thin-pool is now not sending any message and leaves upto a tool
to decided later how to finish unfinished double-commit transaction.
Problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add target table line
into the tree, but only a device is inserted into a tree.
Current mechanism to attach messages for thin-pool requires the libdm
to know about thin-pool target, so lvm2 currently takes assumption, node
is really a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it's a misuse of internal API)
we would possibly need to be able to attach message to 'any' node.
Other thing to notice - current messaging interface in thin-pool
target requires to suspend thin volume origin first and then send
a create message, but this could not have any 'nice' solution on lvm2
side and IMHO we should introduce something like 'create_after_resume'
message.
Patch also changes the moment, where lvm2 transaction id is increased.
Now it happens only after successful finish of kernel transaction id
change. This change was needed to handle properly activation of pool,
which is in the middle of unfinished transaction, and also this corrects
usage of thin-pool by external apps like Docker.
Recognize date and time specification within selection criteria
that is formulated in a more free-form way besides to the original
basic YYYY-MM-DD HH:MM format that libdevmapper supports.
Currently, this free-form format is recognized for lv_time field.
Users are able to use expressions from this set:
- weekday names ("Sunday" - "Saturday" or abbreviated as "Sun" - "Sat")
- labels for points in time ("noon", "midnight")
- labels for a day relative to current day ("today", "yesterday")
- points back in time with relative offset from today (N is a number)
( "N" "seconds"/"minutes"/"hours"/"days"/"weeks"/"years" "ago")
( "N" "secs"/"mins"/"hrs" ... "ago")
( "N" "s"/"m"/"h" ... "ago")
- time specification either in hh:mm:ss format or with AM/PM suffixes
- month names ("January" - "December" or abbreviated as "Jan" - "Dec")
For example:
$ date
Fri Jul 3 10:11:13 CEST 2015
$ lvmconfig --type full report/time_format
time_format="%a %Y-%m-%d %T %z %Z [%s]"
$ lvs
LV VG Time
lvol0 vg Fri 2014-08-22 21:25:41 +0200 CEST [1408735541]
lvol2 vg Sun 2015-04-26 14:52:20 +0200 CEST [1430052740]
root fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
swap fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time=yesterday'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "June 30"'
LV VG Time
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "noon June 30"'
LV VG Time
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 9AM"'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 1PM"'
LV VG Time
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
...and so on.
Wire the dm_report_reserved_handler instance call in reporting/selection
infrastructure to handle reserved value actions (currently only
DM_REPORT_RESERVED_PARSE_FUZZY_NAME and DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
actions).
With fuzzy names we mean the names for which it's hard or even impossible
to enumerate all possible variations of the name - the name needs to
be evaluated. An example of fuzzy name is a name which has a base
(substring) which matches and it can contain arbitrary variations
around this base. We can cover human language better with fuzzy
names as people may use several different names (or sentences) to
denote the same thing.
With dynamic values we mean the values which are not constants
and they need to be evaluated in runtime. An example of dynamic
value is a value which depends on current system state (e.g. time,
current configuration or any other state which may change and it
needs runtime evaluation).
There's a handler that can be registered with reporting/selection
using dm_report_reserved_handler instance. This is a central point
in which the computation/evaluation happens when processing reserved
values. Currently, there are two actions declared:
DM_REPORT_RESERVED_PARSE_FUZZY_NAME
(translates fuzzy name into canonical name)
DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
(gets value for canonical name)
The handler is then registered as value in struct
dm_report_reserved_value (see explaining comments besided
the struct dm_report_reserved_value in libdevmapper.h).
Also, this patch provides support for simple caching of values
used during report/selection via dm_report_value_cache_{set,get}.
This is supposed to be used mainly in the dm_report_reserved_handler
instances to save values among calls so all the handler calls work
with the same base value used in computation/evaluation and/or
possibly to save resources if the evaluation is more time-consuming.
The cache is attached to the dm_report handle and so the cache is
dropped one dm_report is dropped.
dm_snprintf() returns upon success the number of characters printed
(excluding the null byte used to end output to strings).
So add extra byte to preserve \0.
This fixes regression when displaying more then a single lv name.
This patch adds support for time values used in reporting fields.
The raw values are always stored as number of seconds since epoch.
The support that comes with this patch is the basic one which allows
only for recognition of strictly formatted date and time in selection
criteria (the format follows a subset of formats defined by ISO 8601):
date time timezone
date:
YYYY-MM-DD (or shortly YYYYMMDD)
YYYY-MM (shortly YYYYMM), auto DD=1
YYYY, auto MM=01 and DD=01
time:
hh:mm:ss (or shortly hhmmss)
hh:mm (or shortly hhmm), auto ss=0
hh (or shortly hh), auto mm=0, auto ss=0
timezone (always with + or - sign):
+hh:mm or -hh:mm (or shortly +hhmm or -hhmm)
+hh or -hh
Or directly the time (number of seconds) since Epoch (1970-01-01 00:00:00 UTC)
when the number value is prefixed by "@":
@number_of_seconds_since_epoch
This patch also adds aliases for comparison operators
used together with time values which are more intuitive
to use:
since (as alias for >=)
after (as alias for >)
until (as alias for <=)
before (as alias for <)
For example:
$ lvmconfig --type full report/time_format
time_format="%Y-%m-%d %T %z %Z [%s]"
$ lvs -o name,time vg
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol2 2015-04-26 14:52:20 +0200 CEST [1430052740]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30 6:00"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
$ lvs vg -o name,time -S 'time since @1435519541'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
This is basic time recognition support that is directly a part of
libdevmapper. Recognition of more free-form expressions will be a
part of subsequent patches.
Just like we have DEFAULT_USE_LVMETAD (or DEFUALT_USE_LVMPOLLD), use
fallback_to_lvm1=1 lvm.conf setting if we configured lvm2 with
--enable-lvm1-fallback and use fallback_to_lvm1=0 otherwise.
Also, generate proper lvm.conf.in with unconfigured value.
This patch allows for registration and recognition of reserved
values which are ranges, so they're composed of two values actually
to denote the lower and upper bound for the range (stored as an array
with exactly two items to define the boundaries).
Also, this patch allows for flagging reserved values as named-only
which means that such values are not strictly reserved. The strictly
reserved values are reserved values as used before this patch.
Distinction between strictly-reserved and named-only values
is clearly visible with comparisons. Normally, strictly reserved
value is not accounted for if we do "greater than" or "lower than"
comparisons, for example:
1 2 3 ....
|
abc
- we have "abc" as reserved value for field with value "2"
- the value reported for the field is "abc" (or "2", it doesn't matter here)
- the selection we're processing is -S 'field < abc'
- the result of the selection gives nothing as "abc" is strictly
reserved value (bound to "2") and there's no order defined for
it and it would only match if we directly compared the value
(so -S 'field = abc' would match)
With named-only values, the "abc" is named-only value for "2",
so selection -S 'field < abc" is the same as using -S 'field < 2'.
The "abc" is just an alias for some value so the value or its
assigned name can be used equally in selection criteria.
Make it possible to define format for time that is displayed.
The way the format is defined is equal to the way that is used
for strftime function, although not all formatting options as
used in strftime are available for LVM2 - the set is restricted
(e.g. we do not allow newline to be printed). The lvm.conf
comments contain the whole list that LVM2 accepts for time format
together with brief description (copied from strftime man page).
For example:
(defaults used - the format is the same as used before this patch)
$ lvs -o+time vg/lvol0 vg/lvol1
LV VG Attr LSize Time
lvol0 vg -wi-a----- 4.00m 2015-06-25 16:18:34 +0200
lvol1 vg -wi-a----- 4.00m 2015-06-29 09:17:11 +0200
(using 'time_format = "@%s"' in lvm.conf - number of seconds
since the Epoch)
$ lvs -o+time vg/lvol0 vg/lvol1
LV VG Attr LSize Time
lvol0 vg -wi-a----- 4.00m @1435241914
lvol1 vg -wi-a----- 4.00m @1435562231
Do not display settings with undefined default values, but do display
these settings in case the value is defined directly in any part of
the existing config cascade.
For example, the lvmconfig --type current always displays these settings
(as it's somewhere in "current" configuration cascade that makes it defined).
The lvmconfig --type full displays these settings only if it's defined
somewhere in the cascade, but not if default value is used instead
The lvmconfig --type default never displays these settings...
More concrete example - let's have activation/volume_list directly
set in lvm.conf and activation/read_only_volume_list not set.
Both of these settings have *undefined default* values.
$lvmconfig --type full activation/volume_list activation/read_only_volume_list
volume_list="/dev/vg/lv"
(...only volume_list is defined, hence it's printed)
However, the comments will display more info (see also previous commit):
$lvmconfig --type full activation/volume_list activation/read_only_volume_list --withsummary
# Configuration option activation/volume_list.
# Only LVs selected by this list are activated.
# This configuration option does not have a default value defined.
# Value defined in existing configuration has been used for this setting.
volume_list="/dev/vg/lv"
# Configuration option activation/read_only_volume_list.
# LVs in this list are activated in read-only mode.
# This configuration option does not have a default value defined.
Display comment abour value from existing config being used. For example:
$ lvmconfig --type full --withsummary report/compact_output report/buffered
# Configuration option report/compact_output.
# Do not print empty report fields.
# Value defined in existing configuration has been used for this setting.
compact_output=1
# Configuration option report/buffered.
# Buffer report output.
buffered=1
The lvmconfig --type full is actually a combination of --type current
and --type missing together with --mergedconfig options used.
The overall outcome is a configuration tree with settings as LVM sees
it when it looks for the values - that means, if the setting is defined
in some config source (lvm.conf, --config, lvmlocal.conf or any profile
that is used), the setting is used. Otherwise, if the setting is not
defined in any part of the config cascade, the defaults are used.
The --type full displays exactly this final tree with all the values
defined, either coming from configuration tree or from defaults.
Synchronize with udev logic before reusing device as snapshot.
This patch tries to fix the problem with udev, where we manage
to 'active' LV for clearing, then we deactivate such device and
active again as member of 'origin&snapshot' tree all in 1 step.
There needs to be a sync point where udev has time to remove all links,
otherwise we race with scans and we may end-up with mysterious 'free'
links in the system pointing to wrong dm names.
This patch tries to fix failing topology cluster tests..
We shouldn't be adding spaces by default in output as that
may be be used already in scripts and especially for the eval
in shell scripts where spaces are not allowed between key
and value!
Add --withspaces option to lvmconfig for pretty output with
more space in for readability.
It's not an error to define empty values for
{thin,cache}_{check,repair}_options - such empty value means no
options are passed when these external commands are called.
If blkid wiping is possible, than set use_blkid_wiping=1 and
use_blkid_wiping=0 otherwise for its default value. If blkid wiping
is disabled during configure and use_blkid_wiping=1 is set by chance,
it's simply ignored - this patch is just a cleanup that makes it more
obvious for the user (we use similar logic for use_lvmetad and
use_lvmpolld settings).
Default value for lvmetad and lvmpolld has hooks in configure script,
the "lvmconfig --type default --unconfigured" should display:
use_lvmetad = @DEFAULT_USE_LVMETAD@
use_lvmpolld = @DEFAULT_USE_LVMPOLLD@
Note that these settings are not of string type. Recent change (the
DM_CONFIG_VALUE_FMT_STRING_NO_QUOTES formatting flag) makes it
possible to recognize that the setting is not of string type and if
there's unconfigured value defined for it, the enclosing " " is
automatically removed on output.
Do not use "#S" (blank string) as default value as that ends up with
'key = [ "" ]' to be generated which is not what we want in most cases.
Also, fix default values for global/{thin,cache}_{check,repair}_options
and avoid assigning blank values. For example, the thin_check_options
had this set as default value previously:
"#S" DEFAULT_THIN_CHECK_OPTION1 "#S" DEFAULT_THIN_CHECK_OPTION2
If any (or both) of DEFAULT_THIN_CHECK_OPTION* variables was set
to "", we ended up with clumsy default value generated like:
thin_check_options = [ "-q", "" ]
With this patch, we end up with correct:
thin_check_options = [ "-q" ]
or, if all options are undefined:
thin_check_options = [ ]
Which is the correct way to express this.
There are two basic groups of formatting flags (32 bits):
- common ones applicable for all config value types (lower 16 bits)
- type-related formatting flags (higher 16 bits)
With this patch, we initially support four new flags that
modify the the way the config value is displayed:
Common flags:
=============
DM_CONFIG_VALUE_FMT_COMMON_ARRAY - causes array config values
to be enclosed in "[ ]" even if there's only one item
(previously, there was no way to recognize an array with one
item and scalar value, hence array values with one member
were always displayed without "[ ]" which libdm accepted
when reading, but it may have been misleading for users)
DM_CONFIG_VALUE_FMT_COMMON_EXTRA_SPACE - causes extra spaces to
be inserted in "key = value" (or key = [ value, value, ... ] in
case of arrays), compared to "key=value" seen on output before.
This makes the output more readable for users.
Type-related flags:
===================
DM_CONFIG_VALUE_FMT_INT_OCTAL - prints integers in octal form with
"0" as a prefix (libdm's config reading code can handle this via
strtol just fine so it's properly recognized as number in octal
form already if there's "0" used as prefix)
DM_CONFIG_VALUE_FMT_STRING_NO_QUOTES - makes it possible to print
strings without enclosing " "
This patch also adds dm_config_value_set_format_flags and
dm_config_value_get_format_flags functions to set and get
these formatting flags.
This is the client side handling of the global_invalid state
added to lvmetad in commit c595b50cec8a6b95c6ac4988912d1412f3cc0237.
The function added here:
. checks if the global state in lvmetad is invalid
. if so, scans disks to update the state in lvmetad
. clears the global_invalid flag in lvmetad
. updates the local udev db to reflect any changes
and update the lvmetad copy after it is reread from disk.
This is the client side handling of the vg_invalid state
added to lvmetad in commit c595b50cec8a6b95c6ac4988912d1412f3cc0237.
Use of display_lvname() in plain log_debug() may accumulate memory in
command context mempool. Use instead small ringbuffer which allows to
store cuple (10 ATM) names so upto 10 full names can be used at one.
We are not keeping full VG/LV names as it may eventually consume larger
amount of RAM resouces if vgname is longer and lots of LVs are in use.
Note: if there would be ever needed for displaing more names at once,
the limit should be raised (e.g. log_debug() would need to print more
then 10 LVs on a single line).
With thin-pool kernel target module 1.13 it's now support usage of
external origin with sizes which are not 'alligned' with chunk size
of thin-pool.
Enable lvm2 support for this and also fix reporting of data_percent
usage for case sizes are not alligned.
Note that this is just a quick fix and it needs more robust fix to
encompass any combination, not just the (old) snapshot one!
This started with this report:
https://bugzilla.redhat.com/show_bug.cgi?id=1219222
If we have devices/ignore_suspended_devices=1 set based on which we
filter out suspended devices as unusable (or if we ignore suspended
devices by force, e.g. during lvconvert called from dmeventd) and
when we have snapshot and snapshot origin devices in the play, we
need to look at their components unerneath (*-real and *-cow) to
check if they're not suspended. If they are, the snapshot/snapshot
origin is not usable as well and hence it needs to be filtered out
by filter-usable.c code which does suspended device filtering.
Not going into much details here, more details are in the bugzilla
mentioned above. However, this is a quick fix since snapshot
and this exact situation is not the only one. So this is
something that needs to be revisited and fixed properly
with full dm tree and checking the whole stack to state
whether the device at the very top is usable or not.
This patch fixes segfault which was caused by incorrectly marking some
settings CFG_DEFAULT_COMMENTED instead of CFG_DEFAULT_UNDEFINED - the
ones which have NULL default value, hence they're really undefined.
A regression caused by a98ceceb1d.
For example:
$ lvmconfig log/file
file="/a"
Before this patch:
$ lvmconfig --type diff
Segmentation fault (core dumped)
With this patch applied:
$ lvmconfig --type diff
log {
file="/a"
}
The same applies for these settings:
log/activate_file
global/library_dir
global/system_id_file
<disk_area>/disk_area_id
There were also other settings with NULL default value and marked as
CFG_DEFAULT_COMMENTED instead of CFG_DEFAULT_UNDEFINED, but they were
cfg_array config settings where the NULL value was not causing segfault
(NULL == empty array).
Just as 'e' means activation with an exclusive lock,
add an 's' to mean activation with a shared lock.
This allows the existing but implicit behavior of '-ay'
of clvm LVs to be specified explicitly. For local VGs,
asy simply means ay, just like aey means ay.
For local VGs, ay == aey == asy
For clvm VGs, ay == asy, aey == aey, asy == asy
lvmetad_init() should not be called with open connection to the daemon.
Doing so is considered to be an internall error within lvm2 code.
Such coincidence can't occur within current code. Let's assure us it won't
ever happen.
Some of descritpions were misleading at least. Some were completely
off the reality.
lvmetad_init doesn't re-establish or initialise a connection
lvmetad_active and lvmetad_connect_or_warn can do so.
These wrappers have been replaced by direct calls
to vg_read() and find_lv() in previous commits.
This commit should have no functional impact since
all bits were already unreachable.
Previous patch incorrectly skipped replace of @LOCALEDIR@.
The standard option is --localedir so use --with-localedir
as backward compatible option and set localedir if it's not
yet been set (if the could ever happen).
Use double-eval to translate $datarootdir to $prefix to real dir.
Use CFG_DEFAULT_COMMENTED and CFG_DEFAULT_UNDEFINED to
replicate the existing comments in example.conf.
Fix host_list to be cfg_array.
UNDEFINED is only used if the value depends on other
system/kernel values outside of lvm. The most common
case is when dm-thin or dm-cache have built-in default
settings in the kernel, and lvm will use those built-in
default values unless the corresponding lvm config setting
is set.
COMMENTED is used to comment out the default setting in
lvm.conf. The effect is that if the LVM version is
upgraded, and the new version of LVM has new built-in
default values, the new defaults are used by LVM unless
the previous default value was set (uncommented) in lvm.conf.
Introduce new implmentation of dm_task_get_info() function
with support for reading internal_suspend.
.
This time it is done in a 'versioned' way.
We keep the old fashion dm_task_get_info(Base) to implement
the old behavior of 1.02.95 libdm code.
libdm version 1.02.96 introduced 'macro' wrapper
dm_task_get_info_with_deferred_remove with new implementation
of dm_task_get_info() - we cannot do anything else then to
provide compatible version of this symbol.
Now in version 1.02.97 we add new versioned implementation of
dm_task_get_info(DM_1_02_97) symbol.
This has the effect that i.e. rpm build will finaly resolve proper
dependency on a new symbol - so it will be no longer possible,
to build a new binary and use old library
(rpm -q --provides will show libdevmapper.so.1.02(DM_1_02_97)(64bit))
Also the history is now tracked. If a new function is added (or
reimplemented), it needs to be placed in proper file,
so it could be exported with right versioning symbol.
File .exported_symbols.Base should and any existing older DM
should be treated as read-only after a release.
Also - only libdm has been currently enhanced with versioned .Base
file, as soon as other libs (liblvm, libdevmapper-event) needs changes
they should also get their exported symbol files - meanwhile
make.tmpl handles both cases.
Configure provides proper settings for
use_lvmetad and use_lvmpolld conf setttings.
When the build of polld & lvmetad, these settings
are enabled by default unless explicitelly disabled
with --disable-use-lvmetad/--disable-use-lvmpolld.
This is an alternative/equivalent to commit
ca67cf84df
The problem (wrong label->dev after a new preferred
duplicate device is chosen) was isolated to the lvmetad
case (non-lvmetad worked fine), and this fixes the problem
by setting the new label->dev in the lvmetad-specific
code rather than in the general lvmcache code.
In process_each_{vg,lv,pv} when no vgname args are given,
the first step is to get a list of all vgid/vgname on the
system. This is exactly what lvmetad returns from a
vg_list request. The current code is doing a vg_lookup
on each VG after the vg_list and populating lvmcache with
the info for each VG. These preliminary vg_lookup's are
unnecessary, because they will be done again when the
processing functions call vg_read. This patch eliminates
the initial round of vg_lookup's, which can roughly cut in
half the number of lvmetad requests and save a lot of extra work.
Use 64bit arithmentic for PV size calculation (Coverity).
Also remove sector shift for compared PV size, since all
values are already held in sectors.
This fixes validatio of PV size when restoring PV
from vg metadata backup file.
Synopsis: STR_LIST needs to be treated as STR for properties.
For any lvm property that was internally 'typed' as a string list we were failing
to return a string in the property API. This was due to the fact that for the
properties to work the value needs to either be evaulated as a string or a
number. This change corrects the macro used to build the memory array of
structures so that the string bitfield is set as needed to ensure that the value
is a string.
https://bugzilla.redhat.com/show_bug.cgi?id=1139920
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Currently lvm2app properties have the following structure:
typedef struct lvm_property_value {
uint32_t is_settable:1;
uint32_t is_string:1;
uint32_t is_integer:1;
uint32_t is_valid:1;
uint32_t padding:28;
union {
const char *string;
uint64_t integer;
} value;
} lvm_property_value_t;
which assumes that numerical values were in the range of 0 to 2**64-1. However,
some of the properties were 'signed', like LV major/minor numbers and some
reserved values for properties that represent percentages. Thus when the
values were retrieved they were in two's complement notation. So for a -1
major number the API user would get a value of 18446744073709551615. The
API user could cast the returned value to an int64_t to handle this, but that
requires the API developer to look at the source code and determine when it
should be done.
This change modifies the return property structure to:
typedef struct lvm_property_value {
uint32_t is_settable:1;
uint32_t is_string:1;
uint32_t is_integer:1;
uint32_t is_valid:1;
uint32_t is_signed:1;
uint32_t padding:27;
union {
const char *string;
uint64_t integer;
int64_t signed_integer;
} value;
} lvm_property_value_t;
With this addition the API user can interrogate that the value is numerical,
(is_integer = 1) and subsequently check if it's signed (is_signed = 1) too.
If signed, then the API developer should use the union's signed_integer to
avoid casting.
This change maintains backwards compatibility as the structure size remains
unchanged and integer value remains unchanged. Only the additional bit
taken from the pad is utilized.
Bugzilla reference:
https://bugzilla.redhat.com/show_bug.cgi?id=838257
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When kernel target reports sync status as 0% it might as well mean
it's 100% in sync, just the target is in some race inconsistent
state - so reread once again and take a more optimistic value ;)
Patch tries to work around:
https://bugzilla.redhat.com/show_bug.cgi?id=1210637
This patch adds supporting code for handling deprecated settings.
Deprecated settings are not displayed by default in lvmconfig output
(except for --type current and --type diff). There's a new
"--showdeprecated" lvmconfig option to display them if needed.
Also, when using lvmconfig --withcomments, the comments with info
about deprecation are displayed for deprecated settings and with
lvmconfig --withversions, the version in which the setting was
deprecated is displayed in addition to the version of introduction.
If using --atversion with a version that is lower than the one
in which the setting was deprecated, the setting is then considered
as not deprecated (simply because at that version it was not
deprecated).
For example:
$ lvmconfig --type default activation
activation {
...
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated
activation {
...
mirror_region_size=512
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withversions
activation {
...
# Available since version 1.0.0.
# Deprecated since version 2.2.99.
mirror_region_size=512
# Available since version 2.2.99.
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withcomments
activation {
...
# Configuration option activation/mirror_region_size.
# This has been replaced by the activation/raid_region_size
# setting.
# Size (in KB) of each copy operation when mirroring.
# This configuration option is deprecated.
mirror_region_size=512
# Configuration option activation/raid_region_size.
# Size in KiB of each raid or mirror synchronization region.
# For raid or mirror segment types, this is the amount of
# data that is copied at once when initializing, or moved
# at once by pvmove.
raid_region_size=512
...
}
$ lvmconfig --type default activation --withcomments --atversion 2.2.98
activation {
...
# Configuration option activation/mirror_region_size.
# Size (in KB) of each copy operation when mirroring.
mirror_region_size=512
...
}
A preparatory code for marking configuration nodes as deprecated:
- struct cfg_def_item gains 2 new fields ("deprecated_since_version" and "deprecation_comment"
- cfg* macros to handle new fields
- related config_settings.h edits to add new fields for each item (null for all at the moment)
Patch with implementation will follow...
Before this patch:
$ lvmconfig --type list --withversions --withsummary global/use_lvmetad
global/use_lvmetad - Use lvmetad to cache metadata and reduce disk scanning. [2.2.93]
$ lvmconfig --type list --withversions global/use_lvmetad
global/use_lvmetad
With this patch applied:
$ lvmconfig --type list --withversions --withsummary global/use_lvmetad
global/use_lvmetad - Use lvmetad to cache metadata and reduce disk scanning. [2.2.93]
$ lvmconfig --type list --withversions global/use_lvmetad
global/use_lvmetad - [2.2.93]
We're commenting out settings with undefined default values.
The comment character '#' was printed at the very beginning of
the line, it should be placed just at the beginning of the setting,
after the space/tab prefix is printed.
Before this patch:
$ lvmconfig --type default activation
activation {
...
# volume_list=[]
...
}
With this patch applied:
$ lvmconfig --type default activation
activation {
...
# volume_list=[]
...
}
These settings are in the "unsupported" group:
devices/loopfiles
log/activate_file
metadata/disk_areas (section)
metadata/disk_areas/<disk_area> (section)
metadata/disk_areas/<disk_area>/size
metadata/disk_areas/<disk_area>/id
These settings are in the "advanced" group:
devices/dir
devices/scan
devices/types
global/proc
activation/missing_stripe_filler
activation/mlock_filter
metadata/pvmetadatacopies
metadata/pvmetadataignore
metadata/stripesize
metadata/dirs
Also, this patch causes the --ignoreunsupported and --ignoreadvanced
switches to be honoured for all config types (lvmconfig --type).
By default, the --type current and --type diff display unsupported
settings, the other types ignore them - this patch also introduces
--showunsupported switch for all these other types to display even
unsupported settings in their output if needed.
lvmconfig --type list displays plain list of configuration settings.
Some of the existing decorations can be used (--withsummary and
--withversions) as well as existing options/switches (--ignoreadvanced,
--ignoreunsupported, --ignorelocal, --atversion).
For example (displaying only "config" section so the list is not long):
$lvmconfig --type list config
config/checks
config/abort_on_errors
config/profile_dir
$ lvmconfig --type list --withsummary config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig -l config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig --type list --withsummary --withversions config
config/checks - If enabled, any LVM configuration mismatch is reported. [2.2.99]
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found. [2.2.99]
config/profile_dir - Directory where LVM looks for configuration profiles. [2.2.99]
Example with --atversion (displaying global section):
$ lvmconfig --type list global
global/umask
global/test
global/units
global/si_unit_consistency
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/etc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/prioritise_write_locks
global/library_dir
global/locking_library
global/abort_on_internal_errors
global/detect_internal_vg_cache_corruption
global/metadata_read_only
global/mirror_segtype_default
global/raid10_segtype_default
global/sparse_segtype_default
global/lvdisplay_shows_full_device_path
global/use_lvmetad
global/thin_check_executable
global/thin_dump_executable
global/thin_repair_executable
global/thin_check_options
global/thin_repair_options
global/thin_disabled_features
global/cache_check_executable
global/cache_dump_executable
global/cache_repair_executable
global/cache_check_options
global/cache_repair_options
global/system_id_source
global/system_id_file
$ lvmconfig --type list global --atversion 2.2.50
global/umask
global/test
global/units
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/library_dir
global/locking_library
Example:
/dev/loop0 and /dev/loop1 are duplicates,
created by copying one backing file to the
other.
'identity /dev/loopX' creates an identity
mapping for loopX named idmloopX, which
adds a duplicate for the named device.
The duplicate selection code for lvmetad is
incomplete, and lvmetad is disabled for this
example.
[~]# losetup -f loopfile0
[~]# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 foo lvm2 a-- 308.00m 296.00m
[~]# losetup -f loopfile1
[~]# pvs
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 which is more recent, replacing /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop1 foo lvm2 a-- 308.00m 308.00m
[~]# ./identity /dev/loop0
[~]# pvs
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 without holders, replacing /dev/loop0
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop0 not /dev/loop1
Using duplicate PV /dev/mapper/idmloop0 from subsystem DM, replacing /dev/loop1
PV VG Fmt Attr PSize PFree
/dev/mapper/idmloop0 foo lvm2 a-- 308.00m 296.00m
[~]# ./identity /dev/loop1
[~]# pvs
WARNING: duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV is being used from both devices /dev/loop0 and /dev/loop1
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 which is more recent, replacing /dev/loop0
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop0 not /dev/loop1
Using duplicate PV /dev/mapper/idmloop0 from subsystem DM, replacing /dev/loop1
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop1 not /dev/mapper/idmloop0
Using duplicate PV /dev/mapper/idmloop1 which is more recent, replacing /dev/mapper/idmloop0
PV VG Fmt Attr PSize PFree
/dev/mapper/idmloop1 foo lvm2 a-- 308.00m 308.00m
Describe
thin_check_options, thin_repair_options,
cache_check_option, scache_repair_options
as a "list of options", rather than a "string of options"
because a single string, e.g. "-q --clear-needs-check-flag"
does not work, and needs to be entered as a list,
e.g. ["-q", "--clear-needs-check-flag"]
The settings which have their default value evaluated in runtime should
have their 'unconfigured' counterparts also evaluated in runtime since
those values can be constructed by using other settings.
For example, before this patch:
$ lvm dumpconfig --type default --unconfigured devices/cache_dir devices/cache
cache_dir="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@"
cache="/etc/lvm/cache/.cache
With this patch applied:
$ lvm dumpconfig --type default --unconfigured devices/cache_dir devices/cache
cache_dir="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@"
cache="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@/.cache"
The @something@ used for unconfigured default value is not bound to
CFG_TYPE_STRING settings defined in config_settings.h, it can be
used for any other config type too.
Show full chain of ancestors and descendants for snapshots
(both thick and thin - in case of thick, the "ancestor" field
is actually equal to "origin" field as snapshots can't be
chained for thick snapshots).
These fields display current state as it is, they do not
display any history! If the snapshot chain is broken in
the middle, we don't report the historical origin (this
is going to be a part of another patch and a different
set of fields or just a switch for existing fields to
show ancestors and descendants with history included).
For example:
(origin --> snapshot)
lvol1 --> lvol2 --> lvol3 --> lvol4
\
--> lvol5 --> lvol6 --> lvol7 --> lvol8
$ lvs -o name,pool_lv,origin,ancestors,descendants vg
LV Pool Origin Ancestors Descendants
lvol1 pool lvol2,lvol3,lvol4,lvol5,lvol6,lvol7,lvol8
lvol2 pool lvol1 lvol1 lvol3,lvol4,lvol5,lvol6,lvol7,lvol8
lvol3 pool lvol2 lvol2,lvol1 lvol4
lvol4 pool lvol3 lvol3,lvol2,lvol1
lvol5 pool lvol2 lvol2,lvol1 lvol6,lvol7,lvol8
lvol6 pool lvol5 lvol5,lvol2,lvol1 lvol7,lvol8
lvol7 pool lvol6 lvol6,lvol5,lvol2,lvol1 lvol8
lvol8 pool lvol7 lvol7,lvol6,lvol5,lvol2,lvol1
Scenario:
$ vgs -o+vg_mda_copies
VG #PV #LV #SN Attr VSize VFree #VMdaCps
fedora 1 2 0 wz--n- 9.51g 0 unmanaged
vg 16 9 0 wz--n- 1.94g 1.83g 2
$ lvs -o+read_ahead vg/lvol6 vg/lvol7
LV VG Attr LSize Pool Origin Data% Rahead
lvol6 vg Vwi-a-tz-- 1.00g pool lvol5 0.00 auto
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
Before this patch:
$vgs -o vg_name,vg_mda_copies -S 'vg_mda_copies < unmanaged'
VG #VMdaCps
vg 2
Problem:
Reserved values can be only used with exact match = or !=, not <,<=,>,>=.
In the example above, the "unamanaged" is internally represented as
18446744073709551615, but this should be ignored while not comparing
field directly with "unmanaged" reserved name with = or !=. Users
should not be aware of this internal mapping of the reserved value
name to its internal value and hence it doesn't make sense for such
reserved value to take place in results of <,<=,> and >=.
There's no order defined for reserved values!!! It's a special
*reserved* value that is taken out of the usual value range
of that type.
This is very similar to what we have already fixed with
2f7f6932dc, but it's the other way round
now - we're using reserved value name in selection criteria now
(in the patch 2f7f693, we had concrete value and we compared it
with the reserved value). So this patch completes patch 2f7f693.
This patch also fixes this problem:
$ lvs -o+read_ahead vg/lvol6 vg/lvol7 -S 'read_ahead > 32k'
LV VG Attr LSize Pool Origin Data% Rahead
lvol6 vg Vwi-a-tz-- 1.00g pool lvol5 0.00 auto
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
Problem:
In the example above, the internal reserved value "auto" is in the
range of selection "> 32k" - it shouldn't match as well. Here the
"auto" is internally represented as MAX_DBL and of course, numerically,
MAX_DBL > 256k. But for users, the reserved value should be uncomparable
to any number so the mapping of the reserved value name to its interna
value is transparent to users. Again, there's no order defined for
reserved values and hence it should never match if using <,<=,>,>=
operators.
This is actually exactly the same problem as already described in
2f7f6932dc, but that patch failed for
size field types because of incorrect internal representation used.
With this patch applied, both problematic scenarios mentioned
above are fixed now:
$ vgs -o vg_name,vg_mda_copies -S 'vg_mda_copies < unmanaged'
(blank)
$ lvs -o+read_ahead vg/lvol6 vg/lvol7 -S 'read_ahead > 32k'
LV VG Attr LSize Pool Origin Rahead
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
By default these are empty strings, so the config settings
should be flagged as undefined, so they will be commented
out of the generated config. Otherwise, the lines:
thin_repair_options=""
cache_repair_options=""
in the dump output cause a warning when processed since
lvm doesn't want an empty string.
Also regenerate lvm.conf.in.
Rename envvar LVM_LOG_FILE_UNLINK_STATUS to LVM_EXPECTED_EXIT_STATUS
and change compare sign from '!' to '>'.
Validate LVM_LOG_FILE_EPOCH and support strictly only
up-to 32 alpha chars. If the content doesn't pass
epoch is simply ignored.
Add support for 2 new envvars for internal lvm2 test suite
(though it could be possible usable for other cases)
LVM_LOG_FILE_EPOCH
Whether to add 'epoch' extension that consist from
the envvar 'string' + pid + starttime in kernel units
obtained from /proc/self/stat.
LVM_LOG_FILE_UNLINK_STATUS
Whether to unlink the log depending on return status value,
so if the command is successful the log is automatically
deleted.
API is still for now experimental to catch various issue.
--withfullcomments prints all comment lines for each config option.
--withcomments prints only the first comment line, which should be
a short one-line summary of the option.
When performing initial allocation (so there is nothing yet to
cling to), use the list of tags in allocation/cling_tag_list to
partition the PVs. We implement this by maintaining a list of
tags that have been "used up" as we proceed and ignoring further
devices that have a tag on the list.
https://bugzilla.redhat.com/983600
Add A_PARTITION_BY_TAGS set when allocated areas should not share tags
with each other and allow _match_pv_tags to accept an alternative list
of tags. (Not used yet.)
Comments from the sample config files are copied into
the comment field of the config settings structure.
This includes only minimal changes to the text.
With this in place, the sample config files can
be generated from 'lvm dumpconfig', and content
for an lvm.conf man page can also be generated.
pv_write is called both to write orphans and to rewrite PV headers
of PVs in VGs. It needs to select the correct VG id so that the
internal cache state gets updated correctly.
It only affected commands that involved further steps after
the pv_write and was often masked because the metadata would
be re-read off disk and correct itself.
"Incorrect metadata area header checksum" warnings appeared.
Example:
Create vg1 containing dev1, dev2 and dev3.
Hide dev1 and dev2 from the system.
Fix up vg1 with vgreduce --removemissing.
Bring back dev1 and dev2.
In a single operation reinstate dev1 and dev2 into vg1 (vgextend).
Done as separate operations (automatically fix-up dev1 and dev2 as orphans,
then vgextend) it worked, but done all in one go the internal cache got
corrupted and warnings about checksum errors appeared.
Commit 80f4b4b803
introduced undesirable side-effects for lvm2app user
which happens to be our own python binding.
It appear obtaing pvs list keeps global lock.
So restricting this to VG_GLOBAL READ locks and skip
the drop skip if WRITE lock is held.
Do not keep dangling LVs if they're removed from the vg->lvs list and
move them to vg->removed_lvs instead (this is actually similar to already
existing vg->removed_pvs list, just it's for LVs now).
Once we have this vg->removed_lvs list indexed so it's possible to
do lookups for LVs quickly, we can remove the LV_REMOVED flag as
that one won't be needed anymore - instead of checking the flag,
we can directly check the vg->removed_lvs list if the LV is present
there or not and to say if the LV is removed or not then. For now,
we don't have this index, but it may be implemented in the future.
This avoids a problem in which we're using selection on LV list - we
need to do the selection on initial state and not on any intermediary
state as we process LVs one by one - some of the relations among LVs
can be gone during this processing.
For example, processing one LV can cause the other LVs to lose the
relation to this LV and hence they're not selectable anymore with
the original selection criteria as it would be if we did selection
on inital state. A perfect example is with thin snapshots:
$ lvs -o lv_name,origin,layout,role vg
LV Origin Layout Role
lvol1 thin,sparse public,origin,thinorigin,multithinorigin
lvol2 lvol1 thin,sparse public,snapshot,thinsnapshot
lvol3 lvol1 thin,sparse public,snapshot,thinsnapshot
pool thin,pool private
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
The lvremove command above was supposed to remove lvol1 as well as
all its snapshots which have origin=lvol1. It failed to do so, because
once we removed the origin lvol1, the lvol2 and lvol3 which were
snapshots before are not snapshots anymore - the relations change
as we're processing these LVs one by one.
If we do the selection first and then execute any concrete actions on
these LVs (which is what this patch does), the behaviour is correct
then - the selection is done on the *initial state*:
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
Logical volume "lvol2" successfully removed
Logical volume "lvol3" successfully removed
Similarly for all the other situations in which relations among
LVs are being changed by processing the LVs one by one.
This patch also introduces LV_REMOVED internal LV status flag
to mark removed LVs so they're not processed further when we
iterate over collected list of LVs to be processed.
Previously, when we iterated directly over vg->lvs list to
process the LVs, we relied on the fact that once the LV is removed,
it is also removed from the vg->lvs list we're iterating over.
But that was incorrect as we shouldn't remove LVs from the list
during one iteration while we're iterating over that exact list
(dm_list_iterate_items safe can handle only one removal at
one iteration anyway, so it can't be used here).
The code never mixes reads of committed and precommitted metadata,
so there's no need to attempt to set PRECOMMITTED when
*use_previous_vg is being set.
Refactor the recent metadata-reading optimisation patches.
Remove the recently-added cache fields from struct labeller
and struct format_instance.
Instead, introduce struct lvmcache_vgsummary to wrap the VG information
that lvmcache holds and add the metadata size and checksum to it.
Allow this VG summary information to be looked up by metadata size +
checksum. Adjust the debug log messages to make it clear when this
shortcut has been successful.
(This changes the optimisation slightly, and might be extendable
further.)
Add struct cached_vg_fmtdata to format-specific vg_read calls to
preserve state alongside the VG across separate calls and indicate
if the details supplied match, avoiding the need to read and
process the VG metadata again.
Fixes segfault when 'pvs' encounters two different PVs sharing the same
uuid but one an orphan, the other in a VG.
If VG_GLOBAL is held, there seems no point in doing a full scan more
than once.
If undesirable side-effects show up, we can try restricting this to
VG_GLOBAL READ locks. The original code dates back to 2.02.40.
When pvscan --cache --major --minor command is issued from
udev REMOVE event, it basically resulted into a whole device
scan since the device was missing. So avoid such scan
and first check via /sysfs (when available) if such device actually
exists.
When available use nanosecond stat info.
If commands are running closely enough after config update,
the .cache file from persistent filter could have been ignored.
This happens sometimes during i.e. synthetic test suite run.
Metadata areas which are marked as ignored should not be scanned
and read during pvscan --cache. Otherwise, this can cause lvmetad
to cache out-of-date metadata in case other PVs with fresh metadata
are missing by chance.
Make this to work like in non-lvmetad case where the behaviour would
be the same as if the PV was orphan (in case we have no other PVs
with valid non-ignored metadata areas).
When lvm1 PVs are visible, and lvmetad is used, and the foreign
option was included in the reporting command, the reporting
command would fail after the 'pvscan all devs' function saw
the lvm1 PVs. There is no reason the command should fail
because of the lvm1 PVs; they should just be ignored.
Return 1 on success in pvdisplay_short() and lvdisplay_full()
so commands like vgdisplay are not printinig stracktraces
on successful passes.
As the results of fail/success have been internally ignored for those
calls, it had no other visible side effect - command's return value was
still 0 (success).
Detect an lvm1 system id by looking at the WRITE_LOCKED flag.
Don't copy this lvm1 system id into vg->system_id so that the
restrictions associated with the new system id are not applied
to the old VG with the inherited lvm1 system id.
Since we take a lock inside vg_lock_newname() and we do a full
detection of presence of vgname inside all scanned labels,
there is no point to do this for second time to be sure
there is no such vg.
The only side-effect of such call would be a full validation of
some already exising VG metadata - but that's not the task for
vgcreate when create a new VG.
This call noticable reduces number of scans during 'vgcreate'.
Use similar logic as with text_vg_import_fd() and avoid repeated
parsing of same mda and its config tree for vgname_from_mda().
Remember last parsed vgname, vgid and creation_host in labeller
structure and if the metadata have the same size and checksum,
return this stored info.
TODO: The reuse of labeller struct is not ideal, some lvmcache API for
this functionality would be nicer.
When reading VG mda from multiple PVs - do all the validation only
when mda is seen for the first time and when mda checksum and length
is same just return already existing VG pointer.
(i.e. using 300PVs for a VG would lead to create and destroy 300 config trees....)
Previous versions of lvm will not obey the restrictions
imposed by the new system_id, and would allow such a VG
to be written. So, a VG with a new system_id is further
changed to force previous lvm versions to treat it as
read-only. This is done by removing the WRITE flag from
the metadata status line of these VGs, and putting a new
WRITE_LOCKED flag in the flags line of the metadata.
Versions of lvm that recognize WRITE_LOCKED, also obey the
new system_id. For these lvm versions, WRITE_LOCKED is
identical to WRITE, and the rules associated with matching
system_id's are imposed.
A new VG lock_type field is also added that causes the same
WRITE/WRITE_LOCKED transformation when set. A previous
version of lvm will also see a VG with lock_type as read-only.
Versions of lvm that recognize WRITE_LOCKED, must also obey
the lock_type setting. Until the lock_type feature is added,
lvm will fail to read any VG with lock_type set and report an
error about an unsupported lock_type. Once the lock_type
feature is added, lvm will allow VGs with lock_type to be
used according to the rules imposed by the lock_type.
When both system_id and lock_type settings are removed, a VG
is written with the old WRITE status flag, and without the
new WRITE_LOCKED flag. This allows old versions of lvm to
use the VG as before.
The seg_monitor did not display monitored status for thick snapshots
and mirrors (with mirror log *not* mirrored). The seg monitor did work
correctly even before for other segtypes - thins and raids.
Before (mirrors and snapshots, only mirrors with mirrored log properly displayed monitoring status):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot
With this patch applied (monitoring status displayed for all mirrors and snapshots):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public monitored
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot monitored
If configuration setting is marked in config_setting.h with CFG_DISABLED
flag, default value is always used for such setting, no matter if it's defined
by user (in --config/lvm.conf/lvmlocal.conf).
A warning message is displayed if this happens:
For example:
[1] f21/~ # lvm dumpconfig --validate
WARNING: Configuration setting global/system_id_source is disabled. Using default value.
LVM configuration valid.
[1] f21/~ # pvs
WARNING: Configuration setting global/system_id_source is disabled. Using default value.
PV VG Fmt Attr PSize PFree
/dev/sdb lvm2 --- 128.00m 128.00m
...
Set ACCESS_NEEDS_SYSTEM_ID VG status flag whenever there is
a non-lvm1 system_id set. Prevents concurrent access from
older LVM2 versions.
Not set on VGs that bear a system_id only due to conversion
from lvm1 metadata.
Export _lvm1_system_id as generate_lvm1_system_id and call it in
vg_setup() so it is set before writing the metadata to disk
and not missing from the initial metadata backup file.
format_text processes both lvm2 on-disk metadata and metadata read
from other sources such as backup files. Add original_fmt field
to retain the format type of the original metadata.
Before this patch, /etc/lvm/archives would contain backups of
lvm1 metadata with format = "lvm2" unless the source was lvm1 on-disk
metadata.
The vg->lvm1_systemd_id needs to be initialized as all the code around
counts with that. Just like we initialize lvm1_system_id in vg_create
(no matter if it's actually LVM1 or LVM2 format), this patch adds this
init in alloc_vg as well so the rest of the code does not segfaul
when trying to access vg->lvm1_system_id.
In log messages refer to it as system ID (not System ID).
Do not put quotes around the system_id string when printing.
On the command line use systemid.
In code, metadata, and config files use system_id.
In lvmsystemid refer to the concept/entity as system_id.