IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Seems the amount of allocated data on stack is dependent on page size.
As the page size on aarch64 is 64kiB writing to memory allocated by
alloca results in stack overflow as at the time of allocation the are
already 2 pages allocated. Clearly 128kiB is not sufficient and at least
3 pages are needed.
While we could probably reacquire some type of lock when
going from non-clustered to clustered vg, we don't have any
single road back to drop the lock and keep LV active.
For now keep it safe and prohibit conversion when LV
is active in the VG.
Try to enforce consistent macro usage along these lines:
lv_is_mirror - mirror that uses the original dm-raid1 implementation
(segment type "mirror")
lv_is_mirror_type - also includes internal mirror image and log LVs
lv_is_raid - raid volume that uses the new dm-raid implementation
(segment type "raid")
lv_is_raid_type - also includes internal raid image / log / metadata LVs
lv_is_mirrored - LV is mirrored using either kernel implementation
(excludes non-mirror modes like raid5 etc.)
lv_is_pvmove - internal pvmove volume
Cmirrord has endian bugs, which cause failure to lvcreate a mirrored lv
on s390.
- data_size is uint32, should not use xlate64 to convert, which will
cause data_size 0 after xlate.
- request_type and data_size still used by local(v5_data_switch),
should convert later. If request_type xlate too early, it will
cause request_type judge error; if data_size xlate too early, it
will cause coredump in case DM_ULOG_CLEAR_REGION.
- when receiving package in clog_request_from_network. vp[0] will always
be little endian. We could use xlate64(vp[0]) == vp[0] to decide if
the local node is little endian or not.
Signed-off-by: Lidong Zhong<lzhong@suse.com> & Liuhua Wang <lwang@suse.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Use lv_is_* macros throughout the code base, introducing
lv_is_pvmove, lv_is_locked, lv_is_converting and lv_is_merging.
lv_is_mirror_type no longer includes pvmove.
This is probably better approach than 3880ca5eca.
If dm module is not loaded during dm_is_dm_major call, there are no
lines for dm in /proc/devices, of course. Normally, dm_is_dm_major
is called to check existing devices, hence if module is not loaded,
we can expect there's no DM device present at the same time so we
can directly return 0 here (meaning the major number being inspected
is not dm device's one).
See also https://bugzilla.redhat.com/show_bug.cgi?id=1059711.
For dm_is_dm_major to determine whether the major number given as
an argument belongs to a DM device, libdm code needs to know what
the actual DM major is to do the comparison.
It may happen that the dm-mod module is not loaded during this
call and so for the completness let's try our best before we start
giving various errors - we can still make use of dm-mod autoloading,
though only since kernels 2.6.36 where this feature was introduced.
Caused by recent changes - a7be3b12df.
If global filter was not defined, then part of the code
creating composite filter (the cmd->lvmetad_filter) incorrectly
increased index value even if this global filter was not created
as part of the composite filter. This caused a gap with "NULL"
value in the composite filter array which ended up with the rest
of the filters after the gap to be ignored and also it caused a mem
leak when destroying the composite filter.
We used to print an error message whenever we tried to deal with devices that
lvmetad knew about but were rejected by a client-side filter. Instead, we now
check whether the device is actually absent or only filtered out and only print
a warning in the latter case.
If a PV label is exposed both through a composite device (MD for example) and
through its component devices, we always want the PV that lvmetad sees to be the
composite, since this is what all LVM commands (including activation) will then
use. If pvscan --cache is triggered for multiple clones of the same PV, the last
to finish wins. This patch basically re-arranges the filters so that
component-device filters are part of the global_filter chain, not of the
client-side filter chain. This has a subtle effect on filter evaluation order,
but should not alter visible semantics in the non-lvmetad case.
The code in dev_iter_create assumes that if a filter can be wiped, doing so will
always trigger a call to _full_scan. This is not true for composite filters
though, since they can always be wiped in principle, but there is no way to know
that a component filter inside will exist that actually triggers the scan.
This message should be printed only for activation commands,
however since the handling of this flag is not correct
(rhbz 1140029) and will require further changes,
do now just a minor change and switch message into log_debug
(so it's not printed i.e. with every 'lvs -v')
Use lv_update_and_reload() and lv_update_and_reload_origin()
to handle write/suspend/commit/resume sequence.
In few places this properly handle vg_revert() after suspend failure,
and also ensures there is metadata backup after successful vg_commit().
Fix rename operation for snapshot (cow) LV.
Only the snapshot's origin has the lock and by mistake suspend
and resume has been called for the snapshot LV.
This further made volumes unusable in cluster.
So instead of suspend and resuming list of LVs,
we need to just suspend and resume origin.
As the sequence write/suspend/commit/resume
is widely used in lvm2 code base - move it to
new lv_update_and_reload function.
Commit 94786a3bbf introduced
another bug - since sscanf needs extra 1 byte for \0.
Since there is no easy way to do a macro evaluation for (PATH_MAX-1)
and string concatation of this number to get resulting (%4095s) - let's
go with easiest path and restore extra byte for 0.
Other option would be to prepare sscanf parsing string in runtime.
But lets resolve it when we look at PATH_MAX handling later...
Coverity noticed this function may return untouched buffer,
however in this state can't really happen, but anyway
ensure on error path the buffer will have zero lenght string.
Add extra safety detection for thin pool transaction id
and query pool status after confirmed message.
In case there is a missmatch, immeditelly abort further
processing.
Fixing problem, when user sets volume_list and excludes thin pools
from activation. In this case pool return 'success' for skipped activation.
We need to really check the volume it is actually active to properly
to remove queued pool messages. Otherwise the lvm2 and kernel
metadata started to go async since lvm2 believed, messages were submitted.
Add also better check for threshold when create a new thin volume.
In this case we require local activation of thin pool so we are able
to check pool fullness.
This patch makes the keyword combinations found in "lv_layout" and
"lv_role" much more understandable - there were some ambiguities
for some of the combinations which lead to confusion before.
Now, the scheme used is:
LAYOUTS ("how the LV is laid out"):
===================================
[linear] (all segments have number of stripes = 1)
[striped] (all segments have number of stripes > 1)
[linear,striped] (mixed linear and striped)
raid (raid layout always reported together with raid level, raid layout == image + metadata LVs underneath that make up raid LV)
[raid,raid1]
[raid,raid10]
[raid,raid4]
[raid,raid5] (exact sublayout not specified during creation - default one used - raid5_ls)
[raid,raid5,raid5_ls]
[raid,raid5,raid6_rs]
[raid,raid5,raid5_la]
[raid,raid5,raid5_ra]
[raid6,raid] (exact sublayout not specified during creation - default one used - raid6_zr)
[raid,raid6,raid6_zr]
[raid,raid6,raid6_nc]
[raid,raid6,raid6_ns]
[mirror] (mirror layout == log + image LVs underneath that make up mirror LV)
thin (thin layout always reported together with sublayout)
[thin,sparse] (thin layout == allocated out of thin pool)
[thin,pool] (thin pool layout == data + metadata volumes underneath that make up thin pool LV, not supposed to be used for direct use!!!)
[cache] (cache layout == allocated out of cache pool in conjunction with cache origin)
[cache,pool] (cache pool layout == data + metadata volumes underneath that make up cache pool LV, not supposed to be used for direct use!!!)
[virtual] (virtual layout == not hitting disk underneath, currently this layout denotes only 'zero' device used for origin,thickorigin role)
[unknown] (either error state or missing recognition for such layout)
ROLES ("what's the purpose or use of the LV - what is its role"):
=================================================================
- each LV has either of these two roles at least: [public] (public LV that users may use freely to write their data to)
[public] (public LV that users may use freely to write their data to)
[private] (private LV that LVM maintains; not supposed to be directly used by user to write his data to)
- and then some special-purpose roles in addition to that:
[origin,thickorigin] (origin for thick-style snapshot; "thick" as opposed to "thin")
[origin,multithickorigin] (there are more than 2 thick-style snapshots for this origin)
[origin,thinorigin] (origin for thin snapshot)
[origin,multithinorigin] (there are more than 2 thin snapshots for this origin)
[origin,extorigin] (external origin for thin snapshot)
[origin,multiextoriginl (there are more than 2 thin snapshots using this external origin)
[origin,cacheorigin] (cache origin)
[snapshot,thicksnapshot] (thick-style snapshot; "thick" as opposed to "thin")
[snapshot,thinsnapshot] (thin-style snapshot)
[raid,metadata] (raid metadata LV)
[raid,image] (raid image LV)
[mirror,image] (mirror image LV)
[mirror,log] (mirror log LV)
[pvmove] (pvmove LV)
[thin,pool,data] (thin pool data LV)
[thin,pool,metadata] (thin pool metadata LV)
[cache,pool,data] (cache pool data LV)
[cache,pool,metadata] (cache pool metadata LV)
[pool,spare] (pool spare LV - common role of LV that makes it used for both thin and cache repairs)
This makes it a bit more readable since we can report more general
layouts/roles first and keywords describing the LV more precisely
afterwards in the list.
The 'lv_type' field name was a bit misleading. Better one is 'lv_role'
since this fields describes what's the actual use of the LV currently -
its 'role'.
Sort out the lvresize calculation code to handle size changes
specified as physical extents as well as logical extents
and to process mirror resizing and raid extensions correctly.
The 'approx alloc' option was masking the underlying problem.
Commit 5ebff6cc9f seemed to introduce
new 'for' loop but the mode is not yet used.
But the access to /dev dir needs to go through $DM_DEV_DIR
and whole path needs to be in "".
When testing conversion sanity, we checked lv->status & MIRRORED
which encompasses both old mirrors and raid1 mirrors. But we need to
ban only the old mirrors here hence allow raid1 mirrors.
Avoid playing with +1.
PATH_MAX code needs probably more thinking anyway, since
there is no MAX path in Linux - user may easily create path
with 64kB chars - so 4kB buffer is surelly not enough for
such dirs.
Note:
http://insanecoding.blogspot.cz/2007/11/pathmax-simply-isnt.html
The lv_type_name function is remnant from old code that reported
only single string for the LV type. LV types are now reported
in a more extended way as keyword list that describe the type
precisely (using lv_layout_and_type fn).
The lv_type_name was used in some error messages to display the
type of the LV so just reinstate the old messages back referencing
the type directly with a string - this is enough for error messages.
They don't need to display the LV type as precisely as it's used
on lvs output (which is optimized for selection anyway).
$ lvs -a -o name,vg_name,attr,layout,type
LV VG Attr Layout Type
lvol0 vg -wI-a----- linear linear
[pvmove0] vg p-C-aom--- mirror mirror,pvmove
(added "mirror" for pvmove LV)
$ lvs -a -o name,vg_name,attr,layout,type
LV VG Attr Layout Type
lvol0 vg ori------- linear external,multiple,origin,thin
[lvol1_pmspare] vg ewi------- linear metadata,pool,spare
lvol2 vg Vwi-a-tz-- thin snapshot,thin
lvol3 vg Vwi-a-tz-- thin snapshot,thin
pool vg twi-a-tz-- pool,thin pool,thin
[pool_tdata] vg Twi-ao---- linear data,pool,thin
[pool_tmeta] vg ewi-ao---- linear metadata,pool,thin
(added "multiple" for external origin used for more than one
thin snapshot - lvol0 in the example above)
Thin snapshots having external origins missed the "snapshot" keyword for
lv_type field. Also, thin external origins which are thin devices (from
another pool) were not recognized properly.
For example, external origin itself can be either non-thin volume (lvol0
below) or it can be a thin volume from another pool (lvol3 below):
Before this patch:
$ lvs -o name,vg_name,attr,pool_lv,origin,layout,type
Internal error: Failed to properly detect layout and type for for LV vg/lvol3
Internal error: Failed to properly detect layout and type for for LV vg/lvol3
LV VG Attr Pool Origin Layout Type
lvol0 vg ori------- linear external,origin,thin
lvol2 vg Vwi-a-tz-- pool lvol0 thin thin
lvol3 vg ori---tz-- pool unknown external,origin,thin,thin
lvol4 vg Vwi-a-tz-- pool1 lvol3 thin thin
pool vg twi-a-tz-- pool,thin pool,thin
pool1 vg twi-a-tz-- pool,thin pool,thin
- lvol2 as well as lvol4 have missing "snapshot" in type field
- lvol3 has unrecognized layout (should be "thin"), but has double
"thin" in lv_type which is incorrect
- (also there's double "for" in the internal error message)
With this patch applied:
$ lvs -o name,vg_name,attr,pool_lv,origin,layout,type
LV VG Attr Pool Origin Layout Type
lvol0 vg ori------- linear external,origin,thin
lvol2 vg Vwi-a-tz-- pool lvol0 thin snapshot,thin
lvol3 vg ori---tz-- pool thin external,origin,thin
lvol4 vg Vwi-a-tz-- pool1 lvol3 thin snapshot,thin
pool vg twi-a-tz-- pool,thin pool,thin
pool1 vg twi-a-tz-- pool,thin pool,thin
The maximum stripe size is equal to the volume group PE size. If that
size falls below the STRIPE_SIZE_MIN, the creation of RAID 4/5/6 volumes
becomes impossible. (The kernel will fail to load a RAID 4/5/6 mapping
table with a stripe size less than STRIPE_SIZE_MIN.) So, we report an
error if it is attempted.
This is very rare because reducing the PE size down that far limits the
size of the PV below that of modern devices.
This patch adds a new flag --deferred to dmsetup remove. If this flag is
specified and the device is open, it is scheduled to be deleted on
close.
struct dm_info is extended.
The existing dm_task_get_info() is converted into a wrapper around the
new version dm_task_get_info_with_deferred_remove() so existing binaries
can still use the old smaller structure.
Recompiled code will pick up the new larger structure.
From: Mikulas Patocka <mpatocka@redhat.com>
metadata/lv_manip.c:269: warning: declaration of "snapshot_count" shadows a global declaration
There's existing function called "snapshot_count" so rename the
variable to "snap_count".
When ignoring 'listed' volume, print info message.
(So the final command error message is a bit less confusing,
i.e. when user tries to deactive virtual origin:
> lvchange -an vg/lvol2_vorigin
Ignoring virtual origin logical volume vg/lvol2_vorigin.
One or more specified logical volume(s) not found.
The lv_layout and lv_type fields together help with LV identification.
We can do basic identification using the lv_attr field which provides
very condensed view. In contrast to that, the new lv_layout and lv_type
fields provide more detialed information on exact layout and type used
for LVs.
For top-level LVs which are pure types not combined with any
other LV types, the lv_layout value is equal to lv_type value.
For non-top-level LVs which may be combined with other types,
the lv_layout describes the underlying layout used, while the
lv_type describes the use/type/usage of the LV.
These two new fields are both string lists so selection (-S/--select)
criteria can be defined using the list operators easily:
[] for strict matching
{} for subset matching.
For example, let's consider this:
$ lvs -a -o name,vg_name,lv_attr,layout,type
LV VG Attr Layout Type
[lvol1_pmspare] vg ewi------- linear metadata,pool,spare
pool vg twi-a-tz-- pool,thin pool,thin
[pool_tdata] vg rwi-aor--- level10,raid data,pool,thin
[pool_tdata_rimage_0] vg iwi-aor--- linear image,raid
[pool_tdata_rimage_1] vg iwi-aor--- linear image,raid
[pool_tdata_rimage_2] vg iwi-aor--- linear image,raid
[pool_tdata_rimage_3] vg iwi-aor--- linear image,raid
[pool_tdata_rmeta_0] vg ewi-aor--- linear metadata,raid
[pool_tdata_rmeta_1] vg ewi-aor--- linear metadata,raid
[pool_tdata_rmeta_2] vg ewi-aor--- linear metadata,raid
[pool_tdata_rmeta_3] vg ewi-aor--- linear metadata,raid
[pool_tmeta] vg ewi-aor--- level1,raid metadata,pool,thin
[pool_tmeta_rimage_0] vg iwi-aor--- linear image,raid
[pool_tmeta_rimage_1] vg iwi-aor--- linear image,raid
[pool_tmeta_rmeta_0] vg ewi-aor--- linear metadata,raid
[pool_tmeta_rmeta_1] vg ewi-aor--- linear metadata,raid
thin_snap1 vg Vwi---tz-k thin snapshot,thin
thin_snap2 vg Vwi---tz-k thin snapshot,thin
thin_vol1 vg Vwi-a-tz-- thin thin
thin_vol2 vg Vwi-a-tz-- thin multiple,origin,thin
Which is a situation with thin pool, thin volumes and thin snapshots.
We can see internal 'pool_tdata' volume that makes up thin pool has
actually a level10 raid layout and the internal 'pool_tmeta' has
level1 raid layout. Also, we can see that 'thin_snap1' and 'thin_snap2'
are both thin snapshots while 'thin_vol1' is thin origin (having
multiple snapshots).
Such reporting scheme provides much better base for selection criteria
in addition to providing more detailed information, for example:
$ lvs -a -o name,vg_name,lv_attr,layout,type -S 'type=metadata'
LV VG Attr Layout Type
[lvol1_pmspare] vg ewi------- linear metadata,pool,spare
[pool_tdata_rmeta_0] vg ewi-aor--- linear metadata,raid
[pool_tdata_rmeta_1] vg ewi-aor--- linear metadata,raid
[pool_tdata_rmeta_2] vg ewi-aor--- linear metadata,raid
[pool_tdata_rmeta_3] vg ewi-aor--- linear metadata,raid
[pool_tmeta] vg ewi-aor--- level1,raid metadata,pool,thin
[pool_tmeta_rmeta_0] vg ewi-aor--- linear metadata,raid
[pool_tmeta_rmeta_1] vg ewi-aor--- linear metadata,raid
(selected all LVs which are related to metadata of any type)
lvs -a -o name,vg_name,lv_attr,layout,type -S 'type={metadata,thin}'
LV VG Attr Layout Type
[pool_tmeta] vg ewi-aor--- level1,raid metadata,pool,thin
(selected all LVs which hold metadata related to thin)
lvs -a -o name,vg_name,lv_attr,layout,type -S 'type={thin,snapshot}'
LV VG Attr Layout Type
thin_snap1 vg Vwi---tz-k thin snapshot,thin
thin_snap2 vg Vwi---tz-k thin snapshot,thin
(selected all LVs which are thin snapshots)
lvs -a -o name,vg_name,lv_attr,layout,type -S 'layout=raid'
LV VG Attr Layout Type
[pool_tdata] vg rwi-aor--- level10,raid data,pool,thin
[pool_tmeta] vg ewi-aor--- level1,raid metadata,pool,thin
(selected all LVs with raid layout, any raid layout)
lvs -a -o name,vg_name,lv_attr,layout,type -S 'layout={raid,level1}'
LV VG Attr Layout Type
[pool_tmeta] vg ewi-aor--- level1,raid metadata,pool,thin
(selected all LVs with raid level1 layout exactly)
And so on...
_pvcreate_check() has two missing requirements:
After refreshing filters there must be a rescan.
(Otherwise the persistent filter may remain empty.)
After wiping a signature, the filters must be refreshed.
(A device that was previously excluded by the filter due to
its signature might now need to be included.)
If several devices are added at once, the repeated scanning isn't
strictly needed, but we can address that later as part of the command
processing restructuring (by grouping the devices).
Replace the new pvcreate code added by commit
54685c20fc "filters: fix regression caused
by commit e80884cd080cad7e10be4588e3493b9000649426"
with this change to _pvcreate_check().
The filter refresh problem dates back to commit
acb4b5e4de "Fix pvcreate device check."
Using "[ ]" operator together with "&&" (or ",") inside causes the
string list to be matched if and only if all the items given match
the value reported and the number of items also match. This is
strict list matching and the original behaviour we already have.
In contrast to that, the new "{ }" operator together with "&&" inside
causes the string list to be matched if and only if all the items given
match the value reported but the number of items don't need to match.
So we can provide a subset in selection criteria and if the subset
is found, it matches.
For example:
$ lvs -o name,tags
LV LV Tags
lvol0 a
lvol1 a,b
lvol2 b,c,x
lvol3 a,b,y
$ lvs -o name,tags -S 'tags=[a,b]'
LV LV Tags
lvol1 a,b
$ lvs -o name,tags -S 'tags={a,b}'
LV LV Tags
lvol1 a,b
lvol3 a,b,y
So in the example above the a,b is subset of a,b,y and therefore
it also matches.
Clearly, when using "||" (or "#") inside, the { } and [ ] is the
same:
$ lvs -o name,tags -S 'tags=[a#b]'
LV LV Tags
lvol0 a
lvol1 a,b
lvol2 b,c,x
lvol3 a,b,y
$ lvs -o name,tags -S 'tags={a#b}'
LV LV Tags
lvol0 a
lvol1 a,b
lvol2 b,c,x
lvol3 a,b,y
Also in addition to the above feature, fix list with single value
matching when using [ ]:
Before this patch:
$ lvs -o name,tags -S 'tags=[a]'
LV LV Tags
lvol0 a
lvol1 a,b
lvol3 a,b,y
With this patch applied:
$ lvs -o name,tags -S 'tags=[a]'
LV LV Tags
lvol0 a
In case neither [] or {} is used, assume {} (the behaviour is not
changed here):
$ lvs -o name,tags -S 'tags=a'
LV LV Tags
lvol0 a
lvol1 a,b
lvol3 a,b,y
So in new terms 'tags=a' is equal to 'tags={a}'.
If using persistent filter and we're refreshing filters (just like we
do for pvcreate now after commit 54685c20fc),
we can't rely on getting the primary device of the partition from the cache
as such device could be already filtered by persistent filter and we get
a device cache lookup failure for such device.
For example:
$ lvm dumpconfig --type diff
devices {
obtain_device_list_from_udev=0
}
$lsblk /dev/sda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 128M 0 disk
`-sda1 8:1 0 127M 0 part
$cat /etc/lvm/cache/.cache | grep sda
"/dev/sda1",
$pvcreate /dev/sda1
dev_is_mpath: failed to get device for 8:1
Physical volume "/dev/sda1" successfully created
The problematic part of the code called dev_cache_get_by_devt
to get the device for the device number supplied. Then the code
used dev_name(dev) to get the name which is then used in check
whether there's any mpath on top of this dev...
This patch uses sysfs to get the base name for the partition
instead, hence avoiding the device cache which is a correct
approach here.
The message "Cannot deactivate remotely exclusive device locally." makes
sense only for clustered LV. If the LV is non-clustered, then it's
always exclusive by definition and if it's already deactivated, this
message pops up inappropriately as those two conditions are met.
So issue the message only if the conditions are met AND we have clustered VG.
With cmirrord, we can do pvmove of clustered mirror. The code checking
suitability of LVs on the PV being moved issued a message if a mirror
LV was found and the VG was clustered. However, the actual pvmove did
work correctly.
The top-level mirror LV is actually skipped in the code since it's
always layered on top of internal LVs making up the mirror LV and for pvmove
we consider these internal devices only as they're actually layered on
top of concrete PVs then. But we don't need to issue any message here
about skipping the top-level mirror LV - it's misleading here.
Commit e80884cd08 tried to dump filters
for them to be reevaluated when creating a PV to avoid overwriting
any existing signature that may have been created after last
scan/filtering.
However, we need to call refresh_filters instead of
persistent_filter->dump since dump requires proper rescannig to fill
up the persistent filter again. However, this is true only for pvcreate
but not for vgcreate with PV creation where the scanning happens before
this PV creation and hence the next rescan (if not full scan), does not
fill the persistent filter.
Also, move refresh_filters so that it's called sooner and only for
pvcreate, vgcreate already calls lvmcache_label_scan(cmd, 2) which
then calls refresh_filters itself, so no need to reevaluate this again.
This caused the persistent filter (/etc/lvm/cache/.cache file) to be
wrong and contain only the PV just being processed with
vgcreate <vg_name> <pv_name_to_create>.
This regression caused other block devices to be filtered out in case
the vgcreate with PV creation was used and then the persistent filter
is used by any other LVM command afterwards.
Make lvresize -l+%FREE support approximate allocation.
Move existing "Reducing/Extending' message to verbose level
and change it to say 'up to' if approximate allocation is being used.
Replace it with a new message that gives the actual old and new size or
says 'unchanged'.
This is addendum to commit 2e82a070f3
which fixed these spurious messages that appeared after commit
651d5093ed ("avoid pv_read in
find_pv_by_name").
There was one more "not found" message issued in case the device
could not be found in device cache (commit 2e82a07 fixed this only
for PV lookup itself). But if we "allow_unformatted" for
find_pv_by_name, we should not issue this message even in case
the device can't be found in dev cache as we just need to know
whether there's a PV or not for the code to decide on next steps
and we don't want to issue any messages if either device itself
is not found or PV is not found.
For example, when we were creating a new PV (and so allow_unformatted = 1)
and the device had a signature on it which caused it to be filtered
by device filter (e.g. MD signature if md filtering is enabled),
or it was part of some other subsystem (e.g. multipath), this message
was issued on find_pv_by_name call which was misleading.
Also, remove misleading "stack" call in case find_pv_by_name
returns NULL in pvcreate_check - any error state is reported
later by pvcreate_check code so no need to "stack" here.
There's one more and proper check to issue "not found" message if
the device can't be found in device cache within pvcreate_check fn
so this situation is still covered properly later in the code.
Before this patch (/dev/sda contains MD signature and is therefore filtered):
$ pvcreate /dev/sda
Physical volume /dev/sda not found
WARNING: linux_raid_member signature detected on /dev/sda at offset 4096. Wipe it? [y/n]:
With this patch applied:
$ pvcreate /dev/sda
WARNING: linux_raid_member signature detected on /dev/sda at offset 4096. Wipe it? [y/n]:
Non-existent devices are still caught properly:
$ pvcreate /dev/sdx
Device /dev/sdx not found (or ignored by filtering).
2.02.106 added suffixes to some LV uuids in the kernel.
If any of these LVs is activated with 2.02.105 or earlier,
and then a later version is used, the LVs appear invisible and
activation commands fail.
The code now has to check the kernel for both old and new uuids.
Fix get_pool_params to only read params.
Add poolmetadataspare option to get_pool_params.
Move all profile code into update_pool_params.
Move recalculate code into pool_manip.c
Revert logic and rename new arg_ functions to:
arg_from_list_is_set()
arg_outside_list_is_set()
When err_found is given, log_error message is automaticaly
printed.
Few unecessary comments were written to on-disc metadata.
Use outfc() to have comments only in archived files.
(may also save couple bytes in ringbuffer).
TODO: needed validation against newline char...
Support --repair and --use-policies with mirrors.
(fixes another regression from lvconvert change for thin and cache).
TODO: the code path for mirror needs update.
Major update of lvconvert code to handle cache and thin.
related targets.
Code tries to unify handling of cache and thin pools.
Better supports lvm2 syntax:
lvconvert --type cache --cachepool vg/pool vg/cache
lvconvert --type thin --thinpool vg/pool vg/extorg
lvconvert --type cache-pool vg/pool
lvconvert --type thin-pool vg/pool
as well as:
lvconvert --cache --cachepool vg/pool vg/cache
lvconvert --thin --thinpool vg/pool vg/extorg
lvconvert --cachepool vg/pool
lvconvert --thinpool vg/pool
While catching much more command line errors.
(Yet couple paths still needs more tests)
Detects as much cmdline errors prior opening VG.
Uses single lvconvert_name_params to convert LV names.
Detects as much incompatibilies in VG prior prompting.
Uses single prompt to confirm whole conversion.
TODO: still the code needs fixes...
Cache pools are similar as with thin pools.
Add (needs %s) - since cache has currently
a bit strange need for extra few kb over
our default 4M extent size so make it more obvious.
Since the type passed LV is changed and content of data detroyed,
query user with prompt to confirm this operation.
Also add a proper wiping of header.
Using '--yes' will skip this prompt:
lvconvert -s --yes vg/lv vg/lvcow
When EOF is detect - it could be either 'Ctrl+C'
or empty stdin.
For Ctrl+C there is visual ^C sign.
For EOF print 'n' so decision is clear in debug print.
This is addendum for commit 6dc7b783c8.
LVM1 format stores the ALLOCATABLE flag directly in PV header, not
in VG metadata. So the code needs to be fixed further to work
properly for lvm1 format so that the correct PV header is written
(the flag is set only if the PV is in some VG, unset otherwise).
Before the patch:
$ lvs -o name,active vg/lvol1 --driverloaded n
WARNING: Activation disabled. No device-mapper interaction will beattempted.
LV Active
lvol1 active
With this patch applied:
$ lvs -o name,active vg/lvol1 --driverloaded n
WARNING: Activation disabled. No device-mapper interaction will be attempted.
LV Active
lvol1 unknown
The same for active_{locally,remotely,exclusively} fields.
Also, rename headings for these fields (ActLocal/ActRemote/ActExcl).
If the lv_info call fails for whatever reason/INFO dm ioctl fails or
the dm driver communication is disabled (--driverloaded n), make
sure we always display "unknown" for LVSINFO fields as that's exactly
what happens - we don't know the state.
Before the patch:
$ lvs -o name,device_open --driverloaded n
WARNING: Activation disabled. No device-mapper interaction will be attempted.
Command failed with status code 5.
With this patch applied:
$ lvs -o name,device_open --driverloaded n
WARNING: Activation disabled. No device-mapper interaction will be attempted.
LV DevOpen
lvol1 unknown
Commit 33d69162e4 reduced the number of
PVs to a level where the test could not function. (It is impossible
to replace 3 PVs of a 4-way RAID1 LV if there are only 5 PVs.)
Like other binary fields we already have:
$ lvs -o name,zero vg/lvx vg/pool vg/pool1
LV Zero
lvx unknown
pool
pool1 zero
$ lvs -o name,zero vg/lvx vg/pool vg/pool1 --binary
LV Zero
lvx -1
pool 0
pool1 1
Currently, we have two modes of activation, an unnamed nominal mode
(which I will refer to as "complete") and "partial" mode. The
"complete" mode requires that a volume group be 'complete' - that
is, no missing PVs. If there are any missing PVs, no affected LVs
are allowed to activate - even RAID LVs which might be able to
tolerate a failure. The "partial" mode allows anything to be
activated (or at least attempted). If a non-redundant LV is
missing a portion of its addressable space due to a device failure,
it will be replaced with an error target. RAID LVs will either
activate or fail to activate depending on how badly their
redundancy is compromised.
This patch adds a third option, "degraded" mode. This mode can
be selected via the '--activationmode {complete|degraded|partial}'
option to lvchange/vgchange. It can also be set in lvm.conf.
The "degraded" activation mode allows RAID LVs with a sufficient
level of redundancy to activate (e.g. a RAID5 LV with one device
failure, a RAID6 with two device failures, or RAID1 with n-1
failures). RAID LVs with too many device failures are not allowed
to activate - nor are any non-redundant LVs that may have been
affected. This patch also makes the "degraded" mode the default
activation mode.
The degraded activation mode does not yet work in a cluster. A
new cluster lock flag (LCK_DEGRADED_MODE) will need to be created
to make that work. Currently, there is limited space for this
extra flag and I am looking for possible solutions. One possible
solution is to usurp LCK_CONVERT, as it is not used. When the
locking_type is 3, the degraded mode flag simply gets dropped and
the old ("complete") behavior is exhibited.
Change the help heading from 'Common Fields' to 'Special Fields' for
the fields: selected, help, ?
Remove the code that does 'all' processing with these special fields as
each of them changes the behaviour of the command in an undesirable way.
'lvs -o all,selected' was of course just printing help.
(via internal expansion to 'lv_all,common_all')
and if we ignored the help fields, then '-o common_all' would still
pull in 'selected' and change the way rows were output.
We have 1/"descriptive word"/"yes" for 1 and 0/"no" for 0.
For example (the new recognized values are "yes" and "no"):
$ lvs -o name,device_open fedora vg/lvol1 vg/lvol2
LV DevOpen
root open
swap open
lvol1 open
lvol2
$ lvs -o name,device_open fedora vg/lvol1 vg/lvol2 -S 'device_open=open'
LV DevOpen
root open
swap open
lvol1 open
$ lvs -o name,device_open fedora vg/lvol1 vg/lvol2 -S 'device_open=1'
LV DevOpen
root open
swap open
lvol1 open
$ lvs -o name,device_open fedora vg/lvol1 vg/lvol2 -S 'device_open=yes'
LV DevOpen
root open
swap open
lvol1 open
$ lvs -o name,device_open fedora vg/lvol1 vg/lvol2 -S 'device_open=0'
LV DevOpen
lvol2
$ lvs -o name,device_open fedora vg/lvol1 vg/lvol2 -S 'device_open=no'
LV DevOpen
lvol2
So all attribute reporting functions are all in one section of code
for quick orientation (all these functions are defined in the order
of their attribute character displayed in pv/vg/lv_attr field).
lv_active_{locally,remotely,exclusively} display the original
"lv_active" field in a more separate way so that we can create
selection criteria in a binary-based form (yes/no).
The macros for reserved value definition makes the process a bit easier,
but there's still a place for improvement and make this even more
transparent. We can optimize and provide better automatism here later on.
Also respect --binary arg and/or report/binary_values_as_numeric
when displaying unknown values. If textual form is used, use "unknown",
if numeric value is used, use "-1" (which we already use to denote
unknown numeric values in other reports like lv_kernel_major and
lv_kernel_minor).
This avoids creating void matchers which have no effect anyway and
they just use resources. Also, it makes lvm dumpconfig --type diff
to mark these settings properly as not being different from defaults
(where by default, devices/preferred_names as well as devices/filter
are void).
Also, add a few comments about builtin rules used to select device
alias in case preferred_names is not defined or it doesn't match
any of device aliases.
There was missing lv_info call for situations where there were
mixed PV/LV segment fields together with LVSINFO fields which
require extra lv_info call for LV device status. This ended up
with NULL lvinfo passed to the field reporting functions, hence
the segfault.
All binary attr fields have synonyms so selection criteria can use
either 0/1 or words to match against the field value (base type
for these binary fields is numeric one - DM_REPORT_FIELD_TYPE_NUMBER
so words are registered as reserved values):
pv_allocatable - "allocatable"
pv_exported - "exported"
pv_missing - "missing"
vg_extendable - "extendable"
vg_exported - "exported"
vg_partial - "partial"
vg_clustered - "clustered"
lv_initial_image_sync - "initial image sync", "sync"
lv_image_synced_names - "image synced", "synced"
lv_merging_names - "merging"
lv_converting_names - "converting"
lv_allocation_locked - "allocation locked", "locked"
lv_fixed_minor - "fixed minor", "fixed"
lv_merge_failed - "merge failed", "failed"
For example, these three are all equivalent:
$ lvs -o name,fixed_minor -S 'fixed_minor=fixed'
LV FixMin
lvol8 fixed minor
$ lvs -o name,fixed_minor -S 'fixed_minor="fixed minor"'
LV FixMin
lvol8 fixed minor
$ lvs -o name,fixed_minor -S 'fixed_minor=1'
LV FixMin
lvol8 fixed minor
The same with binary output - it has no effect on this functionality:
$ lvs -o name,fixed_minor --binary -S 'fixed_minor=fixed'
LV FixMin
lvol8 1
$ lvs -o name,fixed_minor --binary -S 'fixed_minor="fixed
minor"'
LV FixMin
lvol8 1
[1] f20/~ # lvs -o name,fixed_minor --binary -S 'fixed_minor=1'
LV FixMin
lvol8 1
In contrast to per-type reserved values that are applied for all fields
of that type, per-field reserved values are only applied for concrete
field only.
Also add 'struct dm_report_field_reserved_value' to libdm for per-field
reserved value definition. This is defined by field number (an index
in the 'fields' array which is given for the dm_report_init_with_selection
function during report initialization) and the value to use for any
of the specified reserved names.
The --binary option, if used, causes all the binary values reported
in reporting commands to be displayed as "0" or "1" instead of descriptive
literal values (value "unknown" is still used for values that could not be
determined).
Also, add report/binary_values_as_numeric lvm.conf option with the same
functionality as the --binary option (the --binary option prevails
if both --binary cmd option and report/binary_values_as_numeric lvm.conf
option is used at the same time). The report/binary_values_as_numeric is
also profilable.
This makes it easier to use and check lvm reporting command output in scripts.
Physical Volume Fields:
pv_allocatable - Whether this device can be used for allocation.
pv_exported - Whether this device is exported.
pv_missing - Whether this device is missing in system.
Volume Group Fields:
vg_permissions - VG permissions.
vg_extendable - Whether VG is extendable.
vg_exported - Whether VG is exported.
vg_partial - Whether VG is partial.
vg_allocation_policy - VG allocation policy.
vg_clustered - Whether VG is clustered.
Logical Volume Fields:
lv_volume_type - LV volume type.
lv_initial_image_sync - Whether mirror/RAID images underwent initial resynchronization.
lv_image_synced - Whether mirror/RAID image is synchronized.
lv_merging - Whether snapshot LV is being merged to origin.
lv_converting - Whether LV is being converted.
lv_allocation_policy - LV allocation policy.
lv_allocation_locked - Whether LV is locked against allocation changes.
lv_fixed_minor - Whether LV has fixed minor number assigned.
lv_merge_failed - Whether snapshot merge failed.
lv_snapshot_invalid - Whether snapshot LV is invalid.
lv_target_type - Kernel target type the LV is related to.
lv_health_status - LV health status.
lv_skip_activation - Whether LV is skipped on activation.
Logical Volume Info Fields
lv_permissions - LV permissions.
lv_suspended - Whether LV is suspended.
lv_live_table - Whether LV has live table present.
lv_inactive_table - Whether LV has inactive table present.
lv_device_open - Whether LV device is open.
LVSINFO is exactly the same as existing LVS report type,
but it has the "struct lvinfo" populated in addition for
use - this is useful for fields that display the status
of the LV device itself (e.g. suspended state, tables
present/missing...).
Currently, such properties are reported within the "lv_attr"
field so separation is unnecessary - the "lvinfo" call
to populate the "struct lvinfo" is directly a part of the
field reporting function - _lvstatus_disp/lv_attr_dup.
With upcoming patches, we'd like the lv_attr field bits
to be separated into their own fields. To avoid calling
"lvinfo" fn as many times as there are fields requiring
the "lv_info" structure to be populated while reporting
one row related to one LV, we're separating former LVS
into LVS and LVSINFO report type. With this, there's
just one "lvinfo" call for one report row and LV reporting
fields will take the info needed from this struct then,
hence reusing it and not calling "lvinfo" fn on their own.
The get_lv_type_name helps with translating volume type
to human readable form (can be used in reports or
various messages if needed).
The lv_is_linear and lv_is_striped complete the set of
lv_is_* functions that identify exact volume types.
Mention parent LV as well as the LV triggering the warning.
Still leaves some confusing cases but its not worth fixing them
at the moment.
(Thin pool inactive but a thin volume active => deactivate thin vol.
Inactive mirror/raid with pvmove in progress => complete pvmove and
active&deactivate mirror/raid.
If new VG already exists it requires some LVs to be inactive
unnecessarily.)
Support remove of thin volumes With --force --force
when thin pools is damaged.
This way it's possible to remove thin pool with
unrepairable metadata without requiring to
manually edit lvm2 metadata.
lvremove -ff vg/pool
removes all thin volumes and pool even when
thin pool cannot be activated (to accept
removal of thin volumes in kernel metadata)
mirror_or_raid_type_requested really checks for mirror type.
Convert macros mirror_or_raid_type_requested() and
snapshot_type_requested() into inline functions.
Use builddir not srcdir with make pofile.
Append 'incfile:' lines to %.d files to handle newly-missing dependencies
without 'make clean' after a file is moved or deleted.
Using suffixes for mirrors and raids will need more work,
before this could be enabled.
Meanwhile revert to previous behavior.
Keep suffixes for thins and caches.
Since vg_name inside /lib function has already been ignored mostly
except for a few debug prints - make it and official internal API
feature.
vg_name is used only in /tools while the VG is not yet openned,
and when lvresize/lvcreate /lib function is called with VG pointer
already being used, then vg_name becomes irrelevant (it's not been
validated anyway).
So any internal user of lvcreate_params and lvresize_params does not
need to set vg_name pointer and may leave it NULL.
Use suffixes for easier detection of private volumes.
This commit makes older volume UUIDs incompatible and
it most probably needs machine reboot after upgrade.
When creating pool's metadata - create initial LV for clearing with some
generic name and after the volume is create & cleared - rename it to
reserved name '_tmeta/_cmeta'.
We should not expose 'reserved' names for public LVs.
When repairing RAID LVs that have multiple PVs per image, allow
replacement images to be reallocated from the PVs that have not
failed in the image if there is sufficient space.
This allows for scenarios where a 2-way RAID1 is spread across 4 PVs,
where each image lives on two PVs but doesn't use the entire space
on any of them. If one PV fails and there is sufficient space on the
remaining PV in the image, the image can be reallocated on just the
remaining PV.
Previously, the seg_pvs used to track free and allocated space where left
in place after 'release_pv_segment' was called to free space from an LV.
Now, an attempt is made to combine any adjacent seg_pvs that also track
free space. Usually, this doesn't provide much benefit, but in a case
where one command might free some space and then do an allocation, it
can make a difference. One such case is during a repair of a RAID LV,
where one PV of a multi-PV image fails. This new behavior is used when
the replacement image can be allocated from the remaining space of the
PV that did not fail. (First the entire image with the failed PV is
removed. Then the image is reallocated from the remaining PVs.)
I've changed build_parallel_areas_from_lv to take a new parameter
that allows the caller to build parallel areas by LV vs by segment.
Previously, the function created a list of parallel areas for each
segment in the given LV. When it came time for allocation, the
parallel areas were honored on a segment basis. This was problematic
for RAID because any new RAID image must avoid being placed on any
PVs used by other images in the RAID. For example, if we have a
linear LV that has half its space on one PV and half on another, we
do not want an up-convert to use either of those PVs. It should
especially not wind up with the following, where the first portion
of one LV is paired up with the second portion of the other:
------PV1------- ------PV2-------
[ 2of2 image_1 ] [ 1of2 image_1 ]
[ 1of2 image_0 ] [ 2of2 image_0 ]
---------------- ----------------
Previously, it was possible for this to happen. The change makes
it so that the returned parallel areas list contains one "super"
segment (seg_pvs) with a list of all the PVs from every actual
segment in the given LV and covering the entire logical extent range.
This change allows RAID conversions to function properly when there
are existing images that contain multiple segments that span more
than one PV.
If a RAID LV has images that are spread across more than one PV
and you allocate a new image that requires more than one PV,
parallel_areas is only honored for one segment. This commit
adds a test for this condition.
...to avoid using cached value (persistent filter) and therefore
not noticing any change made after last scan/filtering - the state
of the device may have changed, for example new signatures added.
$ lvm dumpconfig --type diff
allocation {
use_blkid_wiping=0
}
devices {
obtain_device_list_from_udev=0
}
$ cat /etc/lvm/cache/.cache | grep sda
$ vgscan
Reading all physical volumes. This may take a while...
Found volume group "fedora" using metadata type lvm2
$ cat /etc/lvm/cache/.cache | grep sda
"/dev/sda",
$ parted /dev/sda mklabel gpt
Information: You may need to update /etc/fstab.
$ parted /dev/sda print
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sda: 134MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
$ cat /etc/lvm/cache/.cache | grep sda
"/dev/sda",
====
Before this patch:
$ pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
With this patch applied:
$ pvcreate /dev/sda
Physical volume /dev/sda not found
Device /dev/sda not found (or ignored by filtering).
A field where it has no meaning to do any type of comparison is the
implicit "help" or "?" field. The error given was a bit cryptic
before this patch, the FLD_UNCOMPARABLE flag makes it easier to identify
this situation anywhere in the code and provide much better error message.
This flag can be applied to other fields that may appear in the future -
mostly usable for implicit fields as they always have special purpose
(so we're not exporting it in libdevmapper for now - usual reporting
fields don't need this).
Before this patch:
$ vgs -S help=1
dm_report_object: no data assigned to field help
dm_report_object: no data assigned to field help
(...which is true actually, but let's provide something better...)
With this patch applied:
$vgs -S help=1
Selection field is uncomparable: help.
Selection syntax error at 'help=1'.
$vgs -S '(name=vg && help=1) || vg_size > 1g'
Selection field is uncomparable: help.
Selection syntax error at 'help=1) || vg_size > 1g'.
Take a local file lock to prevent concurrent activation/deactivation of LVs.
Thin/cache types and an extension for cluster support are excluded for
now.
'lvchange -ay $lv' and 'lvchange -an $lv' should no longer cause trouble
if issued concurrently: the new lock should make sure they
activate/deactivate $lv one-after-the-other, instead of overlapping.
(If anyone wants to experiment with the cluster patch, please get in touch.)
'lvs' would segfault if trying to display the "move pv" if the
pvmove was run with '--atomic'. The structure of an atomic pvmove
is different and requires us to descend another level in the
LV tree to retrieve the PV information.
In 'find_pvmove_lv', separate the code that searches the atomic
pvmove LVs from the code that searches the normal pvmove LVs. This
cleans up the segment iterator code a bit.
It's better to have implicit fields at the very end of the output
so users can see them without scrolling back if the list of fields
is long (the "help" is also an implicit field now so it should be
easily visible).
We have "help" and "?" defined as implicit fields now. As such, we
don't need to export these names in libdevmapper (as it was introduced
by commit 7c86131233 within this release).
If anyone uses these field names by mistake, the libdevmapper code can
error out correctly if it detects that the set of explicit field names
(the ones supplied by "fields" arg in dm_report_init/dm_report_init_with_selection)
contains any of the implicit field names (the ones defined internally
by libdevmapper itself).
Making "help" and "?" implicit also simplifies code since the
dm_report_init caller (lvm/dmsetup) doesn't need to check on
dm_report_init return whether "help" or "?" was hit while parsing
fields/sort keys in libdevmapper.
The libdevmapper now sets internal "RH_ALREADY_REPORTED" flag
after it reports the "help" or "?" implicit field. Then libdevmapper
itself checks for this flag in dm_report_object and if found,
the actual reporting is skipped (because the "help" implicit field
was reported instead of the actual report).
replicator/replicator.c:338:2: warning: passing argument 2 of 'build_dm_uuid' from incompatible pointer type [enabled by default]
replicator/replicator.c:629:3: warning: passing argument 2 of 'build_dm_uuid' from incompatible pointer type [enabled by default]
replicator/replicator.c:644:6: warning: passing argument 2 of 'build_dm_uuid' from incompatible pointer type [enabled by default]
replicator/replicator.c:668:7: warning: passing argument 2 of 'build_dm_uuid' from incompatible pointer type [enabled by default]
replicator/replicator.c:677:4: warning: passing argument 2 of 'build_dm_uuid' from incompatible pointer type [enabled by default]
Fix gcc warnings:
libdm-report.c:1952:5: warning: "end_op_flag_hit" may be used uninitialized in this function [-Wmaybe-uninitialized]
libdm-report.c:2232:28: warning: "custom" may be used uninitialized in this function [-Wmaybe-uninitialized]
And snap_percent is not 0% in dm < 1.10.0 so
don't test comparison with 0% here.
Ordering string list items on reports is also new compared to
previous state where items were not ordered at all and they
got reported simply as they appeared/were processed.
pvmove can be used to move single LVs by name or multiple LVs that
lie within the specified PV range (e.g. /dev/sdb1:0-1000). When
moving more than one LV, the portions of those LVs that are in the
range to be moved are added to a new temporary pvmove LV. The LVs
then point to the range in the pvmove LV, rather than the PV
range.
Example 1:
We have two LVs in this example. After they were
created, the first LV was grown, yeilding two segments
in LV1. So, there are two LVs with a total of three
segments.
Before pvmove:
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
PV | 000 - 255 | 256 - 511 | 512 - 767 |
-------------------------------------
After pvmove inserts the temporary pvmove LV:
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
pvmove0 | seg 0 | seg 1 | seg 2 |
-------------------------------------
| | |
-------------------------------------
PV | 000 - 255 | 256 - 511 | 512 - 767 |
-------------------------------------
Each of the affected LV segments now point to a
range of blocks in the pvmove LV, which purposefully
corresponds to the segments moved from the original
LVs into the temporary pvmove LV.
The current implementation goes on from here to mirror the temporary
pvmove LV by segment. Further, as the pvmove LV is activated, only
one of its segments is actually mirrored (i.e. "moving") at a time.
The rest are either complete or not addressed yet. If the pvmove
is aborted, those segments that are completed will remain on the
destination and those that are not yet addressed or in the process
of moving will stay on the source PV. Thus, it is possible to have
a partially completed move - some LVs (or certain segments of LVs)
on the source PV and some on the destination.
Example 2:
What 'example 1' might look if it was half-way
through the move.
--------- --------- ---------
| LV1s0 | | LV2s0 | | LV1s1 |
--------- --------- ---------
| | |
-------------------------------------
pvmove0 | seg 0 | seg 1 | seg 2 |
-------------------------------------
| | |
| -------------------------
source PV | | 256 - 511 | 512 - 767 |
| -------------------------
| ||
-------------------------
dest PV | 000 - 255 | 256 - 511 |
-------------------------
This update allows the user to specify that they would like the
pvmove mirror created "by LV" rather than "by segment". That is,
the pvmove LV becomes an image in an encapsulating mirror along
with the allocated copy image.
Example 3:
A pvmove that is performed "by LV" rather than "by segment".
--------- ---------
| LV1s0 | | LV2s0 |
--------- ---------
| |
-------------------------
pvmove0 | * LV-level mirror * |
-------------------------
/ \
pvmove_mimage0 / pvmove_mimage1
------------------------- -------------------------
| seg 0 | seg 1 | | seg 0 | seg 1 |
------------------------- -------------------------
| | | |
------------------------- -------------------------
| 000 - 255 | 256 - 511 | | 000 - 255 | 256 - 511 |
------------------------- -------------------------
source PV dest PV
The thing that differentiates a pvmove done in this way and a simple
"up-convert" from linear to mirror is the preservation of the
distinct segments. A normal up-convert would simply allocate the
necessary space with no regard for segment boundaries. The pvmove
operation must preserve the segments because they are the critical
boundary between the segments of the LVs being moved. So, when the
pvmove copy image is allocated, all corresponding segments must be
allocated. The code that merges ajoining segments that are part of
the same LV when the metadata is written must also be avoided in
this case. This method of mirroring is unique enough to warrant its
own definitional macro, MIRROR_BY_SEGMENTED_LV. This joins the two
existing macros: MIRROR_BY_SEG (for original pvmove) and MIRROR_BY_LV
(for user created mirrors).
The advantages of performing pvmove in this way is that all of the
LVs affected can be moved together. It is an all-or-nothing approach
that leaves all LV segments on the source PV if the move is aborted.
Additionally, a mirror log can be used (in the future) to provide tracking
of progress; allowing the copy to continue where it left off in the event
there is a deactivation.
With recent changes introduced with the report selection support,
the content of lv_modules field is of string list type (before
it was just string type).
String list elements are always ordered now so update lvcreate-thin
test to expect the elements to be ordered.
The differentiation of the original number field into number, size and
percent field types has been introduced with recent changes for report
selection support.
Implicit fields are fields that are registered with the report
and reported internally by libdevmapper itself (compared to explicit
fields that are registered by the layer above libdevmapper - e.g. LVM,
dmsetup...).
The "selected" field is the implicit field (for now the only one)
that reports the result of the selection. Since the selection itself
is the property of the libdevmapper, the upper layer using dm_report_init
can't register this field itself and it must be done directly at
libdevmapper layer.
The "selected" field is internally registered as part of the "common"
report type with id 0x80000000 (the last bit in uin32_t) which is then
reserved (the explicit report types are then checked if they do not
contain this id and if yes, we error out).
This way, the "selected" field is recognized by all libdevmapper users
that initialize the reporting with "dm_report_init_with_selection".
If reporting is initialized with the classical "dm_report_init",
there's no functional change (so the "selected" field is not defined
and it's not recognized).
Make dm_report_init_with_selection to accept an argument with an
array of reserved values where each element contains a triple:
{dm report field type, reserved value, array of strings representing this value}
When the selection is parsed, we always check whether a string
representation of some reserved value is not hit and if it is,
we use the reserved value assigned for this string instead of
trying to parse it as a value of certain field type.
This makes it possible to define selections like:
... --select lv_major=undefined (or -1 or unknown or undef or whatever string representations are registered for this reserved value in the future)
... --select lv_read_ahead=auto
... --select vg_mda_copies=unmanaged
With this, each time the field value of certain type is hit
and when we compare it with the selection, we use the proper
value for comparison.
For now, register these reserved values that are used at the moment
(also more descriptive names are used for the values):
const uint64_t _reserved_number_undef_64 = UINT64_MAX;
const uint64_t _reserved_number_unmanaged_64 = UINT64_MAX - 1;
const uint64_t _reserved_size_auto_64 = UINT64_MAX;
{
{DM_REPORT_FIELD_TYPE_NUMBER, _reserved_number_undef_64, {"-1", "undefined", "undef", "unknown", NULL}},
{DM_REPORT_FIELD_TYPE_NUMBER, _reserved_number_unmanaged_64, {"unmanaged", NULL}},
{DM_REPORT_FIELD_TYPE_SIZE, _reserved_size_auto_64, {"auto", NULL}},
NULL
}
Same reserved value of different field types do not collide.
All arrays are null-terminated.
The list of reserved values is automatically displayed within
selection help output:
Selection operands
------------------
...
Reserved values
---------------
-1, undefined, undef, unknown - Reserved value for undefined numeric value. [number]
unmanaged - Reserved value for unmanaged number of metadata copies in VG. [number]
auto - Reserved value for size that is automatically calculated. [size]
Selection operators
-------------------
...
When the field list is displayed as help for constructing selection
criteria, show also the field value type. This is useful for users
to know what set of operators are allowed for the type - the subsequent
"Selection operands" section in the help output summarize all known
types that can be used in selection.
The "<lvm command> -S/--select help" shows help (including list of fields to match against):
...field list here including the field type name...
Selection operands
------------------
field - Reporting field.
number - Non-negative integer value.
size - Floating point value with units specified.
string - Characters quoted by ' or " or unquoted.
string list - Strings enclosed by [ ] and elements delimited by either
"all items must match" or "at least one item must match" operator.
regular expression - Characters quoted by ' or " or unquoted.
Selection operators
-------------------
Comparison operators:
=~ - Matching regular expression.
!~ - Not matching regular expression.
= - Equal to.
!= - Not equal to.
>= - Greater than or equal to.
> - Greater than
<= - Less than or equal to.
< - Less than.
Logical and grouping operators:
&& - All fields must match
, - All fields must match
|| - At least one field must match
# - At least one field must match
! - Logical negation
( - Left parenthesis
) - Right parenthesis
[ - List start
] - List end
Selection list items are enclosed in '[' and ']' (if there's only
one item, the '[' and ']' can be omitted). Each element of the list
is a string (either quoted or unquoted, like the usual string operand
used in selection) and each element is delimited either by conjunction
(meaining "match all") or disjunction operator (meaning "match any").
For example, if "," is the conjuction operator and "/" is the
disjunction operator then:
lv_tags=[a,b,c]
...will match all fields where tags contain *all* a, b and c.
lv_tags=[a/b/c]
...will match all fields where tags contain *any* of a, b, or c.
Mixing operators within the list is not supported:
lv_tags=[a,b/c]
...will give an error.
The order in which items are defined in the selection do not matter.
This patch enhances the selection parsing functionality to recognize
such lists.
The {pv,vg,lv,seg}_tags and lv_modules fields are reported as string
lists using the new dm_report_field_string_list - so we just pass
the list to the fn that takes care of reporting and item sorting itself.
Add a separate dm_report_field_string_list fn to libdevmapper to
support reporting string lists. Before, the code used libdevmappers's
dm_report_field_string fn which required formatting the list to a
single string. This functionality is now moved to libdevmapper
and the code that needs to report the string list just needs
to pass the list itself and libdevmapper will take care of this.
This also enhances code reuse.
The dm_report_field_string_list also accepts an argument to define
custom delimiter to use. If not defined, a default "," (comma) is
used as item delimiter in the string list reported.
The dm_report_field_string_list automatically sorts the items in
the list before formatting it to a final string. It also encodes
the position and length within the final string where each element
can be found. This can be used to support checking against each
list item reported since since when formatted as a single string
for the actual report, we would lose this information otherwise
(we don't want to copy each item, the position and length within
the final string is enough for us to get the original items back).
When such lists are checked against the selection tree, we can check
each item individually this way and we can support operators like
"match any" and "match all".
The list of strings is used quite frequently and we'd like to reuse
this simple structure for report selection support too. Make it part
of libdevmapper for general reuse throughout the code.
This also simplifies the LVM code a bit since we don't need to
include and manage lvm-types.h anymore (the string list was the
only structure defined there).
This is rebased and edited version of the original design and
patch proposed by Jun'ichi Nomura:
http://www.redhat.com/archives/dm-devel/2007-April/msg00025.html
The dm_report_init_with_selection is the same as dm_report_init
but it contains an additional argument to set the selection
in the form of a string that contains field names to check against and
selection operators. The selection string is parsend and a selection
tree is composed for use in the checks against individual fields when
the report is processed. The parsed selection tree is stored in dm_report
structure as "selection_root".
This is rebased and edited version of the original design and
patch proposed by Jun'ichi Nomura:
http://www.redhat.com/archives/dm-devel/2007-April/msg00025.html
Add support for parsing numbers, strings (quoted or unquoted), regexes
and operators amogst these operands in selection condition supplied.
This is rebased and edited version of the original design and
patch proposed by Jun'ichi Nomura:
http://www.redhat.com/archives/dm-devel/2007-April/msg00025.html
This patch defines operators and structures that will be used
to store the report selection against which the actual values
reported will be checked.
Selection operators
-------------------
Comparison operators:
=~ - Matching regular expression.
!~ - Not matching regular expression.
= - Equal to.
!= - Not equal to.
>= - Greater than or equal to.
> - Greater than
<= - Less than or equal to.
< - Less than.
Logical and grouping operators:
&& - All fields must match
, - All fields must match
|| - At least one field must match
# - At least one field must match
! - Logical negation
( - Left parenthesis
) - Right parenthesis
This makes it easier to check against the fields (following patches for
report selection) and check whether size units are allowed or not
with the field value.
Use NAME_LEN constant to simplify creation of device name.
Since the max size should be already tested in validation,
throw INTERNAL_ERROR if the size of vg/lv is bigger then NAME_LEN.
Let's use the size of origin as the real base for percenta calculation,
and 'silenly' add needed metadata space for snapshot.
So now command 'lvcreate -s -l100%ORIGIN vg/lv' should always create a
snapshot to handle full device overwrite.
Expresing -lXX%LV is not valid for snapshot, but error message for
snapshost case was not complete and missed %ORIGIN.
Also document correct settings for in manpage properly where
it missed %PVS.
While polling for snapshot, detect first the snapshot still
exits. It's valid to have multiple polling threads watching
for the same thing and just 1 can 'win' the finish part.
All others should nicely 'fail'.
If the we are polling an LV due to some sort of conversion and it
becomes inactive, a rather worrisome message is produced, e.g.:
" ABORTING: Mirror percentage check failed."
We can cleanly exit if we do a simple check to see if the LV is
active before performing the check. This eliminates the scary
message.
When creating a cache LV with a RAID origin, we need to ensure that
the sub-LVs of that origin properly change their names to include
the "_corig" extention of the top-level LV. We do this by first
performing a 'lv_rename_update' before making the call to
'insert_layer_for_lv'.
The thin-generic.profile contains settings for thin/thin pool volumes
suitable for generic environment/use containing default settings.
This allows users to change the global lvm.conf settings at will
and still keep the original settings for volumes that have this
thin profile assigned already.
Internal reporting function cannot handle NULL reporting value,
so ensure there is at least dummy label.
So move dummy_lable from tools/reporter.c and use it for all
report_object() calls in lib/report/report.c.
(Fixes RHBZ 1108394)
Simlify lvm_report_object initialization.
More updates to manglename option.
Add reference to LVM2 resource page, since for a long time,
this is the right places for sources for libdevmapper....
Add more time for tests, since debug kernels are getting slower...
and we add more and more tests.
However many test should be shortened to avoid testing disk-perfomance
and focus on lvm functionality...
(Often we should probably test with inactive volumes when we check
metadata operation of lvm2)
We may need to support option for 'DEEP' longer testing.
Also something like LVM_TEST_TIMEOUT_FACTOR might be useful
though it would be much better if test suite could approximete
from system perfomance test lenght...
Enable 'retry' deactivation also in 'cleanup' phase.
It shouldn't be mostly needed - however udev now produces
more and more completelny non-synchronizable device opens,
so even for orphan devices we can't easily predict where
udevd opens devices.
So it's more preferable here to log error about device being open
and retry clean, but let the command proceed.
And use ifdefs there, not exposing it in the tool code itself.
Later in the future, we should probably make the PIDFILE and
daemon checking code available also in case the daemon itself
is not built.
If the user supplies a '--yes' argument, then don't bother them with
a question to confirm whether to change the cluster attribute (even
if clvmd isn't running).
If clvmd is not running or the locking type is not clustered and someone
attempts to set the cluster attribute on a volume group, prompt them to
see if they are sure. (Only prompt for one though. If neither are true,
simply ask them once.)
Instead of rereading device list via cat - keep
the list in bash array. (Also solves problem
with spaces in device path)
Move usage of "$path" out of lvm shell usage,
since we don't support such thing there...
Replace AC_PATH_PROG with AC_PATH_TOOL.
Drop 'x' when already using "" around shell variable.
Simlify some long line and ifs.
Merge multiple test evaluation with '-a', '-o'.
Use 'case' instead if several ifs when it's more elegant.
Improve usage of pkg_config_init and add it where it's been missing.
Check for UDEV_HAS_BUILTIN_BLKID and when building udev-rules.
Since we advertise 'none' as mangling name, accept it.
Keep it backward compatible and leave disabled option still working
(though I guess there is likely no user of this option...)
When directly corrupting RAID images for the purpose of testing,
we must use direct I/O (or a 'sync' after the 'dd') to ensure that
the writes are not caught in the buffer cache in a way that is not
reachable by the top-level RAID device.
As part of better error handling, remove DM devices that have been
sucessfully created but failed to load a table. This can happen
when pvmove'ing in a cluster and the cluster mirror daemon is not
running on a remote node - the mapping table failing to load as a
result. In this case, any revert would work on other nodes running
cmirrord because the DM devices on those nodes did succeed in loading.
However, because no table was able to load on the non-cmirrord nodes,
there is no table present that points to what needs to be reverted.
This causes the empty DM device to remain on the system without being
present in any LVM representation.
This patch should only be considered a partial fix to the overall
problem. This is because only the device which failed to load a
table is removed. Any LVs that may have been loaded as requirements
to the DM device that failed to load may be left in place. Complete
clean-up will require tracking those devices which have been created
as dependencies and removing them along with the device that failed
to load a table.
Accidently it's been commited - but it has also shown,
that on heavy loaded systems (like our test machine could be)
slightly bigger timeouts which waits longer for udev rules
processing does help and avoids occasional refuse of deactivation
because device is still being open.
(i.e. lvcreate...; lvchange -an...)
Unsure how we could now synchronize for this. On very slow(/loaded)
system 5 second timeout is simply not enough.
TODO: introduce at least lvm.conf configurable setting to
allow longer 'retry' loops.
Unsure if this is feature or bug of syncaction,
but it needs to be present physically on the media
and it ignores content of buffer cache...
(maybe lvchange should implicitely fsync all disks
that are members of raid array before starting test??)
Check we know how to handle same UUID
Test currently does NOT work on lvmetad
(or it's unclear it even should - thus test error
is currently lowered to 'test warning')
TODO: replace lib/test with a better shell script name
Reindent lv_check_not_in_use to simplify internal loop code.
Also return always '0/1' (drop -1) - since we only
check for failure (0) - and we don't really know
why lv_info() has failed.
disable_dev can't use transaction - since it may lead occasionaly to
weird error - example could be nomda-missing.sh test case.
Here occasionaly device instead of being removed was left as
error device and testing different code path (which is unfortunatelly
buggy)
When we want to test 'error' device - 'aux error_dev()' should be used.
Warn when --udevcookie/DM_UDEV_COOKIE is used with 'dmsetup remove --force'.
When command is doing multiple ioctl operations on a single device,
it may invoke udev activity, that is colliding with further ioctl commands.
The result of such operation becomes unpredictable.
Use of --retry could partially help...
Disable code which has postprocessed whole tree and reset udev flags.
We need to find out which case was troublesome - since this loop
was just hidding bug in other code parts (most probably preload tree)
LVM has restricter character set that is allowed for VG-LV names
and the dm names constructed do not contain any blacklisted characters
that would require name mangling.
Also, when any other device-mapper device is scanned that could
possibly contain such blacklisted characters, we reference the
device by its major:minor instead of dm name (e.g. _device_is_usable fn).
The "default.profile" name was misleading. It's actually a helper
*template* that can be used for copying and further editing to create
a new profile.
Also, we have separate command and metadata profiles now so the templates
are separated as well - we can't mix profile settings from one group with
another - such profile is rejected by lvm tools.
Delay udev notification after the point udev transaction
is finished - since otherwise some device may still
be found missing until udev transaction is finished.
When MAN7, MAN8CLUSTER or MAN8SYSTEMD_GENERATORS would be empty,
don't call respective INSTALL tools.
On older systems they even generate error causing abort
of makefile target.
Warn user before converting volume to different type.
WARNING: Converting vg/lvol0 logical volume to pool's meta/data volume.
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Since the content of volume is lost we have to query user to confirm
such operation. If user is 100% sure, he may use '--yes' to avoid prompts.
The dumpconfig now understands --commandprofile/--profile/--metadataprofile
The --commandprofile and --profile functionality is almost the same
with only one difference and that is that the --profile is just used
for dumping the content, it's not applied for the command itself
(while the --commandprofile profile is applied like it is done for
any other LVM command).
We also allow --metadataprofile for dumpconfig - dumpconfig *does not*
touch VG/LV and metadata in any way so it's OK to use it here (just for
dumping the content, checking the profile validity etc.).
The validity of the profile can be checked with:
dumpconfig --commandprofile/--profile/--metadataprofile --validate
...depending on the profile type.
Also, mention --config in the dumpconfig help string so users know
that dumpconfig handles this too (it did even before, but it was not
documented in the help string).
- When defining configuration source, the code now uses separate
CONFIG_PROFILE_COMMAND and CONFIG_PROFILE_METADATA markers
(before, it was just CONFIG_PROFILE that did not make the
difference between the two). This helps when checking the
configuration if it contains correct set of options which
are all in either command-profilable or metadata-profilable
group without mixing these groups together - so it's a firm
distinction. The "command profile" can't contain
"metadata profile" and vice versa! This is strictly checked
and if the settings are mixed, such profile is rejected and
it's not used. So in the end, the CONFIG_PROFILE_COMMAND
set of options and CONFIG_PROFILE_METADATA are mutually exclusive
sets.
- Marking configuration with one or the other marker will also
determine the way these configuration sources are positioned
in the configuration cascade which is now:
CONFIG_STRING -> CONFIG_PROFILE_COMMAND -> CONFIG_PROFILE_METADATA -> CONFIG_FILE/CONFIG_MERGED_FILES
- Marking configuration with one or the other marker will also make
it possible to issue a command context refresh (will be probably
a part of a future patch) if needed for settings in global profile
set. For settings in metadata profile set this is impossible since
we can't refresh cmd context in the middle of reading VG/LV metadata
and for each VG/LV separately because each VG/LV can have a different
metadata profile assinged and it's not possible to change these
settings at this level.
- When command profile is incorrect, it's rejected *and also* the
command exits immediately - the profile *must* be correct for the
command that was run with a profile to be executed. Before this
patch, when the profile was found incorrect, there was just the
warning message and the command continued without profile applied.
But it's more correct to exit immediately in this case.
- When metadata profile is incorrect, we reject it during command
runtime (as we know the profile name from metadata and not early
from command line as it is in case of command profiles) and we
*do continue* with the command as we're in the middle of operation.
Also, the metadata profile is applied directly and on the fly on
find_config_tree_* fn call and even if the metadata profile is
found incorrect, we still need to return the non-profiled value
as found in the other configuration provided or default value.
To exit immediately even in this case, we'd need to refactor
existing find_config_tree_* fns so they can return error. Currently,
these fns return only config values (which end up with default
values in the end if the config is not found).
- To check the profile validity before use to be sure it's correct,
one can use :
lvm dumpconfig --commandprofile/--metadataprofile ProfileName --validate
(the --commandprofile/--metadataprofile for dumpconfig will come
as part of the subsequent patch)
- This patch also adds a reference to --commandprofile and
--metadataprofile in the cmd help string (which was missing before
for the --profile for some commands). We do not mention --profile
now as people should use --commandprofile or --metadataprofile
directly. However, the --profile is still supported for backward
compatibility and it's translated as:
--profile == --metadataprofile for lvcreate, vgcreate, lvchange and vgchange
(as these commands are able to attach profile to metadata)
--profile == --commandprofile for all the other commands
(--metadataprofile is not allowed there as it makes no sense)
- This patch also contains some cleanups to make the code handling
the profiles more readable...
Mark profilable settings with a separate CFG_PROFILABLE_METADATA
flag where the profile can be attached to VG/LV. This makes it possible
to differentiate global command-profilable settings (CFG_PROFILABLE flag)
and contextual metadata-profilable (per VG/LV) settings (CFG_PROFILABLE_METADATA flag).
The dumpconfig reuses existing config_def_check results in case
the check is done during general lvm command context initialization
(when enabled by config/checks=1) so dumpconfig does not need to run
the same check again during its execution, hence saving some time.
However, we don't check for differences from defaults during general
lvm command initialization as it's useless at that time. It makes
sense only in case when such a check is directly requested (like in
the case of lvm dumpconfig --type diff). We need to take care that
the reused information was already produced with this "diff" checking
before and if not, we need to force the check so the check status also
gathers the new "diff" info now.
Also, do not do diff checking for any other dumpconfig command that
is run after dumpconfig --type diff.
When cmd refresh is called, we need to move any already loaded profiles
to profiles_to_load list which will cause their reload on subsequent
use. In addition to that, we need to take into account any change
in config/profile configuration setting on cmd context refresh
since this setting could be overriden with --config.
Also, when running commands in the shell, we need to remove the
global profile used from the configuration cascade so the profile
is not incorrectly reused next time when the --profile option is
not specified anymore for the next command in the shell.
This bug only affected profile specified by --profile cmd line
arg, not profiles referenced from LVM metadata.
Before, the cft_check_handle used to direct configuration checking
was part of cmd_context. It's better to attach this as part of the
exact config tree against which the check is done. This patch moves
the cft_check_handle out of cmd_context and it attaches it to the
config tree directly as dm_config_tree->custom->config_source->check_handle.
This change makes it easier to track the config tree check results
and provides less space for bugs as the results are directly attached
to the tree and we don't need to be cautious whether the global value
is correct or not (and whether it needs reinitialization) as it was
in the case when the cft_check_handle was part of cmd_context.
Add CONFIG_FILE_SPECIAL config source id to make a difference between
real configuration tree (like lvm.conf and tag configs) and special purpose
configuration tree (like LVM metadata, persistent filter).
This makes it easier to attach correct customized data to the config
tree that is created out of the source then.
Since decisions in the silent mode may not be always obvious,
print skipped prompt with answer 'n'.
Also document '-qq' behaviour (single -q only shuts
logging, while -qq sets silent mode).
Share DM_REPORT_FIELD_RESERVED_NAME_{HELP,HELP_ALT} between libdm and
any libdm user to handle reserved field names, in this case the virtual
field name to show help instead of failing on unrecognized field.
The libdm user also needs to check the field name so it can fire
proper code in this case (cleanup, exit etc.).
Improve testing for needs_check - when old tools are installed,
issue proper warning from configure, but stop sending needs_check
flag to the old tool.
Support upto 3 levels os nesting signal blocking.
As of today - code blocks signals immediatelly when it opens
VG in read-write mode - this however makes current prompt usage
then partially unusable since user may not 'break' command
during prompt (something most user would expect).
Until a better fix for prompting is implemented, put in support
for signal nesting - thus when prompt enables signal acceptance,
make it possible to really break command at this point.
Adding log_sys_debug for eventual logging of system errors.
(Using debug level, since currently signal handling functions
do not fail when any error is encoutered).
The sysinit.target is ordered even before sockets.target and there
may be situations in which the lvmetad is needed early, for example
in rescue.target to activate some LVs on which mountpoints reside.
Also, like in the case of rescue.target, the sockets.target is not
pulled in (unless some other service pulls it in explicitly).
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1087586#c26
for the summary of the problem.
When quering for dmeventd monitoring status, check first
if lvm2 is configured to monitor to avoid unwanted start
of dmeventd process for answering monitoring status.
When showing ACTIVE status for snapshot's origin,
avoid testing all its snapshot - it's not useful
to tell user origin is inactivate, while it's clearly
available and running - just one of its snapshot leg
is invalid...
Relocate info from thin pool and thin volume segments
to proper code section for segments.
Add discards and thin count status info.
Info is shown with 'lvdisplay --maps' (like for other segments).
For percentage display we need -tpool - so check for layered
device presence here instead of plain pool device.
Also update 'info' - so when pool is 'available' we
display open count for -tpool device instead of mostly
irrelevant pool.
TODO: Maybe we should actually display this open info always?
(even when just -tpool is available, but pool is not)
Emphesize virtual extents for virtual LVs and for
those use 'Virtual extents' instead of 'Logical extents',
so it's immeditatelly visible, which extents do have
straighforward physical backend.
While the 'raid1/10' segment types were being handled inadvertently
by '_move_mirrors()', the parity RAIDs were not being properly checked
to ensure that the user had specified all necessary PVs when moving
them. Thus, internal errors were being triggered when only part of
a RAID LV was moved to the new VG. I've added a new function,
'_move_raid()', which properly checks over any affected RAID LVs and
ensures that all the necessary PVs are being moved.
vgsplit of RAID volumes was problematic because the metadata area
and data areas were always on the same PVs. This problem is similar
to one that was just fixed for mirrors that had log and images sharing
a PV (commit 9ac858f). We can now add RAID LVs to the tests for
vgsplit.
Note that there still seems to be an issue when specifying an
incomplete set of PVs when moving RAID LVs.
Given a named mirror LV, vgsplit will look for the PVs that compose it
and move them to a new VG. It does this by first looking at the log
and then the legs. If the log is on the same device as one of the mirror
images, a problem occurs. This is because the PV is moved to the new VG
as the log is processed and thus cannot be found in the current VG when
the image is processed. The solution is to check and see if the PV we are
looking for has already been moved to the new VG. If so, it is not an
error.
ignore_suspended_devices=0 is already used in lvm.conf we distribute,
but it was still "1" in the code (so it was used when lvm.conf value
was not defined). It should be "0" too.
- better add reference to lvm dumpconfig --type default
than stating that lvmetad is not enabled by default
- substitute #DEFAULT_PID_DIR# with concrete value
Switch to allocate buffer from heap, since it might be potentially
bigger when extremaly large set of volumes would be monitored.
In case of allocation failure send ENOMEM message.
Also implicitelly ignore msg->size when msg->data is NULL.
Ensure daemon-io.h is used as a generic header included
with configure defines before other headers.
(In future all lvm2 libraries should settle on a single lib.h header)
Rename couple defines to better match header file names.
Config variables that are processed during setup prior to calling into
particular tools must not be accessed directly afterwards in case the
values already got overridden.
_process_config() already used the tests I'm removing here to call
lvmetad_set_active() and set up lvmetad_used().
This should be the preferred way of configuring lvm2 for udev/systemd
since otherwise one can end up with the processes run from udev (the
pvscan we run for lvmetad update on events) to be killed prematurely
and this can end up with LVM volumes not activated in the end.
This is sort of info we always ask people to retrieve when
inspecting problems in systemd environment so let's have this
as part of lvmdump directly.
The -s option does not need to be bound to systemd only. We could
add support for initscripts or any other system-wide/service tracking
info that can help us with debugging problems.
By default, the thin_pool_chunk_size is automatically calculated.
When defined, it disables the automatic calculation. So to be more
precise here, we should comment it out for the default.profile.
Also, "lvm dumpconfig --type profilable" was used here to generate
the default.profile content. This will be done automatically in the
future once we have the infrastructure for this in (see also
https://bugzilla.redhat.com/show_bug.cgi?id=1073415).
Perform two allocation attempts with cling if maximise_cling is set,
first with then without positional fill.
Avoid segfaults from confusion between positional and sorted sequential
allocation when number of stripes varies as reported here:
https://www.redhat.com/archives/linux-lvm/2014-March/msg00001.html
Set A_POSITIONAL_FILL if the array of areas is being filled
positionally (with a slot corresponding to each 'leg') rather
than sequentially (with all suitable areas found, to be sorted
and selected from).
Since the kill may take various amount of time,
(especially when running with valgrind)
check it's really pvmoved LV.
Restore initial restart of clvmd - it's currently
broken at various moments - basically killed lvm2
command may leave clvmd and confusing state leading
to reports of internal errors.
Prior adding new reply to the list, check
if the reply thread is not already finished.
In that case discard adding message
(which would otherwise be leaked).
Use mutex to access localsock values, so check
num_replies when the thread is not yet finished.
Check for threadid prior the mutex taking
(though this check is probably not really needed)
Added complexity with extra reply mutex is not worth the troubles.
The only place which may slightly benefit from this mutex is timeout
and since this is rather error case - let's convert it to
localsock.mutex and keep it simple.
Move the pthread mutex and condition creation and destroy
to correct place right after client memory is allocatedd
or is going to be released.
In the original place it's been in race with lvm thread
which could have still unlock mutex while it's been already
destroyed.
When TEST_MODE flag is passed around the cluster,
it's been use in thread unprotected way, so it may have
influenced behaviour of other running parallel lvm commands
(activation/deactivation/suspend/resume).
Fix it by set/query function only under lvm mutex.
For hold_un/lock function calls check lock_flags bits directly.
When pvmove0 is finished, it replaces temporarily pvmove0
with error segment, however in this case, pvmove0 remains
unremovable in case pvmove --abort is interrupted in this
moment - since it's not a pvmove anymore and normal
lvremove can't be used to remove LOCKED lv.
There were two bugs before when using pvcreate --restorefile together
with data alignment and its offset specified:
- the --dataalignment was always ignored due to missing braces in the
code when validating the divisibility of supplied --dataalignment
argument with pe_start which we're just restoring:
if (pp->rp.pe_start % pp->data_alignment)
log_warn("WARNING: Ignoring data alignment %" PRIu64
" incompatible with --restorefile value (%"
PRIu64").", pp->data_alignment, pp->rp.pe_start);
pp->data_alignment = 0
The pp->data_alignment should be zeroed only if the pe_start is not
divisible with data_alignment.
- the check for compatibility of restored pe_start was incorrect too
since it did not properly count with the dataalignmentoffset that
could be supplied together with dataalignment
The proper formula is:
X * dataalignment + dataalignmentoffset == pe_start
So it should be:
if ((pp->rp.pe_start % pp->data_alignment) != pp->data_alignment_offset) {
...ignore supplied dataalignment and dataalignment offset...
}
In general for non-toplevel LVs we shouldn't allow any _tree_action.
For now error on request for cache_pool activation which
doesn't even exist in dm-table.
If there ever would be a second call to dm_lib_init()
and envvar would be improperly set, some last set value
would be used while it should reset to default mangling mode.
When the node enters dtree with implicit dependency, it
automatically has udev flags from parent node
and could not be changed later when the node has been
entered again via i.e lvm's preload tracking.
Resolve this by tracking whether the node has been
created by implicit dependency tracking or has been
entered explicitely. Implicit node could be later
upgraded by an explicit _add_dev() with proper udev_flags.
For implicit devices add special udev flags to avoid
any scan and udev rule processing if we resume such device.
Patch allows easier removing of orphan nodes.
When down-converting a RAID1 LV, if the user specifies too few devices,
they will get a confusing message.
Ex:
[root]# lvcreate -m 2 --type raid1 -n raid -L 500M taft
Logical volume "raid" created
[root]# lvconvert -m 0 taft/raid /dev/sdd1
Unable to extract enough images to satisfy request
Failed to extract images from taft/raid
This patch makes the error message a bit clearer by telling the user
the count they are trying to remove and the number of devices they
supplied.
[root@bp-01 lvm2]# lvcreate --type raid1 -m 3 -L 200M -n lv vg
Logical volume "lv" created
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sdb1
Unable to remove 3 images: Only 1 device given.
Failed to extract images from vg/lv
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bc]1
Unable to remove 3 images: Only 2 devices given.
Failed to extract images from vg/lv
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bcd]1
[root@bp-01 lvm2]# lvs -a -o name,attr,devices vg
LV Attr Devices
lv -wi-a----- /dev/sde1(1)
This patch doesn't work in all cases. The user can specify the right
number of devices, but not a sufficient amount of devices from the LV.
This will produce the old error message:
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bcf]1
Unable to extract enough images to satisfy request
Failed to extract images from vg/lv
However, I think this error message is sufficient for this case.
Since the usability problem were fixed, we can use this function.
Cleanup orphan LVs with TEMPORARY flags
(reduces couple blkid error reports, but couple of them
is still left...)
Since cache segment is purely virtual mapping, it has nothing for
discard. Discardable is cache origin here which is now
properly removed on 'delete' phase.
Plain lv_empty() call needs to only detach cache origin and leave
origin unchanged.
This test for LV name restriction check name of device is below 128
chars (which is enforced by dm target).
Thus it should not count with device name.
(Though the test for PATH_MAX size should be probably also added,
but this is runtime test, since theoretically devpath might differ in cluster)
It's unclear why we should prohibit use of -v output.
So reenable (like with other 'display' tools)
But -c -m is really unsupported - return invalid cmd.
Sort order for -C|--columns as with other options,
and use short capital name as the first (as with other options).
Also drop multiple reference for pvs/lvs/vgs, since now
the text for -C is really close to referrence of lvm anyway.
TODO:
It seems commit 7e685e6c70
has changed the old logic, when 'pvs device_name' used
to work. (regression from 2.02.104)
Currently put in extra pvscan.
Drop unused passed cmd pointer from function.
TODO:
We have two similar functions (though not identical)
lv_manip.c: for_each_sub_lv()
metadata.c: _lv_each_dependency()
They seem to not always match - we should probably convert
to use only a single function.
Use proper vgmem memory pool for allocation of LV name in the vg
and check if new renamed LV is a valid name.
TODO: validation should really use also VG name, othewise we are not
able to tell "vgname-lvname" will be valid.
Commit 1a832398a7 moved
some code from _pvchange_single() to main pvchange() and
introduced exit code regression as return codes have not
been properly changed, thus pvchange command exited
with '0' exit code, even though it has reported error.
Also there is a missing vg unlock in error path.
Fix it by counting the total number of expected calls before
checking for pvname and also unlock and relase vg when
pv is not found.
For a quick overview of config when debugging and to quickly check
which values are different from defaults and which are not defined
in the config and for which defaults are used.
If clvmd does not hold any lock, it should also not keep any opened
device.
The reason for this patch is, that refresh_toolcontext calls
dev_cache_exit() which destroys whole device cache (even those with
opened file) - previous patch added recovery path to avoid memory
corruption, but opened files are still bugs that need to be fixed.
So this patch certainly kills many internal mirror & raid tests,
since they leak opened file descriptors (when tests are executed
with 'abort_on_error').
Operate with lvm_thread_exit while holding lvm_thread_mutex.
Don't leave unfinished work in the lvm thread queue
and always finish all queued tasks before exit,
so no cmd struct is left in the list.
(in-release fix)
When lvm2 command works with clvmd and uses locking in wrong way,
it may 'leak' certain file descriptors in opened (incorrect) state.
dev_cache_exit then destroys memory pool of cached devices, while
_open_devices list in dev-io.c was still referencing them if they
were still opened.
Patch properly calls _close() function to 'self-heal' from this
invalid state, but it will report internal error (so execution
with abort_on_internal_error causes immediate death). On the
normal 'execution', error is only reported, but memory state is
corrected, and linked list is not referencing devices from
released mempool.
For crash see: https://bugzilla.redhat.com/show_bug.cgi?id=1073886
When we're trying to search for certain tree node
in daemon's reply, we default to a blank string ""
if the node is not found. This happens during lvmetad
initialization.
However, when the default blank string is used, we
can't use dm_config_find_str at the same time - the
dm_config_find_str_allow_empty should be used instead.
Otherwise a a warning message:
"WARNING: Ignoring unsupported value for ..."
is issued.
Before:
thin_disabled_features = ""
Now:
thin_disabled_features = []
Which is a more correct and consistent way of specifying void array
though parses can handle both forms.
Smallest supported size for swap device is 40KB, however current
test skipped devices smaller then 4096 sectors (2MB).
Since page is in bytes, convert it to sectors before comparing
with device size (in sectors).
We have to close cluster in some predicatable way,
otherwise we may access released memory from different
threads.
So move closedown till the point we know all thread
are closed. New messages from cluster are discarded.
When multiple threads act on the same 'quit' variable
the order of exit becomes unpredictable.
So let the main_loop() finish first and then clean up
all queued lvm jobs.
Do not add any new work, when lvm_thread_exit is set.
Properly clean 'client' structure only for LOCAL_SOCK type.
(Fixes bug from commit 460c19df62)
(in release fix)
Also cleanup-up associated pthreads by using cleanup_zombie() function.
Since this function may change the list, restart scanning always from
the list header.
Note: couple following patches are necessary to make this working properly.
When daemon releases memory and it is still in critical
section, issue an error message and drop memory.
We cannot do anything better for now and we at least
release allocated resource.
FIXME:
This code is triggered when i.e. clvmd is killed while
some LVs are suspend - in this case suspended devices leak,
so if this happens during i.e. clvmd upgrade we have
unresolved problem - even locked rootfs...
This function is typically called for cmd context refresh or destroy.
On the non-clustered case we already unlocked all messages,
however when i.e. 'clvmd' gets break signal it may have
still couple messages queued.
For now just report an error.
Recent debug tracing commit introduce read of uninitialized memory,
since VGID is not really a proper string which ends with '\0'.
Enforce at most 32 (ID_LEN) chars are read from vgid.
(in release fix)
Since commit f12ee43f2e call destroy,
it start to check all VGs are unlocked. However when we become_daemon,
we simply reset locking (since lock is still kept by parent process).
So implement a simple 'reset' flag.
There are two types of CPG communications in a corosync cluster:
messages and state transitions. Cmirrord processes the state
transitions first.
When a cluster mirror issues a POSTSUSPEND, it signals the end of
cluster communication with the rest of the nodes in the cluster.
The POSTSUSPEND marks the last communication of the 'message'
type that will go around the cluster. The node then calls
cpg_leave which causes a final 'state transition' communication to
all of the nodes. Once the out-going node receives its own state
transition notice from the cluster, it finalizes the leave. At this
point, the state of the log is 'INVALID'; but it is possible that
there remains some cluster trafic that was queued up behind the
state transition that still wants to be processed. It is harmless
to attempt to dispatch any remaining messages - they won't be
delivered because the node is no longer in the cluster. However,
there was a warning message that was being printed in this case
that is now removed by this patch. The failure of the dispatch
created a false positive condition that triggered the message.
pvscan --uuid was broken since it was using only 128 char buffers
without checking any write size, so any longer device path leads to
crash.
Also ansure format is properly aligned into columns with this option.
The PV size is displayed in sectors, not kilobytes
for 'pvdisplay -c'
Signed-off-by: Thomas Fehr <fehr@suse.de>
Acked-by: Hannes Reinecke <hare@suse.de>
Instead of sending repeatedly LOCAL_SYNC commands to clvmds
like 'lvs', rememeber the last sent commmand, and if there was no other
clvmd command, drop this redundant SYNC call message.
The problem has started with commit:
56cab8cc03
This introduced correct synchronisation of name, when user requests to know
open_count (needs to wait for udev), however it is also executed for
read-only cases like 'lvs' command.
For now implement very simple solution, which is only monitoring
outgoing clvmd command, and when sequence of LOCAL sync names are
recognized, they are skipped automatically.
TODO:
Future solution might move this variable info 'cmd_context' and
use 'needs_sync' flag also i.e. in file locking code.
When the backup is disabled, avoid testing backup presence.
This only leads to errors being logged in debug trace and the missing
backup can't be fixed, since it's disabled.
Check whether lvm dumpconfig --mergedconfig is used only
with --type current (where we're merging current config and
the config supplied on command line). With other types
the config was merged, but it was thrown away since we're
generating other type of config anyway. This lead to a memleak.
Error out if --mergedconfig is used with anything else than
--type current (or without specifying --type in which case
the --type current is used by default).
Decorate NULL returns with debug_cache output so the
debug log doesn't contain spurios <bactrace> line without
any reason for it.
Add internal errors when cache is misused.
The global/suffix was missing from example lvm.conf but it can
be very useful when using lvm in scripts and now in profiles as well
Let's expose it more.
man/lvm2-activation-generator.8:
Generator Specification -> Generators Specification
(this is the exact word in the systemd man page)
conf/example.conf.in:
cleanup recent edits related to report section
man/lvm.conf.5:
add a line about a possibility to generate a new
profile with lvm dumpconfig command as an alternative
to copying the default profile
Delay archiving of metadata until we really start to
update metadata when converting volume into a snapshot.
Archive is not necessary when we abort operation early.
Do not allow conversion of too small LV into a COW snapshot device.
Without this patch snapshot target is generating these kernel
messages before creation fails:
attempt to access beyond end of device
dm-9: rw=16, want=8, limit=2
attempt to access beyond end of device
...
device-mapper: table: 253:11: snapshot: Failed to read snapshot metadata
device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl on failed: Input/output error
Create a separate function to validation snapshot min chunk value
and relocate code into snapshot_manip file.
This function will be shared with lvconvert then.
Usage of origin as a snapshot 'COW' volume is unsupported.
Without this test lvm2 is able to generate this ugly internal error message.
To test this:
lvcreate -L1 -n lv1 vg
lvcreate -L1 -n lv2 -s vg/lv1
lvcreate -L1 -n lv3 vg
lvconvert -s vg/lv3 vg/lv1
Internal error: LVs (5) != visible LVs (1) + snapshots (1) + internal LVs (0) in VG vg
Users can create several profiles for how the tools report
the output very easily and then just use
<lvm reporting command> --profile <report_profile_name>
We need to use "--verifyudev" for dmsetup mangle command used in
the name-mangling test since without the --verifyudev, we'd end up
with the failed rename.
Also, add direct check for the dev nodes - node with old name must
be gone and node with new name must be present. Before, we checked
just the output of the command.
One bug popped up here when renaming with udev and libdevmapper
fallback checking the udev when target mangle mode is "none"
(fixme added in the libdevmapper's node rename code).
This prevents numerous VG refreshes on each "pvscan --cache -aay" call
if the VG is found complete. We need to issue the refresh only if the PV:
- is new
- was gone before and now it reappears (device "unplug/plug back" scenario)
- the metadata has changed
"%d" in buffer_append_vf is 64 bit wide. Using just `int` for the
variable will fetch more from va_list than intended and shifting
remaining arguments resulting in errors like:
Internal error: Bad format string at '#orphan'
This breaks brew/koji as DESTDIR should not be contained in any file and
results in message like:
+ /usr/lib/rpm/check-buildroot
/builddir/build/BUILDROOT/lvm2-2.02.106-0.311.el7.x86_64/usr/share/man/man8/lvm2-activation-generator.8:.B /builddir/build/BUILDROOT/lvm2-2.02.106-0.311.el7.x86_64/usr/lib/systemd/system-generators/lvm2-activation-generator
Found '/builddir/build/BUILDROOT/lvm2-2.02.106-0.311.el7.x86_64' in installed files; aborting
error: Bad exit status from /var/tmp/rpm-tmp.UfX2SX (%install)
Reorder detection for internal device - since this test
is much simpler then target analysis, check it sooner.
Replace test for '68' with sizeof & ID_LEN
Add FIXME about device alias problem with is_reserved_lvname,
since this test fails on devices like /dev/dm-X
so we need to convert tests to UUID.
Let's do this the other way round - this makes more logic than commit b995f06.
So let's allow empty values for global/thin_disabled_features where
such an empty value now means "none of this features are disabled".
The global/thin_disabled_features should be marked as having no default
value. Otherwise the output from 'lvm dumpconfig --type default' would
have 'thin_disabled_features=""' which will produce an error message
'Ignoring empty string in config file ...' if such output is feed
back to lvm.
Even though we make pool volume as a public visible LV,
we still do not want tools to look at this volume.
While we do not create /dev/vg/lv link, device is still
accessible via /dev/mapper/vg-lv and there is no easy
way to recognize it's private without lvm2 metadata.
Enhance UUID with -pool suffix and directly skip
any LV with a suffix in device_is_usable() call.
TODO: enhance other targets with this logic.
blkid may probably use same simple logic.
When we create thin-pool we have used trick to keep
volume active, but since we now support TEMPORARY flag,
we could just localy active & deactive metadata LV,
and let the thinpool through normal activation process.
The empty pool is also the pool which has yet queued list of messages
and transaction_id == 1.
Problem is exposed when pool is created inactive.
lvcreate -L10 -T vg/pool -an
lvcreate -V10 -T vg/pool
When pool_has_message() is queried with NULL lv and 0 device_id
it should just return 'true' when there is any message queued.
So it needs to return negative value dm_list_empty().
Since there is no user for this code path in code currently,
this bug has not been triggered.
When the last entry in the timeout queue is unregistered,
wakeup sleeping condition, so the thread is deleted earlier.
So the thread resource is release earlier.
Also when monitored with tools like valgrind this eliminites reported
leak.
Individual events are handled through separate threads,
so once we have more then a single thread in this eventwait
sleeping, we got race on the dm_log setting, since
if one event is timeout out on alarm, while another is still waiting,
then dm log has been restored to NULL and the next sigalarm
has been reported as error.
Fix it by introducing counter which is protected via mutex,
and only when the last event is released, logging is restored.
TODO: libdm seems to have some static vars which may audit
for this type of use.
This patch will releases allocated private resources from
startup. Needs previous dm_zalloc patch to ensure unset
private pointer is NULL.
TODO: check on real cluster.
We can't use mempool for temporary variable for configuration path inside
find_config_tree_* functions since these functions can use the mempool
themselves deeper in the code and we can free mempool chunks only from
top to bottom which is not the case here (some default string
configuration values can be allocated from the mempool).
Don't add blkid to every linkage.
Link udev library just with lvm tools.
Drop extra linkage of udev library, since deps from libdevmapper
are already resolved in linked -ldevmapper.
Based on patch:
https://www.redhat.com/archives/lvm-devel/2014-March/msg00015.html
The CPPFunction typedef (among others) have been deprecated in favour of
specific prototyped typedefs since readline 4.2 (circa 2001).
It's been working since because compatibility typedefs have been in
place until they where removed in the recent readline 6.3 release.
Switch to the new style to avoid build breakage.
But also add full backward compatibility with define.
Signed-off-by: Gustavo Zacarias <gustavo zacarias com ar>
The same as for allocation/thin_pool_chunk_size - the default value
used is just a starting point. The calculation continues using the
properties of the devices actually used.
The allocation/thin_pool_chunk_size is a bit more complex. It's default
value is evaluated in runtime based on selected thin_pool_chunk_size_policy.
But the value is just a starting point. The calculation then continues
with dependency on the properties of the devices used. Which means for
such a default value, we know only the starting value.
If the config setting is defined as having no default value, but it's
still not NULL, it means such a value acts as a *hint* only
(e.g. a starting value from which the default value is calculated).
The new "cfg_def_get_default_value_hint" will always return the value
as defined in config_settings.h.
The original "cfg_def_get_default_value" will always return 0/NULL if
the config setting is defined with CFG_DEFAULT_UNDEFINED flag (hence
ignoring the hint).
This is needed for proper distiction between a correct default value
and the value which is just a hint or a starting point in calculation,
but it's not the final value (yes, we do have such settings!).
The devices/cache and devices/cache_dir are evaluated in runtime this way:
- if devices/cache is set, use it
- if devices_cache/dir or devices/cache_file_prefix is set, make up a
path out of that for devices/cache in runtime, taking into account
the LVM_SYSTEM_DIR environment variable if set
- otherwise make up the path out of default which is:
<LVM_SYSTEM_DIR>/<cache_dir>/<cache_file_prefix>.cache
With the runtime defaults, we can encode this easily now. Also, the lvm
dumpconfig can show proper and exact information about this setting then
(the variant that shows default values).
Previously, we declared a default value as undefined ("NULL") for
settings which require runtime context to be set first (e.g. settings
for paths that rely on SYSTEM_DIR environment variable or they depend
on any other setting in some way).
If we want to output default values as they are really used in runtime,
we should make it possible to define a default value as function which
is evaluated, not just providing a firm constant value as it was before.
This patch defines simple prototypes for such functions. Also, there's
new helper macros "cfg_runtime" and "cfg_array_runtime" - they provide
exactly the same functionality as the original "cfg" and "cfg_array"
macros when defining the configuration settings in config_settings.h,
but they don't set the constant default value. Instead, they automatically
link the configuration setting definition with one of these functions:
typedef int (*t_fn_CFG_TYPE_BOOL) (struct cmd_context *cmd, struct profile *profile);
typedef int (*t_fn_CFG_TYPE_INT) (struct cmd_context *cmd, struct profile *profile);
typedef float (*t_fn_CFG_TYPE_FLOAT) (struct cmd_context *cmd, struct profile *profile);
typedef const char* (*t_fn_CFG_TYPE_STRING) (struct cmd_context *cmd, struct profile *profile);
typedef const char* (*t_fn_CFG_TYPE_ARRAY) (struct cmd_context *cmd, struct profile *profile);
(The new macros actually set the CFG_DEFAULT_RUNTIME flag properly and
set the default value link to the function accordingly).
Then such configuration setting requires a function of selected type to
be defined. This function has a predefined name:
get_default_<id>
...where the <id> is the id of the setting as defined in
config_settings.h. For example "backup_archive_dir_CFG" if defined
as a setting with default value evaluated in runtime with "cfg_runtime"
will automatically have "get_default_backup_archive_dir_CFG" function
linked to this setting to get the default value.
Using mempool is much safer than using the global static variable.
The global variable would be rewritten on each find_config_tree_* call
and we need to be very careful not to get into this problem (we don't
do now, but we can with the patches for "runtime defaults" that will follow).
These settings don't have any default value predefined:
log/file
log/activate_file
global/library_dir
This settings has default value but not yet declared in config_settings.h:
global/locking_library (default is DEFAULT_LOCKING_LIB)
cmirrord polls for messages on the kernel and cluster interfaces.
Sometimes it is possible for messages to be received on the cluster
interface and be waiting for processing while the node is in the
process of leaving the cluster group. When this happens, the
messages received on the cluster interface are attempted to be
dispatched, but an error is returned because the connection is no
longer valid. It is a harmless situation. So, if we get the
specific error (CS_ERR_BAD_HANDLE) and we know that we have left
the group, then simply don't print the message.
If the PV label is lost (e.g. by doing a dd on the device), call
"systemd-run pvscan --cache <major>:<minor>" in 69-dm-lvm-metad.rules
to inform lvmetad about this state.
The reason for this is that ENV{SYSTEMD_WANTS}="lvm2-pvscan@<major>:<minor>"
logic will not cause the pvscan to be fired in this case since this works
only on proper device addition/removal cycle - the lvm2-pvscan service's
ExecStop is called only on proper REMOVE event - the service is bound to
device existence. Hence we need pvscan call via systemd-run (that
instantiates a quick transient service just to call the command).
See also https://bugzilla.redhat.com/show_bug.cgi?id=1063813.
It's not so easy to recongnize unusable /dev/kmsg
Reorder the code in a way if the first regular read of /dev/kmsg
fail, fallback to klogctl interface.
Call drain_dmesg also for the case there is no user log output.
Add a bit more complexity here - Switch to use /dev/kmsg
which has been introduced in 3.5 kernels and could run without
lossing lines from /proc/kmsg.
On older systems user may set env var LVM_TEST_CAN_CLOBBER_DMESG=1
to get kernel messages via klogctl() call (which deletes dmesg buffer)
otherwise no logging of kernel messages is provided.
Since there could be multiple readers of kmsg (test & journald) it needs
to be fast, to capture things like sysrq trace.
But to capture whole output it would need to prioritize reading of kmsg,
thus we would first log kernel messages and followed by command output.
As a trade-off always log command output first and use large drain
buffer so is captures most of messages, but occasionaly miss some
lines.
Use separate files for raid1, raid456, raid10.
They need different target versions to work, so support
more precise test selection.
Optimize duplicate tests of target avalability and skip
unsupported test cases sooner.
Basically reverts commit af8580d756.
"test: Use klogctl in the harness instead of reading /var/log/messages."
Problem is - this interface clears dmesg buffer
(just like call of dmesg -c)
Thus after running lvm2 test suitedmesg is empty - while all the
messages are usually logged in the journal/message, it's still not nice to
clear dmesg buffer.
It's not a pure revert, but switch to use /proc/kmsg directly instead of
reading /var/log/messages.
In cases where PV appears on a new device without disappearing from an old one
first, the device->pvid pointers could become ambiguous. This could cause the
ambiguous PV to be lost from the cache when a different PV comes up on one of
the ambiguous devices.
This is probably not optimal, but makes the lvmetad case mimic non-lvmetad code
more closely. It also fixes vgremove of a partially corrupt VG with lvmetad, as
_vg_write_raw (and consequently, entire vg_write) currently panics when it
encounters a corrupt MDA. Ideally, we'd be able to explicitly control when it is
safe to ignore them.
%FREE allocation has been broken for RAID. At 100%FREE, there is
still an extent left for certain tests. For now, change the test
to warn rather than completely fail.
Move flags for segments to segtype header where it seems more closely
related as the features are related to segtype and not activation.
Use unsigned #define - since it's more common in lvm2 source code
for bit flags.
Condition was swapped - however since it's been based on 'random'
memory content it's been missed as attribute has not been set.
So now we have quite a few possible results when testing.
We have old status without separate metadata and
we have kernels with fixed snapshot leak bug.
(in-release update)
Code uses target driver version for better estimation of
max size of COW device for snapshot.
The bug can be tested with this script:
VG=vg1
lvremove -f $VG/origin
set -e
lvcreate -L 2143289344b -n origin $VG
lvcreate -n snap -c 8k -L 2304M -s $VG/origin
dd if=/dev/zero of=/dev/$VG/snap bs=1M count=2044 oflag=direct
The bug happens when these two conditions are met
* origin size is divisible by (chunk_size/16) - so that the last
metadata area is filled completely
* the miscalculated snapshot metadata size is divisible by extent size -
so that there is no padding to extent boundary which would otherwise
save us
Signed-off-by:Mikulas Patocka <mpatocka@redhat.com>
While stripe size is twice the physical extent size,
the original code will not reduce stripe size to maximum
(physical extent size).
Signed-off-by: Zhiqing Zhang <zhangzq.fnst@cn.fujitsu.com>
Switch to use ext2 to make it usable on older systems.
Previous test has not been able to catching problem.
Multiple tests are now put in.
FIXME: validate what is doing kernel target when
the header is undeleted and same chunk size is used.
It seems snapshot target successfully resumes and
just complains COW is not big enough:
kernel: dm-8: rw=0, want=40, limit=24
kernel: attempt to access beyond end of device
When chunk size is different it fails instantly.
For checking this with lvm2 and this test case use this patch:
--- a/tools/lvcreate.c
+++ b/tools/lvcreate.c
@@ -769,7 +769,7 @@ static int _read_activation_params(struct
lvcreate_params *lp,
lp->permission = arg_uint_value(cmd, permission_ARG,
LVM_READ | LVM_WRITE);
- if (lp->snapshot) {
+ if (0 && lp->snapshot) {
/* Snapshot has to zero COW header */
lp->zero = 1;
lp->wipe_signatures = 0;
---
and switch to use -c 4 for both snapshots
When read-only snapshot was created, tool was skipping header
initialization of cow device. If it happened device has been
already containing header from some previous snapshot, it's
been 'reused' for a newly created snapshot instead of being cleared.
To make "lvm dumpconfig --type default" output to be usable like any
other config, we need to comment out lines that have no default value
defined. Otherwise, we'd have the output with config options
with blank or zero values which is not the same as when the value
is not defined! And such configuration can't be feed into lvm again
without further edits. So let's fix this.
Currently this covers these configuration options exactly:
devices/loopfiles
devices/preferred_names
devices/filter
devices/global_filter
devices/types
allocation/cling_tag_list
global/format_libraries
global/segment_libraries
activation/volume_list
activation/auto_activation_volume_list
activation/read_only_volume_list
activation/mlock_filter
metadata/dirs
metadata/disk_areas
metadata/disk_areas/<disk_area>
metadata/disk_areas/<disk_area>/start_sector
metadata/disk_areas/<disk_area>/size
metadata/disk_areas/<disk_area>/id
tags/<tag>
tags/<tag>/host_list
The code seems to work fine for the most trivial case - moving a
simple cache LV. However, it can cause problems when trying to
split out other LVs on different VGs and there hasn't been sufficient
testing for LV stacks that contain cache to enable the code. So,
we actively disable what is already broken and wait for the next
release to fix it.
Start to convert percentage size handling in lvresize to the new
standard. Note in the man pages that this code is incomplete.
Fix a regression in non-percentage allocation in my last check in.
This is what I am aiming for:
-l<extents>
-l<percent> LV/ORIGIN
sets or changes the LV size based on the specified quantity
of logical logical extents (that might be backed by
a higher number of physical extents)
-l<percent> PVS/VG/FREE
sets or changes the LV size so as to allocate or free the
desired quantity of physical extents (that might amount to a
lower number of logical extents for the LV concerned)
-l+50%FREE - Use up half the remaining free space in the VG when
carrying out this operation.
-l50%VG - After this operation, this LV should be using up half the
space in the VG.
-l200%LV - Double the logical size of this LV.
-l+100%LV - Double the logical size of this LV.
-l-50%LV - Reduce the logical size of this LV by half.
Reorder detection of cmirrord. Now if cmirrord is not
running, target will not try to load kernel log module,
for communication with cmirrord.
Whole check for attrs now also happens just once.
Test raid10 availability as a target feature (instead of doing
it in all the places where raid10 should be checked).
TODO: activation needs runtime validation - so metadata with raid10
are skipped from activation in user-friendly way in lvm2.
Parsing vg structure during supend/commit/resume may require a lot of
memory - so move this into vg_write.
FIXME: there are now multiple cache layers which our doing some thing
multiple times at different levels. Moreover there is now different
caching path with and without lvmetad - this should be unified
and both path should use same mechanism.
Reuse _node_send_messages for just checking
for valid transaction_id with preload.
This allows earlier detection of incosistent thin pool.
Code does the same thing, except for sending messages.
Improve testing of transation_id to not allow other difference
then either kernel TID is equal or is lower by oned and there
are queued messages for transaction.
Mark messages as submitted if the transaction_id is already matching.
Do not try to deactivate node on failure here and leave it on
proper error path of the caller.
Deactivation of top level node has to happen,
before traversing subtree.
Swap list logic and rather append new nodes to the head
and then use normal iteration.
(in-release update)
Skip over LVs that have a cache LV in their tree of LV dependencies
when performing a pvmove.
This means that users cannot move a cache pool or a cache LV's origin -
even when that cache LV is used as part of another LV (e.g. a thin pool).
The new test (pvmove-cache-segtypes.sh) currently builds up various LV
stacks that incorporate cache LVs. pvmove tests are then performed to
ensure that cache related LVs are /not/ moved. Once pvmove is enabled
for cache, those tests will switch to ensuring that the LVs /are/
moved.
Several fixes for the recent changes that treat allocation percentages
as upper limits.
Improve messages to make it easier to see what is happening.
Fix some cases that failed with errors when they didn't need to.
Fix crashes when first_seg() returns NULL.
Remove a couple of log_errors that were actually debugging messages.
Test currently fails with make check_cluster - so uses 'should'
CLVMD[4f435880]: Feb 19 23:27:36 Send local reply
format_text/archiver.c:230 WARNING: This metadata update is NOT backed up
metadata/mirror.c:1105 Failed to initialize log device
metadata/mirror.c:1145 <backtrace>
lvconvert.c:1547 <backtrace>
lvconvert.c:3084 <backtrace>
Remove 'skip' argument passed into the function.
We always used '0' - as this is the only supported
option (-K) and there is no complementary option.
Also add some testing for behaviour of skipping.
We already have /dev/disk/by-id/dm-uuid-... (which encompasses the
VG UUID and LV UUID in case of LVs since the mapping's UUID is
VG+LV UUID together) and /dev/disk/by-id/dm-name-... (which encompasses
the VG and LV name in case of LVs).
This patch addds /dev/disk/by-id/lvm-pv-uuid-<PV_UUID> that completes
this scheme and makes navigation a bit easier using PV UUIDs since
one can navigate using PV UUIDs only and there's no need to do extra
PV UUID <--> kernel name matching (the PV UUID is stable across reboots).
This may come in handy in various scripts.
Since we already have the PV UUID stored in udev database (as a result
of blkid call - returned in ID_FS_UUID blkid's variable), this operation
is very cheap indeed, just creating the extra one symlink.
There are typically 2 functions for the more advanced segment types that
deal with parameters in lvcreate.c: _get_*_params() and _check_*_params().
(Not all segment types name their functions according to this scheme.)
The former function is responsible for reading parameters before the VG
has been read. The latter is for sanity checking and possibly setting
parameters after the VG has been read.
This patch adds a _check_raid_parameters() function that will determine
if the user has specified 'stripe' or 'mirror' parameters. If not, the
proper number is computed from the list of PVs the user has supplied or
the number that are available in the VG. Now that _check_raid_parameters()
is available, we move the check for proper number of stripes from
_get_* to _check_*.
This gives the user the ability to create RAID LVs as follows:
# 5-device RAID5, 4-data, 1-parity (i.e. implicit '-i 4')
~> lvcreate --type raid5 -L 100G -n lv vg /dev/sd[abcde]1
# 5-device RAID6, 3-data, 2-parity (i.e. implicit '-i 3')
~> lvcreate --type raid6 -L 100G -n lv vg /dev/sd[abcde]1
# If 5 PVs in VG, 4-data, 1-parity RAID5
~> lvcreate --type raid5 -L 100G -n lv vg
Considerations:
This patch only affects RAID. It might also be useful to apply this to
the 'stripe' segment type. LVM RAID may include RAID0 at some point in
the future and the implicit stripes would apply there. It would be odd
to have RAID0 be able to auto-determine the stripe count while 'stripe'
could not.
The only draw-back of this patch that I can see is that there might be
less error checking. Rather than informing the user that they forgot
to supply an argument (e.g. '-i'), the value would be computed and it
may differ from what the user actually wanted. I don't see this as a
problem, because the user can check the device count after creation
and remove the LV if they have made an error.
When clustered VG is available in the system but we don't have
clustering set up for whatever reason, the lvm2-monitor scripts should
not fail completely just because these clustered VGs are skipped during
vgs/vgchange calls in lvm2-monitor initscript/systemd unit.
Avoid introducing libdm structure allocated in library user.
Use direct call with all currently supported args.
When new arg is added, new function will cover it.
When an origin exists and the 'lvcreate' command is used to create
a cache pool + cache LV, the table is loaded into the kernel but
never instantiated (suspend/resume was never called). A user running
LVM commands would never know that the kernel did not have the
proper state unless they also ran the dmsetup 'table/status' command.
The solution is to suspend/resume the cache LV to make the loaded
tables become active.
Do not use default dependencies that systemd adds to the units
so we have better control of when the service is started/stopped
and we don't end up with unexpected behaviour.
When the activation units are generated if use_lvmetad=0 (no
autoactivation), use --ignoreskippedcluster option for vgchange calls
since the cluster with cLVM is set up by separate units.
This avoids a situation in which the generated activation units are
improperly in failed state just because of the vgchange return value
when clustered VGs are encountered while the activation of non-clustered
VGs does proceed normally.
Introduce a new parameter called "approx_alloc" that is set when the
desired size of a new LV is specified in percentage terms. If set,
the allocation code tries to get as much space as it can but does not
fail if can at least get some.
One of the practical implications is that users can now specify 100%FREE
when creating RAID LVs, like this:
~> lvcreate --type raid5 -i 2 -l 100%FREE -n lv vg
I've added an "Advanced Logical Volume Types" section that I hope
to contain information on the logical volume types that may use
multiple steps and multiple commands to create. Cache is the
first entry into this section. I'd like to see thin and RAID in
here in the future.
Update the man page so the user knows that dm-cache 1.3.0 module
is needed. Also, enforce that in the code and print a warning if
the module is not new enough.
Users now have the ability to convert their existing logical volumes
into cached logical volumes. A cache pool LV must be specified using
the '--cachepool' argument. The cachepool is the small, fast LV used
to cache the large, slow LV that is being converted.
This patch allows users to convert existing logical volumes into
cache pool LVs. Since cache pool LVs consist of data and metadata
sub-LVs, there is also the '--poolmetadata' (similar to thin_pool)
which allows for the specification of the metadata device.
Replace some in-test use of lvs commands with their check
and get equivalent.
Advantage is these 'checking' commands are not necessarily always
valiadated via extensive valgrind testing and also the output noice
is significantly reduces since the output of check/get is suppressed.
lv_active_change will enforce proper activation.
Modification of activation was wrong and lead to misuse of
autoactivation. Fix allows to use proper local exclusive activation,
while the removed code turned this into just exclusive
activation (losing required local property).
lvm2_cluster_activation_red_hat.service.in -> lvm2_cluster_activation_systemd_red_hat.service.in
lvm2_clvmd_red_hat.service.in -> lvm2_clvmd_red_hat.service.in
Edit lvm2-cluster-activation reference on cmirror - take new
lvm2-cmirrord.service, it was just cmirrord(.service) before
as the old initscript was used in compatibility mode.
Also, use WantedBy=multi-user.target instead of sysinit.target
in lvm2-cluster-activation.service.
The commit splits original clvmd service in two new native services
for systemd enabled systems while original init scripts remain unaltered.
New systemd native services:
1) clvmd daemon itself (lvm2_clvmd_red_hat.service.in)
2) (de)activation of clustered VGs (lvm2_cluster_activation_red_hat.service.in)
There're several reasons to split it. First, there's no support for conditional
stop in systemd and AFAIK they don't plan to support it. In other words:
if the deactivation fails for some reason, systemd doesn't care and will simply
kill all remaining processes in original cgroup (by default). Killing the
remaining procs can be suppressed however it doesn't solve the following problem:
You can't repeat the stop command of a failed service. The repeated stop command
is simply not propagated to the service in a failed state. You would have to start
and then try to stop the service again. Unfortunately, this can't be done while
the daemon is still running (and we need the daemon to stay active until all
clustered VGs are deactivated properly).
In a separated setup we need only to restart the failed activation service and
that's fine.
No need to fork lvmetad when running under systemd.
Also, the "lvmetad -R" support has been removed in lvm2 v2.02.98
so remove the ExecReload line that called it on "systemctl reload".
The libblkid can detect DM_snapshot_cow signature and when creating
new LVs with blkid wiping used (allocation/use_blkid_wiping=1 lvm.conf
setting and --wipe y used at the same time - which it is by default).
Do not issue any prompts about this signature when new LV is created
and just wipe it right away without asking questions. Still keep the
log in verbose mode though.
gcc reports:
metadata/merge.c:229:58: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
metadata/merge.c:232:58: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
The DM_EVENT_GET_PARAMETERS requests the parameters under which
the running dmeventd is run and the it sends them to caller.
The parameters sent:
- the pid of the running dmeventd
- foreground state
- exec_method (currently either "direct" or "systemd")
The exact message sent back:
pid=<pid> daemon=<no/yes> exec_method=<direct/systemd>
Trying to restart dmeventd as a reload action is causing problems
under systemd environment. The systemd loses track of new dmeventd
this way. See also https://bugzilla.redhat.com/show_bug.cgi?id=1060134
for more info.
We need to call dmeventd -R directly instead of "systemctl reload dm-event.service"
that was used before (the reload is aimed at configuration reload anyway,
not stateful restart of the daemon - we did this before just because
there's no ExecRestart in systemd and there's only ExecStart and
ExecStop with which we'd lose the state).
Also, use ExecStart="dmeventd -f" to run dmeventd in foreground
(and let's rely on systemd to daemonize it) and change the
service type from "forking" to "simple".
This patch allows users to create cache LVs with 'lvcreate'. An origin
or a cache pool LV must be created first. Then, while supplying the
origin or cache pool to the lvcreate command, the cache can be created.
Ex1:
Here the cache pool is created first, followed by the origin which will
be cached.
~> lvcreate --type cache_pool -L 500M -n cachepool vg /dev/small_n_fast
~> lvcreate --type cache -L 1G -n lv vg/cachepool /dev/large_n_slow
Ex2:
Here the origin is created first, followed by the cache pool - allowing
a cache LV to be created covering the origin.
~> lvcreate -L 1G -n lv vg /dev/large_n_slow
~> lvcreate --type cache -L 500M -n cachepool vg/lv /dev/small_n_fast
The code determines which type of LV was supplied (cache pool or origin)
by checking its type. It ensures the right argument was given by ensuring
that the origin is larger than the cache pool.
If the user wants to remove just the cache for an LV. They specify
the LV's associated cache pool when removing:
~> lvremove vg/cachepool
If the user wishes to remove the origin, but leave the cachepool to be
used for another LV, they specify the cache LV.
~> lvremove vg/lv
In order to remove it all, specify both LVs.
This patch also includes tests to create and remove cache pools and
cache LVs.
This patch allows the creation and removal of cache pools. Users are not
yet able to create cache LVs. They are only able to define the space used
for the cache and its characteristics (chunk_size and cache mode ATM) by
creating the cache pool.
A cache LV - from LVM's perpective - is a user accessible device that
links the cachepool LV and the origin LV. The following functions
were added to facilitate the creation and removal of this top-level
LV:
1) 'lv_cache_create' - takes a cachepool and an origin device and links
them into a new top-level LV of 'cache' segment type. No allocation
is necessary in this function, as the sub-LVs contain all of the
necessary allocated space. Only the top-level layer needs to be
created.
2) 'lv_cache_remove' - this function removes the top-level LV of a
cache LV - promoting the cachepool and origin sub-LVs to top-level
devices and leaving them exposed to the user. That is, the
cachepool is unlinked and free to be used with another origin to
form a new cache LV; and the origin is no longer cached.
(Currently, if the cache needs to be flushed, it is done in this
function and the function waits for it to complete before proceeding.
This will be taken out in a future patch in favor of polling.)
Cache pools require a data and metadata area (like thin pools). Unlike
thin pool, if 'cache_pool_metadata_require_separate_pvs' is not set to
'1', the metadata and data area will be allocated from the same device.
It is also done in a manner similar to RAID, where a single chunk of
space is allocated and then split to form the metadata and data device -
ensuring that they are together.
Building on the new DM function that parses DM cache status, we
introduce the following LVM level functions to aquire information
about cache devices:
- lv_cache_block_info: retrieves information on the cache's block/chunk usage
- lv_cache_policy_info: retrieves information on the cache's policy
I am reverting the commit below - removing the new 'dm_config_get_int'
function and simply calling 'dm_config_get_uint32' while casting the
'int *' pointer parameter.
Commit being reverted:
commit 94377dfd5e
Author: Jonathan Brassow <jbrassow@redhat.com>
Date: Mon Jan 27 05:26:19 2014 -0600
Misc: New function for reading lvm config file fields
Introduce 'dm_config_get_int', which will be used by the upcoming
cachepool segment type.
Avoid use of external origin with size unaligned/incompatible with
thin pool chunk size, since the last chunk is not correctly provisioned
when it is overwritten.
Avoid starting conversion of the LV to the thin pool and thin volume
at the same time. Since this is mostly a user mistake, do not try
to just convert to one of those type, since we cannot assume if the
user wanted LV to become thin volume or thin pool.
Before the fix tool reported pretty strange internal error:
Internal error: Referenced LV lvol1_tdata not listed in VG mvg.
Fixed output:
lvconvert --thinpool lvol0 -T mvg/lvol0
Can't use same LV mvg/lvol0 for thin pool and thin volume.
Since we are currently incapable of providing zeroes for
reextended thin volume area, let's disable extension of
such already reduce thin volumes.
(in-release change)
This patch defines a structure for holding all of the device-mapper
cache target's status information. The associated function provides
an easy way for higher levels (LVM) to consume the information.
This patch finishes the device-mapper interface for the cache and
cachepool segment types (i.e. the cache target).
This patch adds the cache segment type - the second of two necessary
to create cache logical volumes. This segment type references the
cachepool (the small fast device) and the origin (the large slow device);
linking them to create the cache device. The cache device is the
hierarchical device-mapper device that the user ulitmately makes use
of.
The cache segment sources the information necessary to construct the
device-mapper cache target from the origin and cachepool segments to
which it links.
This patch adds the new cachepool segment type - the first of two
necessary to eventually create 'cache' logical volumes. In addition
to the new segment type, updates to makefiles, configure files, the
lv_segment struct, and some necessary libdevmapper flags.
The cachepool is the LV and corresponding segment type that will hold
all information pertinent to the cache itself - it's size, cachemode,
cache policy, core arguments (like migration_threshold), etc.
When lvm2 command forks, it calls reset_locking(),
which as an unwanted side effect unlinked lock file from filesystem.
Patch changes the behavior to just close locked file descriptor
in children - so the lock is being still properly hold in the parent.
Test LVM_LVMETAD_PIDFILE for pid for lvm command.
Fix WHATS_NEW envvar name usage
Fix init order in prepare_lvmetad to respect set vars
and avoid clash with system settings.
Update test to really test the 'is running' message.
Comparing for available feature missed the code path, when
maj is already bigger.
The bug would be only hit in the case, thin pool target would have
increased major version.
When thin volume is using external origin, current thin target
is not able to supply 'extended' size with empty pages.
lvm2 detects version and disables extension of LV past the external
origin size in this case.
Thin LV could be however still reduced and extended freely bellow
this size.
In preparation for other segment types that create and use "pools", we
s/create_thin_pool/create_pool/. This way it is not awkward when creating
a cachepool, for example, to use "create_thin_pool".
Functions that handle set-up, tear-down and creation of thin pool
volumes will be more generally applicable when more targets exist
that make use of device-mapper's persistent data format. One of
these targets is the dm-cache target. I've selected some functions
that will be useful for the cache segment type to be moved, since
they will no longer be thin pool specific but are more broadly
useful to any segment type that makes use of a 'pool' LV.
We need both offset and length when trying to wipe detected signatures.
The libblkid can fail so it's good to have an error message issued for
this state instead of being silent (libblkid does not issue any error
messages here). We just issued "stack" here before but that was not
quite useful if some error occurs...
Clear temporary DM_DISABLE_OTHER_RULES_FLAG properly. This did not
cause any bug or problem as the temporary variable is overwritten next
time it's used again, but we should still clean it properly!
Only flag thin LV for no scanning in udev if this LV is about
to be wiped. This happens only in case the thin LV's pool was not
created with zeroing of the new blocks enabled.
Since support for xfs_check is going to be obsoleted,
replace its usage with xfs_repair -n tool.
However this tool needs further intrumentation, since for really small
xfs devices (having just 1 allocation group) it needs to pass in
flag: "-o force_geometry". As we run the tool with '-n', it should
be safe to pass this flag always.
FIXME: figure way without always passing this flag.
Revert activated volumes if callback fails.
This is currently used only for thin_check failure support.
When thin_check detects failure in thin metadata device, it deactivate
volumes in reversed order that have been preloaded for thin pool activation.
After this change lvm command will not leave active pool subvolumes
in dm table.
The size of any metadata must be ignored when calculating the size of an
orphan PV.
Bug introduced by 603b45e0ed ("pvresize: Do
not use pv_read (get the PV from orphan VG).")
Block creations of archive and backup files for internal orphan VGs.
Bug introduced by 603b45e0ed ("pvresize: Do
not use pv_read (get the PV from orphan VG).")
Do not drop device's flag to report readiness for systemd
processing if there's any event that follows the activatiion
event itself. Otherwise, systemd would lost track of this device
on any other event that follows the activating event (IOW, we
need to make SYSTEMD_READY variable change level-based, not edge-based).
This patch applies for MD and loop devices used as PVs.
(intra-release fix for commit 4c267c7286)
DO NOT USE LVMETAD IF YOU HAVE ANY LVM1-FORMATTED PVS.
You may continue to use it without lvmetad, but do please schedule
an upgrade to the lvm2 format (with 'vgconvert').
Sending the original LVM1 formatted metadata to lvmetad is breaking
assumptions made by the code, so I am marking the format as obsolete for
now and no longer sending it to lvmetad.
This means that if you are using lvmetad, lvm1 volumes will usually
appear invisible - though not always: it depends on exactly what
sequence of commands you run!
The current situation is not satisfactory.
We'll either fix lvmetad and reenable this or we'll fix the code to
issue appropriate warning messages when lvm1 PVs are encountered
to avoid accidents.
(The latest unfixed problem is that lvmetad assumes metadata sequence
numbers exist and always increase - but the lvm1 format does not define
or store any sequence number, confusing both the daemon and client
when default values get passed to-and-fro.)
Several fields used to display 0 if undefined. Recent changes
to the way the fields are reported threw away some tests for
valid pointers, leading to segfaults with 'pvs -o all'.
Reinstate the original behaviour.
If a PV in an existing VG becomes orphaned (with 'pvcreate -ff', for
example) the VG struct cached against its vginfo must be invalidated.
This is because the struct device it references no longer contains
the PV label so becomes incorrect.
This triggers the error:
Internal error: PV $dev unexpectedly not in cache.
when the PV from the cached VG metadata is subsequently looked up
in the cache.
Bug introduced in 2.02.87 by commit 7ad0d47c3c
("Cache and share generated VG structs").
Before:
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/loop3 vg12 lvm2 a-- 28.00m 28.00m
/dev/loop4 vg12 lvm2 a-- 28.00m 28.00m
lvm> pvcreate -ff /dev/loop3
Really INITIALIZE physical volume "/dev/loop3" of volume group "vg12" [y/n]? y
WARNING: Forcing physical volume creation on /dev/loop3 of volume group "vg12"
Physical volume "/dev/loop3" successfully created
lvm> pvs
Internal error: PV /dev/loop3 unexpectedly not in cache.
PV VG Fmt Attr PSize PFree
/dev/loop3 vg12 lvm2 a-- 28.00m 28.00m
/dev/loop3 lvm2 a-- 32.00m 32.00m
/dev/loop4 vg12 lvm2 a-- 28.00m 28.00m
After:
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/loop3 vg12 lvm2 a-- 28.00m 28.00m
/dev/loop4 vg12 lvm2 a-- 28.00m 28.00m
lvm> pvcreate -ff /dev/loop3
Really INITIALIZE physical volume "/dev/loop3" of volume group "vg12" [y/n]? y
WARNING: Forcing physical volume creation on /dev/loop3 of volume group "vg12"
Physical volume "/dev/loop3" successfully created
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/loop3 lvm2 a-- 32.00m 32.00m
/dev/loop4 vg12 lvm2 a-- 28.00m 28.00m
unknown device vg12 lvm2 a-m 28.00m 28.00m
Pass dnode pointer instead of rather unknown child pointer.
The pointer is currently unused and passing child pointer
is quite undefined, while dnode has at least some usability.
Make this code a bit more readable for Coverity as otherwise
it marks the "type" variable in the "_thin_pool_add_message" fn
as undefined for certain path (...which is normally unreachable anyway,
but let's clean this up).
Introduce FMT_OBSOLETE to identify pool metadata and use it and FMT_MDAS
instead of hard-coded format names.
Explain device accesses on pvscan --cache man page.
lvm has a default umask value in the config file that defaults
to 0077 which lvm changes to during normal operation. This
causes a problem when the code is used as a library with
liblvm as it is changing the umask for the process. This
patch saves off the current umask, sets to what is specified
in the config file and restores it what it was on library
function call exit.
The user is now free to change the umask in their application at
anytime including between library calls.
This fix address BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1012113
Tested by setting umask to 0777 and running the python unit
test and verifying that umask is still same value as expected
at the test completion and with a successful run.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When using filters for the pvscan --cache (the global_filter),
there's a difference between:
pvscan --cache -aay /dev/block/<major>:<minor>
and
pvscan --cache -aay <major>:<minor> (or --major <major> --minor <minor>)
In the first case, we need to be sure to have an exact matching line
in the filter for the device to be used, no aliases are considered
So for example even if we have accept rule for "/dev/sda" present,
this won't apply for "/dev/block/8:0" even though it's the same device!
This is because we're comparing the path used on command line directly
with the path written in the rule.
For the second one, any alias mentioned in the filter will apply
as we're comparing the major and minor pair, not looking at actual
device names - so any alias mentioned in the rules will suffice for
the filtering rule to apply.
For the global_filter to be properly used, we need to call the
second one in the lvm2-pvscan@.service - nobody is able to tell
what value of major:minor the kernel assignes next time, hence
this bug makes the use of global_filter quite unusable!
If there is no define for BLKPBSZGET - we have hard time how to
decrypt physical block size - we can't use here block_size,
since this is usually 4k while we need to use 512b.
FIXME: find some better way, until that enforce value 512.
Eventually we could also try to put in:
+#ifndef BLKPBSZGET
+# define BLKPBSZGET _IO(0x12,123)
+#endif
but this will still not work well on old kernels.
This reverts commit 24639be558.
Ok - seems we could be here a bit too active - and we
may remove devices which are unsuable for reasons we are not
aware of - thus taking down whole device could be way to big hammer.
So we still need some solution to recover from failing preload
and activation - but it needs more tunning.
When activation fails - we may leak large tree of partially loaded
devices in the dm table (i.e. failure in snapshot activation)
The best we can do here is try to deactivate whole device and
remove as much inactive table entries as we can.
When LV is scanned for its dependencies - scan also origin's snapshots,
and thin external origins.
So if any PV from snapshot or external origin device is missing - lvm2 will
avoid trying to activate such device.
The metadata/disk_areas setting was incorrectly registered as
"string" configuration option but it's a section where each area
is defined in its own subsection with "start_sector", "size" and "id"
setting.
This setting is not officialy supported, it's undocumented and it's
used solely for debugging.
Note: At this moment, it does not seem to be working with lvmetad!
When the device is inserted in dev_name_confirmed() stat() is
called twice as _insert() has it's own stat() call.
Extend _insert() parameter with struct stat* - which could be used
if it has been just obtained. When NULL is passed code is
doing its own stat() call as before.
Use internal type by default for thin provisioning.
If user is not interested in thin provisiong and doesn't
have thin provisining supporting tools installed,
configure will just print warning at the end of configure
process about limited support.
Boolean algebra changes for process_each_lv_in_vg().
1st.
Drop process_lv variable since it's not needed.
2nd.
process_lv was always initilized to 0 - so the condition was always true.
It the condition (!tags_supplied && !lvargs_supplied) evaluates as "true",
process_all is already set to 1, so skip vg tags evaluation.
3rd.
Move check for matching lv name in the front of lv tags check
since this check can't be skipped for lvargs_matched counter.
If this filter evaluates to true, skip lv tags evaluation.
Some devices, similarly to us, are not prepared after ADD event, but
after an extra CHANGE event when the device is properly set up.
This includes MD and loop devices. This patch fixes the
SYSTEMD_READY assignment that is crucial for proper functionality
of SYSTEMD_WANTS that we use to instantiate a lvm2-pvscan@.service
systemd service to activate the VG/LVs (see also bug
info).
All that extra handling of foreign devices should eventually be moved
to rules which process those devices primarily (MD and loop)! We should
only check a dedicated variable whether the device is usable or not.
Thin kernel target 1.9 still does not support online resize of
thin pool metadata properly - so disable it with expectation
for much higher version - and reenable after fixing kernel.
If we are stuck in user for too long without output,
grab kernel stack traces.
If we just produce too many lines of output, it's
not probably kernel related bug.
When the test is interrupted because debug.log has got to big,
and the test doesn't react on SIGINT - and needs to be only
killed with SIGKILL - it's still valuable to print at least
a portion of this debug.log (currently 4MB).
LVM_TEST_UNLIMITED could be set to avoid this limitation
(i.e. when busy-looping lvm command needs to be running
for gdb attachment)
Since activation takes only read-lock, there could be
multiple activation running in parallel.
So instead of checking before taking any real lock,
let the locking resolve the problem and just
detect if the reason for failure has been remote
exlusive activation.
It should be also faster, since each activation does
not need to do explicit lock query.
For merging thin snapshot we have to do couple extra
checks before we allow this operation.
We pretend thin-snapshot and thin-origin
are tied together and we have to properly
maintain locking.
The PIE and RELRO compiler/linker options can be used to produce a code
some techniques applied that makes the code more immune to some attacks:
- PIE (Position Independent Executable). It can make use of the ASLR
(Address Space Layout Randomization) provided by kernel to avoid
static locations for .text regions of executables (this is the 'pie'
compiler and linker option)
- RELRO (Relocation Read-Only). This prevents overwrite attacks of
the GOT (Global Offset Table) and PLT (Procedure Lookup Table)
used for relocations by making it read-only after all relocations
are resolved (these are the 'relro' and 'now' linker options) -
hence all symbols are resolved at the very start so there's no
need for those tables to be writeable later.
These compiler/linker options are now used by default for daemons
if the compiler/linker supports it.
In the case we have a dir with multiple objects and for
an individual object file we need special define -
allow to define it without adding extra rules.
To ensure dmeventd.o compilation will use EXTRA_FLAGS:
CFLAGS_dmeventd.o += $(EXTRA_FLAGS)
Then it's better to use:
dmeventd.o: CFLAGS += $(EXTRA_FLAGS)
At the end of lvconvert --snapshot with an active origin, the origin
gets reloaded.
Commit 57c0f72b1d ("lvconvert: use
_reload_lv on more places") accidentally replaced this with a snapshot
LV reload (which does nothing because only the origin is active).
Replacement of pv_read by find_pv_by_name in commit
651d5093ed caused spurious
error messages when running pvcreate or vgextend against an
unformatted device.
Physical volume /dev/loop4 not found
Physical volume "/dev/loop4" successfully created
Physical volume /dev/loop4 not found
Physical volume /dev/loop4 not found
Physical volume "/dev/loop4" successfully created
Volume group "vg1" successfully extended
Make it easier to run a live lvmetad in debugging mode and
to avoid conflicts if multiple test instances need to be run
alongside a live one.
No longer require -s when -f is used: use built-in default.
Add -p to lvmetad to specify the pid file.
No longer disable pidfile if -f used to run in foreground.
If specified socket file appears to be genuine but stale, remove it
before use.
On error, only remove lvmetad socket file if created by the same
process. (Previous code removes socket even while a running instance
is using it!)
Do not use signature wiping for newly created LVs in tests - we're
reusing the devs in tests and such detection could just interfere
inappropriately. We'd need to modify all tests to anwer the prompt
whether any signature found should be removed or not or we'd need
to use "-y" option for all lvcreates in tests. It's better to disable
this feature then and let's do a separate test to test this signature
wiping functionality.
If we're calling pvcreate on a device that already has a PV label,
the blkid detects the existing PV and then we consider it for wiping
before we continue creating the new PV label and we issue a warning
with a prompt whether such old PV label should be removed. We don't
do this with native signature detection code. Let's make it consistent
with old behaviour.
But still keep this "PV" (identified as "LVM1_member" or "LVM2_member"
by blkid) detection when creating new LVs to avoid unexpected PV label
appeareance inside LV.
Collapse 2 ifs and replace log_error() with log_warn(), since\
the reported message is not causing tools error.
(and cannot be probably triggered anyway).
Optimize and cleanup recently introduced new function wipe_lv.
Use compound literals to get nicely initialized wipe_params struct.
Pass in lv as explicit argument for wipe_lv.
Use cmd from lv structure.
Initialize only non-null members so it's easy to see what
is the special arg.
Drop find_merging_snapshot() function. Use find_snapshot()
called after check for lv_is_merging_origin() which
is the commonly used code path - so we avoid duplicated
tests and potential risk of derefering NULL point
in unhandled error path.
This is actually the wipefs functionailty as a matter of fact
(wipefs uses the same libblkid calls).
libblkid is more rich when it comes to detecting various
signatures, including filesystems and users can better
decide what to erase and what should be kept.
The code is shared for both pvcreate (where wiping is necessary
to complete the pvcreate operation) and lvcreate where it's up
to the user to decide.
The verbose output contains a bit more information about the
signature like LABEL and UUID.
For example:
raw/~ # lvcreate -L16m vg
WARNING: linux_raid_member signature detected on /dev/vg/lvol0 at offset 4096. Wipe it? [y/n]
or more verbose one:
raw/~ # lvcreate -L16m vg -v
...
Found existing signature on /dev/vg/lvol0 at offset 4096: LABEL="raw.virt:0" UUID="da6af139-8403-5d06-b8c4-13f6f24b73b1" TYPE="linux_raid_member" USAGE="raid"
WARNING: linux_raid_member signature detected on /dev/vg/lvol0 at offset 4096. Wipe it? [y/n]
The verbose output is the same output as found in blkid.
Use common wipe_lv (former set_lv) fn to do zeroing as well as signature
wiping if needed. Provide new struct wipe_lv_params to define the
functionality.
Bind "lvcreate -W/--wipesignatures y" with proper wipe_lv call.
Also, add "yes" and "force" to lvcreate_params so it's possible
to apply them for the prompt: "WARNING: %s detected on %s. Wipe it? [y/n]".
The wipe_known_signatures fn now wraps the _wipe_signature fn that is called
for each known signature (currently md, swap and luks). This patch makes the
code more readable, not repeating the same sequence when used anywhere in the
code. We're going to reuse this code later...
If the refresh fails for any reason before autoactivation, let's not
make this a stopper for autoactivation itself - just log the error
message if it appears.
The reason is that in some rare situations, we can still hit the
problem with the suspend call to fail (as already described in
commit d8085edf65, also
https://bugzilla.redhat.com/show_bug.cgi?id=1027314). The refresh
itself is done for only one reason - to refresh any dm tables
for LVs for which the underlying PVs got unplugged/disconnected
and then plugged/connected back (see also
https://bugzilla.redhat.com/show_bug.cgi?id=954061 for more info).
In this case, the major:minor pair is changed and we need to
update dm tables for LVs accordingly.
Now if refresh fails, the error is still logged, but autoactivation
continues.
If using lv/vgchange --sysinit -aay and lvmetad is enabled, we'd like to
avoid the direct activation and rely on autoactivation instead so
it fits system initialization scripts.
But if we're calling lv/vgchange --sysinit -aay too early when even
lvmetad service is not started yet, we just need to do the direct
activation instead without printing any error messages (while
trying to connect to lvmetad and not finding its socket).
This patch adds two helper functions - "lvmetad_socket_present" and
"lvmetad_used" which can be used to check for this condition properly
and avoid these lvmetad connections when the socket is not present
(and hence lvmetad is not yet running).
Whole struct will be set to 0, just
if the first member is array, gcc gives warning
we should initialized this element as array,
so pick any later simple type.
It will likely not fail to duplicate empty string, but
just keep the test of result of this function consistent.
Also on error path restore extent_size if in some
case someone would still use that variable.
Put common printf() case into a function and use
the string with text format as direct arg to make
the compile time validation of args easier and
code shorter.
Switch log_error() to log_warn(), since 'return 0'
doesn't cause any failure here.
Revert 4777eb6872 which put
target_present check into init_snapshot_merge(). However
this function is also used when parsing metadata. So we would
get this present test performed even when target is not really
needed. So move this target_present test directly into lvconvert.
Changed naming of methods from camel case to all lower case with
underscores per guidelines. Changed any methods that can be
static methods to static.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
The error buffer will stack error messages which is fine. However,
once you retrieve the error messages it doesn't make sense to keep
appending for each additional error message when running in the
context of a library call.
This patch clears and resets the buffer after the user retrieves
the error message.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Add a PV create which takes a paramters object that
has get/set method to configure PV creation.
Current get/set operations include:
- size
- pvmetadatacopies
- pvmetadatasize
- data_alignment
- data_alignment_offset
- zero
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=880395
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Moving the core functionality of vgreduce single into
lib/metadata/vg.c so that the command line and lvm2app library
can call the same core functionality. New function is
vgreduce_single.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
In a previous commit we added the ability for the library to do garbage
collection, to free all the memory used by the library handle buffer by closing
and re-opening the library handle. When we introduced this functionality we
also opened up the opportunity that the user of the python bindings to have
an object that references the old library handle. In this case if the user
tried to use the old object a segmentation fault could occur because the
memory had been previously freed.
This patch tries to mitigate this by storing a copy of the library handle that
was used when the object was created so that it can compare the current in
use pointer with what existed when the object was created. In the case where
they match the operation is permitted to continue, otherwise an exception is
throw, thus avoiding a segmentation fault.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
3.12.0 kernel prevents raid test to be usable,
leaving unremovable devices in table.
This needs to be fixed ASAP, meanwhile disable test to make
test machines at least usable.
Add 'can_use_16T' to detect systems where we could
safely use 16T devices without causing system deadlocks.
16T size leads on those to endless loops in udevd
- it calls blkid which tries cached read from such device
- this ends in endless loop.
Related problems:
https://bugzilla.redhat.com/show_bug.cgi?id=1015028
All labellers always use the "private" (void *) field as the fmt pointer. Making
this fact explicit in the type of the labeller simplifies the label reporting
code which needs to extract the format. Moreover, it removes a number of
error-prone casts from the code.
Only reading a single PV works correctly only in very limited circumstances.
Moreover, we can't rely on the MDA available on the PV either, since it may be
out of date in some circumstances (until now, we believed that PVs that have an
empty MDA are always orphans, but this is not 100% reliable either).
Add internal error warning when string value is used
as sort value for numerical field.
Using log_warn since the function itself does not return error,
so we do not confuse log_error() checker.
Fix buggy usage of "" (empty string) as a numerical string
value used for sorting.
On intel 64b platform this was typically resolve
as 0xffffff0000000000 - which is already 'close' to
UINT64_MAX which is used for _minusone64.
On other platforms it might have been giving
different numbers depends on aligment of strings.
Use proper &_minusone64 for sorting value when the reported
value is NUM.
Note: each numerical value needs to be thought about if it needs
default value &_zero64 or &_minusone64 since for cases, were
value of zero is valid, sorting should not be mixing entries
together.
Add wrapper function for dm_report_field_set_value() which returns void
and return 1, so the code could be shorter.
Add wrapper function for percent display _field_set_percent().
There's a tiny race when suspending the device which is part
of the refresh because when suspend ioctl is performed, the
dm kernel driver executes (do_suspend and dm_suspend kernel fn):
step 1: a check whether the dev is already suspended and
if yes it returns success immediately as there's
nothing to do
step 2: it grabs the suspend lock
step 3: another check whether the dev is already suspended
and if found suspended, it exits with -EINVAL now
The race can occur in between step 1 and step 2. To prevent
premature autoactivation failure, we're using a simple retry
logic here before we fail completely. For a complete solution,
we need to fix the locking so there's no possibility for suspend
calls to interleave each other to cause this kind of race.
This is just a workaround. Remove it and replace it with proper
locking once we have that in!
Failures in the temporary mirror used when up-converting cause dmeventd
to issue 'lvconvert --repair' on the sub-LV, <lv_name>_mimagetmp_?. The
'lvconvert' command refuses to deal with this sub-LV outright - it
expects to be given the name of the top-level LV. So, just like we do
with mirrored logs, we strip-off the portion of the name that is not
the top-level LV and issue the command on the top-level LV instead.
Send error message on stdout, since after _display_info_long()
command return errors.
Patch makes consistent behavior for command:
dmsetup info -c non-existing-dev
&
dmsetup info non-existing-dev
Now both commands report error on stderr when they return error status
for non-existing device.
This patch fixes mostly cluster behavior but also updates
non-cluster reaction where calls like 'lvchange -aln'
lead to incorrect errors for some segment types.
Fix the implicit activation rules where some segment types could
be activated only in exclusive mode in cluster.
lvm2 command was not preserver 'local' property and incorrectly
converted local activations in to plain exclusive, so the local
activation could have activate volumes exclusively, but remotely.
If the volume_list filters out volume from activation,
it is still success result for this function.
Change the error message back to verbose level.
Detect if the volume is active localy before zeroing,
so we report error a bit later for cases, where volume
could not be activated because it doesn't pass through volume
list (but user still could create volume when he disables
zeroing)
Correct return code of activate_lv_excl().
Function is not supposed to return activation state of
activated volume, but return code of the operation.
Since i.e. when activation filter is allowing to activate
volume on current system, it is still success even though
no volume is activated.
MD can directly create partition devices without a need to run
an extra kpartx or partprobe call. We need to react to this event in
a different way as for bare MD devices - we need to handle the ADD event
for KERNEL=="md[0-9]*p[0-9]*" kernel name and trigger the LVM scanning
to update lvmetad to trigger autoactivation and so on...
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1023250
This is an addition to original patch for lvcreate - commit 039bdad.
The same principle applies to lvconvert where there are several steps
during which we need to wipe the existing LV that's being converted
to thin pool, making sure there's no other interference from outside (udev).
Reset the DM_UDEV_OTHER_RULES_FLAG to original value right at the
time of dropping the DM_NOSCAN flag.
When DM_NOSCAN is set, the DM_UDEV_DISABLE_OTHER_RULES_FLAG is also set
to avoid udev processing in "other/foreign" rules. If the noscan flag
is dropped, the DM_UDEV_DISABLE_OTHER_RULES_FLAG should be reset to
its original value.
Also, lvmetad should respect the DM_UDEV_DISABLE_OTHER_RULES_FLAG
because if the volume is set with this flag it:
- definitely is not a top-level device (so makes no sense for lvmetad scanning)
- is not supposed to be scanned further (for any stacking on top of
it, including LVM stacking itself and any autoactivation of stacked LVs)
Remove conditional that boils down to "if yes or no, then do". The
previous condition in the statement is sufficient and the extra
(always true) condition is unnecessary.
This fixes a bug in commit 19baf842 where verify_message
was rejecting the CLVMD_FLAG_REMOTE flag. It was missed
since the patch was ported from an lvm version where that
flag does not exist.
There is a problem with the way mirrors have been designed to handle
failures that is resulting in stuck LVM processes and hung I/O. When
mirrors encounter a write failure, they block I/O and notify userspace
to reconfigure the mirror to remove failed devices. This process is
open to a couple races:
1) Any LVM process other than the one that is meant to deal with the
mirror failure can attempt to read the mirror, fail, and block other
LVM commands (including the repair command) from proceeding due to
holding a lock on the volume group.
2) If there are multiple mirrors that suffer a failure in the same
volume group, a repair can block while attempting to read the LVM
label from one mirror while trying to repair the other.
Mitigation of these races has been attempted by disallowing label reading
of mirrors that are either suspended or are indicated as blocking by
the kernel. While this has closed the window of opportunity for hitting
the above problems considerably, it hasn't closed it completely. This is
because it is still possible to start an LVM command, read the status of
the mirror as healthy, and then perform the read for the label at the
moment after a the failure is discovered by the kernel.
I can see two solutions to this problem:
1) Allow users to configure whether mirrors can be candidates for LVM
labels (i.e. whether PVs can be created on mirror LVs). If the user
chooses to allow label scanning of mirror LVs, it will be at the expense
of a possible hang in I/O or LVM processes.
2) Instrument a way to allow asynchronous label reading - allowing
blocked label reads to be ignored while continuing to process the LVM
command. This would action would allow LVM commands to continue even
though they would have otherwise blocked trying to read a mirror. They
can then release their lock and allow a repair command to commence. In
the event of #2 above, the repair command already in progress can continue
and repair the failed mirror.
This patch brings solution #1. If solution #2 is developed later on, the
configuration option created in #1 can be negated - allowing mirrors to
be scanned for labels by default once again.
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in way that they need to be activated,
some action done and then removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.
We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.
This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.
For example: lvcreate --thinpool POOL --zero n -L 1G vg
- first, the usual LV is created to do a clean up for pool metadata
spare. The LV is activated, zeroed, deactivated.
- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
to avoid any scanning in udev
- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
udev rule, but since the LV is just a usual LV, we can't make a
difference. The LV_TEMPORARY internal LV flag helps here. If we
create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
with "invisible" and non-top-level LVs) - udev is directed to
skip WATCH rule use.
- if the LV_TEMPORARY flag was not used, there would normally be
a WATCH event generated once the LV is closed after "zeroed"
stage. This will make problems with immediated deactivation that
follows.
The blkdeactivate script iterates over the list of devices if they're
given as an argument and it tries to umount/deactivate them one by one.
This iteration failed to proceed if any of the umount/deactivation
was unsuccessful - there was a missing "shift" call to move to the
next argument (device) for processing. As a result of this, the same
device was tried again and again, causing an endless loop, never
proceeding to the next device given.
When using ENV{SYSTEMD_WANTS}=lvm2-pvscan@... to instantiate a service
for lvmetad scan when the new PV appears in the system, the service
is started and executed. However, to track device removal, we need
to bind it (the "BindsTo" systemd directive) to a certain .device
systemd unit.
In default systemd setup, the device is tracked by it's name and
sysfs path (there's normally a sysfs path .device systemd unit for
a device and then the device name .device unit as an alias for it).
Neither of these two is useful for lvmetad update as we need to bind
it to device's <major>:<minor> pair.
The /dev/block/<major>:<minor> is the essential symlink under /dev
that exists for each block device (created by default udev rules
provided by udev directly). So let's use this as an alias for
the device's .device unit as well by means of "ENV{SYSTEMD_ALIAS}"
declaration within udev rules which systemd understands (this will
create a new alias "dev-block-<major>:<minor>.device".
Then we can easily bind the "dev-block-<major>:<minor>" device
systemd unit with instantiated lvm2-pvscan@<major>:<minor>.service.
So once the device is removed from the systemd, the
lvm-pvscan@<major>:<minor>.service executes it's ExecStop action
(which in turn notifies lvmetad about the device being gone).
This completes the udev-systemd-lvmetad interaction then.
Before, pvscan recognized either:
pvscan --cache --major <major> --minor <minor>
or
pvscan --cache <DevicePath>
When the device is gone and we need to notify lvmetad about device
removal, only --major/--minor works as we can't translate DevicePath
into major/minor pair anymore. The device does not exist in the system
and we don't keep DevicePath index in lvmetad cache to make the
translation internally into original major/minor pair. It would be
useless to keep this index just for this one exact case.
There's nothing bad about using "--major <major> --minor <minor>",
but it makes our life a bit harder when trying to make an
interconnection with systemd units, mainly with instantiated services
where only one and only one arg can be passed (which is encoded in the
service name).
This patch tries to make this easier by adding support for recognizing
the "<major>:<minor>" as a shortcut for the longer form
"--major <major> --minor <minor>". The rule here is simple: if the argument
starts with "/", it's a DevicePath, otherwise it's a <major>:<minor> pair.
There is no point eating stderr for these commands. In fact the
redirect causes confusion and hurts dubugging.
Also reword an error message if the pvs command fails so as not be
certain that a device is not a PV. Coupled with removing the stderr
redirect this will improve the user experience in the face of errors.
The new lvm2-pvscan@.service is responsible for on-demand execution
of "pvscan --cache --activate ay" which causes lvmetad to be
updated and LVM activation done if the VG is complete.
Also, use udev-systemd mechanism to instantiate the job as the
lvm2-pvscan@$devnode.service on each newly appeared PV in the system.
This prevents the background job to be killed (that would happen
if it was directly forked from udev rule - this behaviour is seen
in recent versions of udev with the help of systemd that can track
detached processes - the detached process would still be in the same
cgroup).
To enable this official udev-systemd protocol for instantiating
background jobs, use new --enable-udev-systemd-background-jobs
configure switch (it's disabled by default). This option is highly
recommended wherever systemd is used!
Reverts previously added udevsettle call.
Seems to be unrelated, while udev on old system may take over 10
minutes, to finish it's very slow and CPU intensive work, it doesn't
interact directly with created device, only access /dev/mapper/control
node via dmsetup, so the device is ocasionaly blocked by something else.
Patch helps a bit when lvm2 is build with disabled udev_sync support,
but udevd runs in the system - so it randomly influences unrelated tests
even - so before every test wait at least till udevd is settled.
On modern systems udev manages nodes in /dev/mapper directory.
It creates, deletes and renames the nodes according to the
state of the kernel driver.
When the dmsetup is compiled without udev support (--enable-udev_sync)
and runs on the system with running udevd it tries to manage nodes in
/dev/mapper too, so it can race with udev.
dmsetup checks if the node was created/deleted/renamed with the stat
syscall, and skips the operation if it was. However, if udev
creates/deletes/renames the node after the stat syscall and before the
mknod/unlink/rename syscall, dmsetup reports an error.
Since in the system everything happened as expected, skip reporting
error for such case.
These races can be easily provoked by inserting sleep at appropriate
places.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
This file may be included by other programs, so it should be compliant
with the C standard.
* use __linux__ instead of linux - __linux__ is always defined, linux is
not defined when gcc runs in standard-compliant mode (with -std=c89 or
-std=c99) because the C standard doesn't allow polluting namespace
with arbitrary defines.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Initial testing of thin pool's metadata with thin repairing tools.
Try to use tools from configuration settings, but allow them
to be overriden by settings of these variables:
LVM_TEST_THIN_CHECK_CMD,
LVM_TEST_THIN_DUMP_CMD,
LVM_TEST_THIN_REPAIR_CMD
FIXME: test reveals some more important bugs:
pvremove -ff also needs --yes
vgremove -ff doesn not remove metadata when there are no real LVs.
vgreduce is not able to reduce VG with pool without pool's PVs
Prohibit conversion of pool device with active thin volumes.
Properly restore active states only for active thin pool volume.
Use new LV_NOSCAN when converting volume into thin pool's metadata.
This patch reinstates the lv_info call to check for open count of
the LV we're removing/deactivating - this was changed with commit 125712b
some time ago and we relied on the ioctl retry logic deeper in the libdm
while calling the exact 'remove' ioctl.
However, there are still some situations in which it's still required to
check for open count before we do any 'remove' actions - this mainly
applies to LVs which consist of several sub LVs, like it is for
virtual snapshot devices.
The commit 1146691 fixed the issue with ordering of actions during
virtual snapshot removal while the snapshot is still open. But
the check for the open status of the snapshot is still prone to
marking the snapshot as in use with an immediate exit even though
this could be a temporary asynchronous open only, most notably
because of udev and its WATCH udev rule with accompanying scans
for the event which is asynchronous. The situation where this crops
up most often is when we're closing the LV that was open for read-write
and then calling lvremove immediately.
This patch reinstates the original lv_info call for the open status
of the LV in the lv_check_not_in_use fn that gets called before
we do any LV removal/deactivation. In addition to original logic,
this patch adds its own retry loop with a delay (25x0.2 seconds)
besides the existing ioctl retry loop.
Component LVs of a thinpool can be RAID LVs. Users who attempt a
scrubbing operation directly on a thinpool will be prompted to
specify the sub-LV they wish the operation to be performed on. If
neither of the sub-LVs are RAID, then a message telling them that
the operation can only be performed on a RAID LV will be given.
Split image should have an out-of-sync attr ('I') - always. Even if
the RAID LV has not been written to since the LV was split off, it is
still not part of the group that makes up the RAID and is therefore
"out-of-sync".
Reshape code a bit to make sockepair 'swappable' with plain old pipe
call.
Display status for FAILED error.
Increase buffer to hold always at least 1 page size.
Print error results with capitals.
Since the virtual snapshot has no reason to stay alive once we
detach related snapshot - deactivate whole thing in front of
snapshot removal - otherwice the code would get tricky for
support in cluster.
The correct full solution would require to have transactions
for libdm operations.
Also enable to the check for snapshot being opened prior
the origin deactivation, otherwise we could easily end
with the origin being deactivate, but snapshot still kept
active, desynchronizing locking state in cluster.
Addendum to commit ce7489e which introduced a new *internal* LV_NOSCAN
flag and so it needs to be marked that way properly otherwise it
ends up unrecognized and improperly handled during metadata export.
Recognize DM_SUBSYSTEM_UDEV_FLAG0 which for LVM is the "LVM_NOSCAN"
flag that causes the scanning to be skipped (mainly blkid) and
also directs all the foreign rules to be skipped as well.
Important thing here is that the "watch" udev rules is still set
as well as the /dev/disk/by-id content created (which does not
require any scanning to be done). Also, the flag is dropped on
any subsequent event and scanning done...
A common scenario is during new LV creation when we need to wipe the
newly created LV and avoid any udev scanning before this stage otherwise
it could cause the device (the LV) to be claimed by some other subsystem
for which there were stale metadata within LV data.
This patch adds possibility to mark the LV we're just about to wipe with
a flag that gets passed to udev via DM_COOKIE as a subsystem specific
flag - DM_SUBSYSTEM_UDEV_FLAG0 (in this case the subsystem is "LVM")
so LVM udev rules will take care of handling that.
Patch 562ad293fd introduced code regression
when LV was converted to a thin LV with external origin and at the same time,
conversion of LV to a thin pool has been requested.
(RHBZ: #997704)
data_lv needs to be assigned after test for external conversion find pool.
cfg(config_checks_CFG,"checks",config_CFG_SECTION,0,CFG_TYPE_BOOL,1,vsn(2,2,99),"Configuration tree check on each LVM command execution.")
cfg(config_abort_on_errors_CFG,"abort_on_errors",config_CFG_SECTION,0,CFG_TYPE_BOOL,0,vsn(2,2,99),"Abort LVM command execution if configuration is invalid.")
cfg(config_profile_dir_CFG,"profile_dir",config_CFG_SECTION,0,CFG_TYPE_STRING,0,vsn(2,2,99),"Directory with configuration profiles.")
cfg_runtime(config_profile_dir_CFG,"profile_dir",config_CFG_SECTION,0,CFG_TYPE_STRING,vsn(2,2,99),"Directory with configuration profiles.")
/* RAID devices are laid-out in metadata/data pairs */
if(!(seg_lv(seg,s)->status&RAID_IMAGE)||
!(seg_metalv(seg,s)->status&RAID_META)){
if(!lv_is_raid_image(seg_lv(seg,s))||
!lv_is_raid_metadata(seg_metalv(seg,s))){
log_error("RAID segment has non-RAID areas");
return0;
}
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.