Add "size" and "size_seqno" to struct device to cache device's size
and also to control its lifetime - the cached value is valid as long
as the global _dev_size_seqno is equal to the device's size_seqno,
otherwise we need to get the size again and cache the new value.
This patch also adds new dev_size_seqno_inc() fn for the appropriate
parts of the code to increment current global value of _dev_size_seqno
and hence to cause all currently cached values for device sizes to
be invalidated.
The device size is now cached because we're planning to reuse this
information for further checks and we want to avoid checking it more
than necessary to save resources.
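The invalidation scheme described above can be modelled in a few lines. This is a minimal Python sketch, not the lvm2 C code: SizeSeqno, Device and the read_size callback are hypothetical names that only mirror the roles of _dev_size_seqno, struct device and the real size query.

```python
class SizeSeqno:
    """Stands in for the global _dev_size_seqno counter (hypothetical model)."""

    def __init__(self):
        self.value = 1

    def inc(self):
        # Plays the role of dev_size_seqno_inc(): one increment
        # invalidates every cached size at once.
        self.value += 1


class Device:
    def __init__(self, seqno, read_size):
        self._seqno = seqno          # shared counter
        self._read_size = read_size  # hypothetical callback doing the real query
        self.size = 0
        self.size_seqno = 0          # 0 never matches, so the first call reads

    def get_size(self):
        # The cached value is valid only while both sequence numbers agree.
        if self.size_seqno != self._seqno.value:
            self.size = self._read_size()
            self.size_seqno = self._seqno.value
        return self.size
```

Calling get_size() twice in a row triggers one real read; after inc() the next call reads again.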
Fix regression caused by c9f021de0b.
That commit actually transferred the real action (e.g. device removal)
into the next loop, which however failed to check for break.
So add the check for break there as well.
When creating a list in 'context of command' - use proper mempool.
vg->vgmem is mempool related to VG metadata - and can be eventually
locked read-only when VG struct is shared.
W: manual-page-warning /usr/share/man/man8/lvm.8.gz 491: warning: macro `_cdata',' not defined
rpmlint actually noticed we had a few hidden words in the man page.
A line cannot start with an apostrophe, as it then has a different
meaning.
If not using explicit --enable-blkid-wiping/--disable-blkid-wiping
configure option, the configure script tries to enable/disable blkid
wiping feature automatically based on blkid library version found.
The script incorrectly set the default value for lvm.conf's
allocation/use_blkid_wiping setting to "1" (enabled) if a proper
blkid library version was not found or the version found was less
than the minimum required. It should be set to "0" in this case.
The extent size must fit all blocks in 4294967295 sectors
(in 512b units); this is 1/2 KiB less than 2TiB.
So while the previous statement 'suggested' 2TiB is still an acceptable value,
make it clear it's not.
As we now support any multiple of 128KB as extent size,
values like 2047G will still 'flow-in'; otherwise the largest power-of-2
supported value is 1TiB.
With 1TiB, a user needs 8388608 extents for an 8EiB device.
(FYI such a device is already unusable with today's glibc-2.22.90-27)
4GiB extent size is currently the smallest extent size which allows
a user to create 8EiB devices (with 2GiB it's less than 8EiB).
TODO: lvm2 may possibly print the amount of 'lost/unused space' on a PV,
since using such a ridiculously large extent size may result in a huge
amount of space being left inaccessible.
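The numbers quoted above can be checked with plain integer arithmetic. This is only a worked example in Python. The MAX_EXTENTS limit of 2**32 - 1 extents per device is an assumption made here to explain the 2GiB/4GiB remark; the text implies such a limit but does not state it.

```python
SECTOR = 512
KIB, GIB, TIB, EIB = 2**10, 2**30, 2**40, 2**60

# Largest extent size: 4294967295 sectors of 512 bytes each.
max_extent_bytes = 4294967295 * SECTOR
shortfall = 2 * TIB - max_extent_bytes   # bytes missing to a full 2TiB

# 2047G is a multiple of 128KiB and still below the limit.
fits_2047g = (2047 * GIB) % (128 * KIB) == 0 and 2047 * GIB <= max_extent_bytes

# Number of 1TiB extents needed to cover an 8EiB device.
extents_for_8eib = (8 * EIB) // TIB

# ASSUMPTION (not stated in the text): at most 2**32 - 1 extents per device.
MAX_EXTENTS = 2**32 - 1
reach_2g = 2 * GIB * MAX_EXTENTS         # largest device with 2GiB extents
reach_4g = 4 * GIB * MAX_EXTENTS         # largest device with 4GiB extents

print(shortfall, fits_2047g, extents_for_8eib)
print(reach_2g < 8 * EIB, reach_4g >= 8 * EIB)
```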
Since commit 2fc126b00d, the library
code requires udev to be initialised for device scanning and
clvmd can fail to find VGs if devices/external_device_info_source
is set to "udev".
There are two basic groups of fields for LV segment device reporting:
- related to LV segment's devices: devices and seg_pe_ranges
- related to LV segment's metadata devices: metadata_devices and seg_metadata_le_ranges
The devices and metadata_devices report devices in this format:
"device_name(extent_start)"
The seg_pe_ranges and seg_metadata_le_ranges report devices in
this format:
"device_name:extent_start-extent_end"
This patch reverts partly what commit 7f74a99502
(v 2.02.140) introduced in this area - it added [] for
hidden devices to mark them for all four fields mentioned above.
We won't be marking hidden devices in devices and metadata_devices
fields.
The seg_metadata_le_ranges field will have hidden devices marked -
it's new enough that we don't need to care about compatibility much
yet.
The seg_pe_ranges is old enough that we shouldn't be changing this
one - so we're reverting to not marking hidden devices here.
Instead, there's going to be a new field "seg_le_ranges" which
is going to replace the seg_pe_ranges and it will mark hidden devices -
this is going to be introduced in a patch later.
So in the end we'll end up with:
(LV segment's devices)
- devices field with "device_name(extent_start)" format, not marking hidden devices
- seg_pe_ranges field with "device_name:extent_start-extent_end" format, not marking hidden devices (deprecated, new seg_le_ranges should be used instead for standardized format)
- seg_le_ranges field with "device_name:extent_start-extent_end" format, marking hidden devices
(LV segment's metadata devices)
- metadata_devices field with "device_name(extent_start)" format, not marking hidden devices
- seg_metadata_le_ranges field with "device_name:extent_start-extent_end" format, marking hidden devices
Also, both seg_le_ranges and seg_metadata_le_ranges will honour the
report/list_item_separator setting which can be used to configure
the delimiter used for list items.
So, to sum it up, we will recommend using the new seg_le_ranges and
seg_metadata_le_ranges fields because they display devices with
standard extent range format, they can mark hidden devices and they
honour the report/list_item_separator setting.
We'll be keeping devices,seg_pe_ranges and metadata_devices fields
for compatibility.
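To make the two item formats concrete, here is a small Python sketch that only builds the strings. The function names are hypothetical, and placing the brackets around the device name alone (before the colon) is an assumption; the text says only that hidden devices are marked with [].

```python
def format_device(name, extent_start):
    # "device_name(extent_start)" as used by devices and metadata_devices.
    return f"{name}({extent_start})"


def format_le_range(name, extent_start, extent_end, hidden=False):
    # "device_name:extent_start-extent_end" as used by the *_ranges fields.
    # ASSUMPTION: a hidden device is shown as "[name]" before the colon.
    shown = f"[{name}]" if hidden else name
    return f"{shown}:{extent_start}-{extent_end}"


def join_list(items, separator):
    # The separator stands in for the report/list_item_separator setting.
    return separator.join(items)
```

For instance, join_list([format_le_range("lv_rimage_0", 0, 24, hidden=True), format_le_range("lv_rimage_1", 0, 24, hidden=True)], ";") yields a single field value with ";" between the two items.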
The associated devices,metadata_devices,seg_pe_ranges and
seg_metadata_le_ranges are reported as genuine string lists now.
This allows for using the items separately in -S|--select
(so searching for subsets etc.) and also it allows for
configuring the separator using report/list_item_separator
which may be useful in scripts (however, we'll enable this
only for seg_metadata_le_ranges and not for devices,seg_pe_ranges
and metadata_devices for compatibility reasons - see following
patch).
Add a comment in _process_pvs_in_vg() to document the
place where there have been problems with processing
PVs twice.
For a while we had a hacky workaround here where we'd
skip processing a PV if its device wasn't found in
all_devices (and !is_missing_pv since we want to
process PVs with missing devices.). That workaround
was removed in commit 5cd4d46f because it was no
longer needed.
The workaround had originally been needed to prevent
a device from being processed twice when the PV had
no MDAs -- it would be processed once in its real VG
and then the workaround would prevent it from being
processed a second time in the orphan VG.
Wrongly appearing as an orphan likely happened because
lvmcache would consider the no-MDA PV an orphan unless
the real VG holding that PV was also in lvmcache.
This issue is also mentioned in pvchange where holding
the global lock allows VGs to remain in lvmcache so
PVs with 0 mdas are not considered orphans.
The workaround in _process_pvs_in_vg() was originally
intended for reporting commands, not for pvchange.
But, it was accidentally helping pvchange also because
the method described by the pvchange global lock
comment had been subverted by commit 80f4b4b8.
Commit 80f4b4b8 was found to be unnecessary, and was
reverted in commit e710bac0. This restored the
intended global lock lvmcache effect to pvchange, and
it no longer relied on the workaround in toollib.
When reporting on LVs, take the end of the range from the size of the
underlying (hidden) LV rather than the logical size of the current
segment (that PVs use).
Previously, pvmove used the function find_pv_in_vg() which did the
equivalent of process_each_pv() by doing:
  find_pv_by_name() -> get_pvs() ->
  get_pvs_internal() -> _get_pvs() -> get_vgids() ->
  /* equivalent to process_each_pv */
  dm_list_iterate_items(vgids)
    vg = vg_read_internal()
    dm_list_iterate_items(&vg->pvs)
With the found 'pv', it would do vg_read() on pv_vg_name(pv),
and then do the actual pvmove processing.
This commit simplifies by using process_each_pv() and putting
the actual pvmove processing into the "single" function.
This eliminates both find_pv_by_name() and the vg_read().
The processing code that followed vg_read remains the same.
The return code for the pvmove command is not based on the
process_each_pv return code, but is based on the success/fail
conditions in the existing code.
Make the lvb validation rules for convert match
those for unlock (even though it would be very
unlikely or impossible for convert to deal with
zero lvb.)
When an orphan PV is changed/resized, the
lvmlockd global lock is converted from sh
to ex. If the command is changing two
orphan PVs, the conversion to ex should
be done only once.
The existing cache_settings field displays the settings which are
saved in metadata. Add a new kernel_cache_settings field to display
the settings which are currently used by the kernel, including settings
for which default values are used.
This way users have a complete view of the set of cache settings
supported (and which they can set) and of the values which are used
at the moment by the kernel.
For example:
$ lvs -o name,cache_policy,cache_settings,kernel_cache_settings vg
  LV      Cache Policy Cache Settings                                      KCache Settings
  cached1 mq           migration_threshold=1024,write_promote_adjustment=2 migration_threshold=1024,random_threshold=4,sequential_threshold=512,discard_promote_adjustment=1,read_promote_adjustment=4,write_promote_adjustment=2
  cached2 smq          migration_threshold=1024                            migration_threshold=1024
  cached3 smq                                                              migration_threshold=2048
Fix lvm2app to return either 0 or 1 for lvm_vg_is_{clustered,exported},
including internal functions pvseg_is_allocated and vg_is_resizeable
which are not yet exposed in lvm2app but make them consistent with the
rest.
This reverts e28e22b9e1
The problem that that commit was fixing (pytest failure)
no longer appears with the current code, so the commit is
not needed.
That commit is a problem for pvchange, because it prevents
lvmcache from retaining VG metadata even while the global
lock is held. pvchange holds the global lock to ensure
that VG metadata is kept in lvmcache throughout processing.
If the cache is not kept, a PV with zero MDAs will appear
first in its actual VG and then appear again in the orphan VG.
It wrongly appears a second time in the orphan VG only if
the actual VG is dropped from lvmcache.
Thin pool discard mode set in metadata can be different from the one
actually used if any device underneath does not support that mode. Add
kernel_discard report field to make it possible to see this difference.
Internal _alloc_init() is only called from allocate_extents(),
which already does prevent usage of virtual segments.
So mark as internal error early and do not process it any further.
Add new test for lv_is_snapshot().
Also move a few other bit checks into the same place as the remaining bit tests.
TODO: drop lv_is_merging_origin() and keep using lv_is_merging().
The problem addressed by this workaround no longer
seems to exist, so remove it. PVs with no mdas
no longer appear in both their actual VG and in
the orphan VG.
Include brackets for the name if the dev is invisible.
This change applies to all callers of _format_pvsegs fn:
- lvseg_devices (the "lvs -o devices")
- lvseg_metadata_devices (the "lvs -o metadata_devices")
- lvseg_seg_pe_ranges (the "lvs -o seg_pe_ranges")
- lvseg_seg_metadata_le_ranges (the "lvs -o seg_metadata_le_ranges")
The common lv_pool_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_metadata_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_data_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_mirror_log_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_origin_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_convert_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
Use common _lvname_disp to report lv_parent. The _lvname_disp
takes care of properly marking LVs which are not visible - such
LVs are always enclosed in brackets when reported within any
other field.
For example, thin pool over RAID.
Before:
$ lvs -a -o name,lv_parent,data_lv,metadata_lv vg
  LV                          Parent           Data               Meta
  cache_pool                                   [cache_pool_tdata] [cache_pool_tmeta]
  [cache_pool_tdata]          cache_pool
  [cache_pool_tdata_rimage_0] cache_pool_tdata
  [cache_pool_tdata_rimage_1] cache_pool_tdata
  [cache_pool_tdata_rmeta_0]  cache_pool_tdata
  [cache_pool_tdata_rmeta_1]  cache_pool_tdata
  [cache_pool_tmeta]          cache_pool
  [cache_pool_tmeta_rimage_0] cache_pool_tmeta
  [cache_pool_tmeta_rimage_1] cache_pool_tmeta
  [cache_pool_tmeta_rmeta_0]  cache_pool_tmeta
  [cache_pool_tmeta_rmeta_1]  cache_pool_tmeta
  [lvol0_pmspare]
With this patch applied:
$ lvs -a -o name,lv_parent,data_lv,metadata_lv vg
  LV                          Parent             Data               Meta
  cache_pool                                     [cache_pool_tdata] [cache_pool_tmeta]
  [cache_pool_tdata]          cache_pool
  [cache_pool_tdata_rimage_0] [cache_pool_tdata]
  [cache_pool_tdata_rimage_1] [cache_pool_tdata]
  [cache_pool_tdata_rmeta_0]  [cache_pool_tdata]
  [cache_pool_tdata_rmeta_1]  [cache_pool_tdata]
  [cache_pool_tmeta]          cache_pool
  [cache_pool_tmeta_rimage_0] [cache_pool_tmeta]
  [cache_pool_tmeta_rimage_1] [cache_pool_tmeta]
  [cache_pool_tmeta_rmeta_0]  [cache_pool_tmeta]
  [cache_pool_tmeta_rmeta_1]  [cache_pool_tmeta]
  [lvol0_pmspare]
Do not mix dm_report_field_set_value and _field_set_value and
use single function call throughout for clarity. The same applies
for dm_report_field_string and _string_disp.
Fix regression caused by commit c2d4330f27
which removed the dm_pool_strdup for the cache policy name in
_cache_policy_disp report function.
This regression was hit with buffered reporting only (which is
used by default). The reason is that for buffered reporting, we're
iterating over LVs in VG (process_each_lv) while gathering
all the information that is needed for the report. In this case,
the LV's cache policy name has not been duped, but only the pointer
to the original VG buffer was stored. When the LV iteration finished,
the VG buffer was freed and any report to output called later
(dm_report_output call) accessed already freed VG data.
This didn't appear if unbuffered reporting was used (--unbuffered)
because in this case, the data were reported to output as
soon as they were processed, hence it was reported to output
before the VG data was freed.
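The difference between keeping a pointer and duplicating the string can be imitated in Python with a memoryview over a mutable buffer. This is only an analogy under stated assumptions: Python has no dangling pointers, so "freeing" the VG buffer is simulated here by overwriting it, and collect_policy_name is a hypothetical helper, not an lvm2 function.

```python
def collect_policy_name(rows, vg_buffer, length, duplicate):
    # Take the first `length` bytes of the VG buffer as the policy name.
    view = memoryview(vg_buffer)[:length]
    # duplicate=True models dm_pool_strdup(): the report owns its own copy.
    # duplicate=False models storing only a pointer into the VG buffer.
    rows.append(bytes(view) if duplicate else view)


vg_buffer = bytearray(b"smq and other VG data")
rows = []
collect_policy_name(rows, vg_buffer, 3, duplicate=False)
collect_policy_name(rows, vg_buffer, 3, duplicate=True)

# Simulate the VG buffer being released and reused before the report is output.
vg_buffer[0:3] = b"\x00\x00\x00"

stale = bytes(rows[0])   # no longer the policy name
safe = rows[1]           # still the policy name
```

With unbuffered reporting the value would be printed before the overwrite, which is why the problem did not show up there.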
The lvm2-activation{-early,-net}.service systemd unit statuses were missing
in dump gathered by lvmdump -s. These are quite important when debugging
scenarios with systemd environment and where lvmetad is not used.
Have commands send lvmlockd the update message
in vg_write instead of vg_commit, so that it's
not done while LVs are suspended. If the vg_write
is not committed, and the seqno sent to lvmlockd
is not used, then lvmlockd can detect this when
the next update uses the same seqno.
Use process_each_vg() to lock and read the old VG,
and then call the main vgrename code.
When real VG names are used (not a UUID in place of the
old name), the command still pre-locks the new name
(when strcmp wants it locked first), before calling
process_each_vg on the old name.
In the case where the old name is replaced with a UUID,
process_each_vg now translates that UUID into the real
VG name, which it locks and reads. In this case, we
cannot do pre-locking to maintain lock ordering because
the old name is unknown. So, in this case the strcmp
based lock ordering is suppressed and the old name is
always locked first. This opens a remote chance for
lock ordering conflict between racing vgrenames between
two names where one or both commands use the UUID.
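The lock ordering rule can be summarised as a small function. This is a hedged Python sketch: vgrename_lock_order is a hypothetical name, and locking the lexically smaller name first is an assumption about the direction of the strcmp based ordering, which the text does not spell out.

```python
def vgrename_lock_order(old_arg, new_name, old_is_uuid):
    """Return the order in which the two VG names would be locked."""
    if old_is_uuid:
        # The real old name is unknown until the VG has been read, so the
        # name based ordering is suppressed and the old VG is locked first.
        return [old_arg, new_name]
    # ASSUMPTION: the smaller name (as strcmp would order it) is locked first.
    return sorted([old_arg, new_name])
```

Two racing commands renaming vgA to vgB and vgB to vgA both get the same order and cannot deadlock; if one of them passes a UUID, the orders may differ, which is the remote conflict mentioned above.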
Also always clear the internal lvmcache after rescanning, and
reinstate a test for --trustcache so that 'pvs --trustcache'
(for example) avoids rescanning.
Before commit c1f246fedf,
_get_all_devices() did a full device scan before
get_vgnameids() was called. The full scan in
_get_all_devices() is from calling dev_iter_create(f, 1).
The '1' arg forces a full scan.
By doing a full scan in _get_all_devices(), new devices
were added to dev-cache before get_vgnameids() began
scanning labels. So, labels would be read from new devices.
(e.g. by the first 'pvs' command after the new device appeared.)
After that commit, _get_all_devices() was called
after get_vgnameids() was finished scanning labels.
So, new devices would be missed while scanning labels.
When _get_all_devices() saw the new devices (after
labels were scanned), those devices were added to
the .cache file. This meant that the second 'pvs'
command would see the devices because they would be
in .cache.
Now, the full device scan is factored out of
_get_all_devices() and called by itself at the
start of the command so that new devices will
be known before get_vgnameids() scans labels.
Since we mark a cache-pool as 'hidden/private' while it is in use,
we may still allow the user to change its name.
It should not cause any harm and the user may prefer better naming
for a cache-pool in use.
If an existing fifo has the wrong attributes it cannot be trusted
so we must unlink it and recreate it correctly.
(Replaces 2c8d6f5c90: if the other end of
the fifo already got opened while its mode was insecure, delaying the
chmod isn't going to make any difference!)
Reinstate and extend checks removed by e1b111b02a.
The code has always assumed that only root has access to the directory
containing the fifos and that they are under the complete control of
dmeventd code. If anything is found not to be as expected, then open()
should certainly not be attempted!
It's getting a bit more complex here.
The basic idea is that check_current_backup() should not
log an error when a user is using a read-only filesystem,
so e.g. vgscan will not report any error when it tries
to take a missing backup.
We still have cases where an error could be reported though,
e.g. when the backup would be a symbolic link, but these
are rather misconfigurations and unexpected cases.
We have two modes of 'archive()' usage:
1. compulsory - a failure stops the command and the user may try the '-An'
option to run the command.
2. non-compulsory - some failures in archiving are ignorable (e.g. a
read-only filesystem where the archive dir is located).
Those 2 cases need to be handled properly - i.e. the non-compulsory
logging should not tamper with the production of error log messages.
So more work is needed here.
Pass full buffer size to printf() function - no reason to make
buffer 1 char smaller.
Also rename locn buffer to message buffer directly since it's
not used for anything else.
TODO: we may use the same buffer also for 'buf[]' since there is
no collision - so we may save 1K of stack usage.
In general, --select should be used to specify a VG by UUID,
but vgrename already allows a uuid to be substituted for
the name, so continue to allow it in that case.
If the VG arg from the command line does not match the
name of any known VGs, then check if the arg looks like
a UUID. If it's a valid UUID, then compare it to the
UUID of known VGs. If it matches the UUID of a known VG,
then process that VG.
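The fallback from name to UUID can be sketched as follows. This is a minimal Python model, not the toollib code: resolve_vg_arg is a hypothetical name, and the UUID pattern is an assumption modelled on the 6-4-4-4-4-4-6 UUIDs that appear in lvs/vgs output, not the validity check lvm2 actually uses.

```python
import re

# ASSUMPTION: shape of a VG UUID as it appears in lvs/vgs output.
UUID_RE = re.compile(r"^[A-Za-z0-9]{6}(-[A-Za-z0-9]{4}){5}-[A-Za-z0-9]{6}$")


def resolve_vg_arg(arg, known_vgs):
    """known_vgs is a list of (vgname, vgid) pairs; returns the matches."""
    by_name = [(name, vgid) for name, vgid in known_vgs if name == arg]
    if by_name:
        return by_name
    # Only when no VG has this name, try to read the argument as a UUID.
    if UUID_RE.match(arg):
        return [(name, vgid) for name, vgid in known_vgs if vgid == arg]
    return []
```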
Pass the single vgname as a new process_each_vg arg
instead of setting a cmd flag to tell process_each_vg
to take only the first vgname arg from argv.
Other commands with different argv formats will be
able to use it this way.
When two different VGs with the same name exist,
they are both stored in lvmcache using the vginfo->next
list. Previously, the code would print warnings (sometimes)
when adding VGs to this list. Now the duplicate VG names
are handled by higher level code, so this list no longer
needs to print warnings about duplicate VG names being found.
After recent changes to process_each, vg_read() is usually
given both the vgname and vgid for the intended VG.
However, in some cases vg_read() is given a vgid with
no vgname, or is given a vgname with no vgid.
When given a vgid with no vgname, vg_read() uses lvmcache
to look up the vgname using the vgid. If the vgname is
not found, vg_read() fails.
When given a vgname with no vgid, vg_read() should also
use lvmcache to look up the vgid using the vgname.
If the vgid is not found, vg_read() fails.
If the lvmcache lookup finds multiple vgids for the
vgname, then the lookup fails, causing vg_read() to fail
because the intended VG is uncertain.
Usually, both vgname and vgid for the intended VG are passed
to vg_read(), which means the lvmcache translations
between vgname and vgid are not done.
If two different VGs with the same name exist on the system,
a command that just specifies that ambiguous name will fail
with a new error:
$ vgs -o name,uuid
...
foo qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
foo vfhKCP-mpc7-KLLL-Uh08-4xPG-zLNR-4cnxJX
$ lvs foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ vgremove foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ lvs -S vg_uuid=qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
lv1 foo ...
This is implemented for process_each_vg/lv, and works
with or without lvmetad. It does not work for commands
that do not use process_each.
This change includes one exception to the behavior shown
above. If one of the VGs is foreign, and the other is not,
then the command assumes that the intended VG is the local
one and uses it.
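The selection rule, including the foreign VG exception, can be modelled like this. It is a Python sketch under assumptions: pick_vg and the dict layout are hypothetical, and treating "exactly one local VG among the matches" as the exception also for more than two same-named VGs is a generalisation made here.

```python
def pick_vg(name, vgs):
    """vgs: list of dicts with 'name', 'uuid' and 'foreign' keys."""
    matches = [vg for vg in vgs if vg["name"] == name]
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    # Exception: if exactly one of the same-named VGs is local, use it.
    local = [vg for vg in matches if not vg["foreign"]]
    if len(local) == 1:
        return local[0]
    raise LookupError(
        f"Multiple VGs found with the same name: {name}. "
        "Use the --select option with VG UUID (vg_uuid)."
    )
```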
This makes process_each_vg/lv always use the list of
vgnames on the system. When specific VGs are named on
the command line, the corresponding entries from
vgnameids_on_system are moved to vgnameids_to_process.
Previously, when specific VGs were named on the command
line, the vgnameids_on_system list was not created, and
vgnameids_to_process was created from the arg_vgnames
list (which is only names, without vgids).
Now, vgnameids_on_system is always created, and entries
are moved from that list to vgnameids_to_process -- either
some (when arg_vgnames specifies only some), or all (when
the command is processing all VGs, or needs to look at
all VGs for checking tags/selection).
This change adds one new lvmetad lookup (vg_list) to a
command that specifies VG names. It adds no new work
for other commands, e.g. non-lvmetad commands, or
commands that look at all VGs.
When using lvmetad, 'lvs foo' previously sent one
request to lvmetad: 'vg_lookup foo'.
Now, 'lvs foo' sends two requests to lvmetad:
'vg_list' and 'vg_lookup foo <uuid>'.
(The lookup can now always include the uuid in the request
because the initial vg_list contains name/vgid pairs.)
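The list handling described above amounts to partitioning one list of name/vgid pairs. The following Python sketch uses hypothetical names (split_vgnameids, process_all) and plain tuples in place of the real list structures.

```python
def split_vgnameids(vgnameids_on_system, arg_vgnames, process_all=False):
    """Move entries from the system list to the list of VGs to process.

    vgnameids_on_system: list of (vgname, vgid) pairs, always built first.
    Returns (vgnameids_to_process, entries_left_on_system).
    """
    to_process, remaining = [], []
    for name, vgid in vgnameids_on_system:
        # Either every VG is wanted, or only those named on the command line.
        if process_all or name in arg_vgnames:
            to_process.append((name, vgid))
        else:
            remaining.append((name, vgid))
    return to_process, remaining
```

Because every entry carries its vgid, the later lookup can always send both the name and the UUID.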
When not using lvmetad, this uses the system_id field in
the cached vginfo structs that are populated during a scan.
When using lvmetad, this requests the VG from lvmetad, and
checks the system_id field in the returned metadata.
When the command already knows both the vgid and vgname,
it should send both to lvmetad for a more exact request,
and it can save lvmetad the work of a name lookup.
Remove long-outstanding unused code lines, which have already
been obsoleted by other code.
Statuses and snapshot tree creation are already handled differently.
Also drop some 'extra' log_error() calls and use only stack;
since the error has already been reported.
Since we do not use dev_manager in a way we would have destroyed VG
content while in-use - we could safely keep just pointer.
So dropping strdup.
Also it seems we actually no longer use vg_name for anything
so it may possibly go away completely unless it would be useful
for debugging...
Just for convenience to display all new configuration settings
introduced since given version (before, there was only --atversion
to display settings introduced in concrete version).
For example:
$ lvmconfig --type new --sinceversion 2.2.120
allocation {
    # cache_mode="writethrough"
    # cache_settings {
    # }
}
global {
    use_lvmlockd=0
    # lvmlockd_lock_retries=3
    # sanlock_lv_extend=256
    use_lvmpolld=1
}
activation {
}
# report {
#     compact_output_cols=""
#     time_format="%Y-%m-%d %T %z"
# }
local {
    # host_id=0
}
Unifying terminology.
Since all the metadata in-use are ALWAYS on disk - switch
to terminology committed and precommitted.
Patch has no functional change inside.
lv preload for detached LVs started to be used also
for various other types which just happen to pass through
a weak if() condition.
TODO: find a better solution here that explicitly checks
for the types we really need to preload.
We do not want to 'expose' internals of the VG struct.
ATM we use lists to keep all LVs - we may want to switch
to a better struct for quicker 'search'.
Since we do not need 'lists' but always the actual LV,
switch find_lv_in_vg_by_lvid() to return an LV,
and replace some uses of find_lv_in_vg()
with the 'better' working find_lv() which already
returns an LV.
When running 'lvextend -L+XX vg/thinpool', do not leave an inactive table
loaded for the 'wrapping' LV on top of the resized thin-pool
(ATM we use a linear LV for this with the same size as the thin-pool).
Udev recently started to 'link-in' a large number of useless libs.
(Seems to be a faulty 'systemd' link-in-everything issue.)
Anyway - avoid locking those libs in RAM.
When preloading thin-pool device node for already
existing/running thin-pool do not resume such thin-pool.
This allows to properly schedule commit point for metadata,
when thin-pool data or metadata volume is resized.
An extra space between the 'cache' target and the metadata device caused
the string comparison to fail and thus always caused a
table reload even when unneeded.
Avoid an internal error message when the thin pool repair code tries to
fix a cache pool - it was caught later in the code stack, so rather
catch this early and make the repair function exclusive
to thin pools.
So far we have no code for repairing cache pools
(other than the automatic repair during activation/deactivation).
To handle multiple VGs with the same name.
Simply using the VG name is ambiguous, and
lvmetad requires the VG uuid be used to
specify which one is meant.
Coverity is a bit 'blind' here and cannot resolve which
code paths are actually able to hit this code.
(It's using 'statistics' to resolve all possible paths,
and it's not scanning 'individual' code paths.)
This just cleans up warnings and adds 'cheap' tests.
Use 'mda' instead of NULL to quiet a Coverity warning.
However, this code seems to be impossible to hit.
With proper analysis it may possibly be dropped from the code to
simplify the logic.
Skip testing target_pvs for NULL, we already
dereference it in many other places.
If check would ever be needed - it needs to be
in front of _raid_extract_images().
In lookup, return a count of entries with the
same key rather than the value from a second
entry with the same key.
Using some slightly different names.
Simply use lookup_withval right away rather than doing a
standard lookup, checking for the wrong mapping, then
repeating with lookup_withval to get the right mapping.
Coverity is not able to understand assembly language in
system's header file, so provide model for such macro.
Note: to really see model in-use: #nodef FD_ZERO model_FD_ZERO
need to go to coverity/config/user_nodefs.h
Add missing display_lvname in _lvconvert_merge_thin_snapshot().
Also when we detect a missing origin, report an Internal error,
which would likely be the primary fault here
(and avoid a dereference of a NULL origin as noticed by Coverity).
When reading older lvm2 metadata for cache-pool - we now handle a more
extended syntax - basically we want to enter most settings when
actually creating the cached LV.
New validation code has been added for this. However, older metadata
without the new settings is now considered invalid.
Fix it by adding default settings for cache policy mq
and cache mode writethrough.
If the data len is passed into the hash table
and saved there, then the hash table internals
do not need to assume that the data value is
a string at any point.
New hash table functions are added that allow for
multiple entries with the same key. Use of the
vgname_to_vgid hash table is converted to these
new functions since there are multiple entries
in vgname_to_vgid that have the same key (vgname).
When multiple VGs with the same name exist, commands
that reference only a VG name will fail saying the
VG could not be found (that error message could be
improved.) Any command that works with the select
option can access one of the VGs with -S vg_uuid=X.
vgrename is a special case that allows the first VG
name arg to be replaced by a uuid, which also works.
(The existing hash table implementation is not well
suited for handling this case, but it works ok with
the new extensions. Changing lvmetad to use its own
custom hash tables may be preferable at some point.)
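A hash table that tolerates several entries under one key can be pictured as a map from key to a list of values. The Python sketch below is illustrative only: the class and method names are hypothetical and loosely echo the idea of "lookup with value" and "lookup with count"; they are not the libdevmapper signatures, and the stored data length is not modelled.

```python
class MultiHash:
    """Toy table allowing multiple entries with the same key."""

    def __init__(self):
        self._entries = {}

    def insert_allow_multiple(self, key, value):
        self._entries.setdefault(key, []).append(value)

    def lookup_with_count(self, key):
        # Return the first value together with the number of entries that
        # share the key, so the caller can detect an ambiguous key.
        values = self._entries.get(key, [])
        return (values[0] if values else None), len(values)

    def lookup_with_val(self, key, value):
        # Return the entry only if this exact key/value mapping exists.
        return value if value in self._entries.get(key, []) else None
```

For a vgname_to_vgid style table, a count greater than one signals that the name alone does not identify a VG.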
Here Coverity cannot see that the pointer cannot be NULL in this
code path - opened coverity case #00531860.
We could make a model to avoid seeing related reports,
but then we lose coverage for the modeled function.
So we decided to add a minor hint for this case.
Recent change 2c8d6f5c90
actually dropped the restart when the reason for the failing open is a completely
missing device - check for ENOENT now as another reason
to start a new dmeventd server (when there is no systemd to maintain it).
The udev_device_get_is_initialized is available since libudev version
165. Older versions are still used somewhere (e.g. RHEL6). So better
check for this fn and use it only if it's available.
Udev db records are marked as not initialized (incomplete) on timeout.
Issue an error message whenever LVM finds such records so users are
aware that something's going wrong with udev db.
This is important in case we use devices/external_device_info_source="udev"
where udev database records are used to do various filtering decisions.
For example:
udev log of timed out worker:
Nov 11 13:02:25 raw.virt systemd-udevd[607]: seq 1997 '/devices/virtual/block/dm-2' is taking a long time
Nov 11 13:04:25 raw.virt systemd-udevd[607]: seq 1997 '/devices/virtual/block/dm-2' killed
Nov 11 13:04:25 raw.virt systemd-udevd[607]: worker [11221] terminated by signal 9 (Killed)
Nov 11 13:04:25 raw.virt systemd-udevd[607]: worker [11221] failed while handling '/devices/virtual/block/dm-2'
...
LVM also visibly issues an error message if an incomplete udev db record is found
and devices/external_device_info_source="udev" is set:
$ pvs
Udev database has incomplete information about device /dev/dm-2.
Failed to get external handle for device /dev/dm-2 [udev].
...
Coverity noticed this condition is always false and the error
path could never be visited.
So check for all mismatches of supported messages
and actually mark log_error as internal error.
When the first arg is a UUID and vgrename translates
that UUID to a current VG name, the old and new VG
names are not being checked for equality. If they
are equal, it produces an internal error rather than
a proper error.
Use delay_dev to slow down mirror sync so we can more
easily check for races and proper rejection of parallel mirror
leg addition/reduction.
Also expose failure in mirror allocation of a parallel leg.
Rather than skipping the whole test - just do not use it.
Failing tests that require the delay need to deal with reality
and shall either check for HAVE_DM_DELAY and skip a portion
of the test or use 'should' when needed.
While through all code paths we never 'read' lock_id unless LCKF_CONVERT is set,
Coverity cannot work this out.
Since it's usually better to pass in 'well-defined' data structures,
preset lock_id to 0.
Use fputs() when printing a plain string;
it is easier than fprintf, which needs to parse it.
Also check that fd is >= 0 before close -
it is - but Coverity fails to see it, so eliminate
this false-positive warning.
Doing 'stat' checking first and later opening is racy.
And since we do not really care about any 'status' info
here and we read 'sysfs' here - just drop whole 'stat()'
call and directly handle error from failing 'fopen()'.
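A minimal sketch of the pattern described above, assuming a hypothetical helper named read_sysfs_line() (it is not the actual lvm2 function). Instead of calling stat() and opening afterwards, it opens the file straight away and treats a failing fopen() as the "not available" case:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *fp;

	if (len == 0)
		return -1;

	/* No stat() beforehand: the open itself is the existence check. */
	if (!(fp = fopen(path, "r")))
		return -1;

	if (!fgets(buf, (int) len, fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	/* Strip the trailing newline, if any. */
	buf[strcspn(buf, "\n")] = '\0';

	return 0;
}
```

With a single open there is no window between the check and the use in which the path could change.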
Coverity here does not see the full picture - but please it
with validation of a pointer which currently cannot be NULL,
since we always return at least an empty string.
Check for arg_vgid_lookup and arg_name_lookup not being NULL.
Drop checking arg_vgid and arg_name for NULL since they
are already dereferenced earlier - thus they must be non-NULL.
(If NULL were possible, a larger rework of this function would be
required).
Put calls related to fifo opening into a single function.
Fix Time-Of-Check-Time-Of-Use and use fstat()
and fchmod() on already opened fd instead of
checking first path and then risking to open something
different.
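The open-then-check order described above can be sketched as follows. The helper name open_checked_fifo() and its return convention are assumptions made for illustration, and the sketch relies on Linux behaviour where opening a FIFO with O_RDWR does not block:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open a FIFO, then validate and adjust it via the fd.
 * Returns the open fd, or -1 on any failure. */
static int open_checked_fifo(const char *path, mode_t mode)
{
	struct stat st;
	int fd;

	/* O_NONBLOCK is added defensively in case the path is not a FIFO. */
	if ((fd = open(path, O_RDWR | O_NONBLOCK)) < 0)
		return -1;

	/* Inspect the object we actually opened, not whatever the path names now. */
	if (fstat(fd, &st) < 0 || !S_ISFIFO(st.st_mode))
		goto bad;

	/* Fix permissions on that same object. */
	if ((st.st_mode & 0777) != mode && fchmod(fd, mode) < 0)
		goto bad;

	return fd;
bad:
	close(fd);
	return -1;
}
```

Because fstat() and fchmod() operate on the descriptor, replacing the path after open() cannot redirect the check or the permission change to a different file.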
Currently the code creates the log separately after allocating space for
the data and as no data allocation is needed this second time,
total_extents ends up holding zero so use new_extents directly instead.
When reading a foreign VG we cannot write it, since
it belongs to another host. When reading a shared VG
we cannot write it because we may not have an ex lock.
(Or we may be reading the shared VG while not using
lvmlockd in which case it's like reading a foreign VG.)
Add the same checks for wiping outdated PVs. We may
read a foreign or shared VG, or see the PVs, while
another host is part way through writing a new version
of the VG to the PVs. This might cause us to think
some of the PVs are outdated. We do not want to
write another host's PVs, especially when we may
wrongly conclude they are outdated.
When the command gets a list of alternate devices
from lvmetad, log each one directly. This is not
the same as the warnings when adding lvmcache,
which are related to which duplicate is preferred.
If two PVs have different VGs with the same name
(different uuids), one of the VGs is ignored by
lvmetad. A FIXME exists in lvmetad to find a
better response.
update_metadata and pv_found update the cached metadata;
these are both reworked to improve the code, organize it
by each possible state and transition, make it much more
clear what's changing, add more error checking and
handling, and add comments.
The state and content of the cache (hash tables) does not
change (apart from some things that didn't work before),
and the communication to/from commands does not change.
The implementation and organization of the code making
the state changes does change significantly.
One detail related to the content of the cache does change:
different hash tables do not reference the same memory any more;
the target values in each hash table are allocated and freed
individually.
The str_list_destroy function may be called to cleanup memory when
the list is not used anymore and the list itself was not allocated
from the memory pool.
When checking minimum mda size, make sure the mda_size after alignment
and calculation is more than 0 - if there's no place for an MDA at the
end of the disk, the _text_pv_add_metadata_area does not try to add it
there and it returns (because we already have the MDA at the start of
the disk at least).
Actually, we don't need extra condition as introduced in commit
00348c0a63. We should fix the last
condition:
(mdac->rlocn.size >= mdah->size)
...which should be:
(MDA_HEADER_SIZE + (rlocn ? rlocn->size : 0) + mdac->rlocn.size >= mdah->size))
Where the "mdac" is new metadata, the "rlocn" is old metadata.
So the main problem with the previous condition was that it
didn't count in MDA_HEADER_SIZE properly (and possible existing
metadata - the "rlocn"). This could have caused the error state
where metadata in ring buffer overlap to not be hit.
Replace the new condition introduced in 00348c0a63
with the improved one for the condition that existed there
already but it was just incomplete.
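To make the difference between the two conditions concrete, here is a small sketch with plain integers in place of the mdac/rlocn/mdah structures. The function names are hypothetical, and the value 512 for MDA_HEADER_SIZE is an assumption used only for illustration:

```c
#include <stdint.h>

/* Assumption: 512 stands in for the real MDA_HEADER_SIZE. */
#define MDA_HEADER_SIZE 512

/* Incomplete check: looks only at the size of the new metadata. */
static int overlaps_old_check(uint64_t mda_size, uint64_t new_size)
{
	return new_size >= mda_size;
}

/* Improved check: header + old metadata (0 if none) + new metadata. */
static int overlaps_new_check(uint64_t mda_size, uint64_t old_size,
			      uint64_t new_size)
{
	return MDA_HEADER_SIZE + old_size + new_size >= mda_size;
}
```

For a 4096-byte area holding 2000 bytes of old metadata, writing 2000 bytes of new metadata passes the incomplete check but is caught by the improved one, since 512 + 2000 + 2000 exceeds 4096.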
We're already checking whether old and new meta do not overlap in
ring buffer (as we need to keep both old and new meta during vg_write
up until vg_commit).
We also need to check whether the new metadata do not overlap
themselves in case we don't have old metadata yet (...because
we're in vgcreate). This could happen if we're creating a VG so
that the very first metadata written are long enough that they wrap
themselves in the metadata ring buffer.
Although we limited the minimum metadata area size better with the
previous commit ccb8da404d which
makes the initial VG metadata overlap in ring buffer to be less
probable, the risk of hitting this overlap condition is still there
if we still manage to generate big enough metadata somehow.
For example, users can provide many and/or long VG tags during vgcreate
so that the VG metadata is long enough to start to wrap in the ring
buffer again...
Drop already tested 'threshold & create' which is in
lvextend-thin-full.sh
Count on the now much faster 'dmeventd' wakeup on watermark
as it's now nearly instant after crossing the threshold value.
If the underlying device actually has bigger read-ahead settings,
let it pass.
But anyway switch to 512 stripe-size to get a really high R-A sector count.
This option could never have been printed in lvm2 metadata, so it could
be safely removed as it could have been set only as 0.
This configurable setting is supported via metadata profile.
Now with correctly functioning dmeventd enable usage of
low_water_mark for faster reaction on pool's threshold.
When a user selects e.g. 80% as a threshold value,
dmeventd doesn't need to wait 10 seconds till the monitoring
timer expires, but nearly instantly resizes the thin-pool
to fit below the threshold.
If the plugin's lvm command execution fails too often (>10 times),
there is no point in torturing the system more than necessary, just log
and drop monitoring in this case.
The recent addition to check for PVs that were
missed during the first iteration of processing
was unintentionally catching duplicate PVs because
duplicates were not removed from the all_devices
list when the primary dev was processed.
Also change a message from warn back to verbose.
If a VG is removed between the time that 'vgs'
or 'lvs' (with no args) creates the list of VGs
and the time that it reads the VG to process it,
then ignore the removed VG; don't report an error
that it could not be found, since it wasn't named
by the command.
The former patch (dab3ebce4c) is a little too strict. For example, it is
OK to create a PV on unpartitioned DASD devices with LDL format. So
with an lvm version containing the patch, LVs created on those devices
could not be found.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Improve event string parser to avoid unneeded alloc+free.
Daemon talk function uses '-' to mark NULL/missing field.
So restore the NULL pointer back on parser.
This should make old tools like 'dmevent_tool' work again,
as 'uuid' or 'dso' can now become NULL and then be
properly used in the _want_registered_device() function.
Since lvm2 always fills these parameters, this change should
have no effect on lvm2.
PVs could be missing from the 'pvs' output if
their VG was removed at the same time that the
'pvs' command was run. To fix this:
1. If a VG is not found when processed, don't
silently skip the PVs in it, as is done when
the "skip" variable is set.
2. Repeat the VG search if some PVs are not
found on the first search through all VGs.
The second search uses a specific list of
PVs that were missed the first time.
testing:
/dev/sdb is a PV
/dev/sdd is a PV
/dev/sdg is not a PV
each test begins with:
vgcreate test /dev/sdb /dev/sdd
variations to test:
vgremove -f test & pvs
vgremove -f test & pvs -a
vgremove -f test & pvs /dev/sdb /dev/sdd
vgremove -f test & pvs /dev/sdg
vgremove -f test & pvs /dev/sdb /dev/sdg
The pvs command should always display /dev/sdb
and /dev/sdd, either as a part of VG test or not.
The pvs command should always print an error
indicating that /dev/sdg could not be found.
Recognize the target only 'extends' and do not enforce
'flush' in this case. Only the size reduction
still requires flush (so disables usage of no_flush flag).
If some other targets do require flush before suspend,
they have to explicitly ask for it.
While the activation code tries to evaluate which target
really needs flush with suspend and which may go without flush,
it has stayed effectively disabled by original commit:
33f732c5e9 since here
it only allows non-pvmoving 'mirrors' to pass.
So remove the check for mirror LV type and only disable
no_flush for 'pvmove'.
TODO: Looking into history - it also seemed like the raid target
would have always required flushing, but this was later
removed without a clear explanation.
If some more targets really do need 'no_flush' it should
be handled at their 'level' - since we now stack multiple
targets on top of each other.
Add more functionality to the size_changed function.
While the 'existing' API only returned 0 for
unchanged and !0 for changed,
the new improved API also detects whether the
size has only grown - or whether there was
a size reduction.
The function works on the whole dm-tree, so:
no change in size always returns 0,
only size extension returns 1,
and if some size reduction is there, it returns -1.
This result can be used for better evaluation
of whether we need to flush before suspend.
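The three-way result can be sketched over a flat array of nodes. This is only an illustration of the return convention; struct node_size and tree_size_changed() are hypothetical names and do not reflect the real dm-tree data structures:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one node of the dm tree. */
struct node_size {
	uint64_t old_size;
	uint64_t new_size;
};

/* Returns 0 if nothing changed, 1 if sizes only grew,
 * and -1 as soon as any node got smaller. */
static int tree_size_changed(const struct node_size *nodes, size_t count)
{
	size_t i;
	int r = 0;

	for (i = 0; i < count; i++) {
		if (nodes[i].new_size < nodes[i].old_size)
			return -1;	/* a reduction wins over everything else */
		if (nodes[i].new_size > nodes[i].old_size)
			r = 1;		/* remember that something was extended */
	}

	return r;
}
```

A caller deciding about flushing would then only need to treat a negative result as "flush required".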
Use single code to evaluate if the percentage value has
crossed threshold.
Recalculate the amount value to always fit below the
threshold so no extra reiterations are needed
to reach this state in case the policy amount is too small.
Since the plugin's percentage compare has been fixed,
a wrong compare is now revealed here.
The logic for the threshold is to allow usage to go as high
as the given value, e.g. 80% - so if the pool is exactly 80%
full it's still allowed to use it (dmeventd will not
resize it).
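A minimal sketch of this threshold rule in integer arithmetic, assuming hypothetical helper names and ignoring overflow for very large extent counts. The first function is true only when usage is strictly above the threshold; the second computes the smallest pool size that brings usage back to the threshold or below in one step:

```c
#include <stdint.h>

/* Nonzero only when usage is strictly above threshold_pct. */
static int over_threshold(uint64_t used, uint64_t total,
			  unsigned threshold_pct)
{
	return used * 100 > (uint64_t) threshold_pct * total;
}

/* Smallest total for which 'used' is at or below threshold_pct.
 * Assumes threshold_pct is in the range 1..100. */
static uint64_t min_total_for_threshold(uint64_t used,
					unsigned threshold_pct)
{
	/* Round up so the result never leaves usage above the threshold. */
	return (used * 100 + threshold_pct - 1) / threshold_pct;
}
```

With an 80% threshold, a pool using 80 of 100 extents is left alone, while one using 90 extents needs at least 113 extents in total.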
Commit 1a74171ca5 added
a check to ignore a VG that was FAILED_INCONSISTENT
if the command doesn't care if the VG is not found.
Remove that check because that case is never reached
by the current code.
The ONE_VGNAME_ARG was being passed and tested as
vg_read() flag but it's a cmd struct flag.
(It affects command arg processing in toollib,
not vg_read behavior. Flags related to command
processing are generally cmd struct flags, while
vg_read arg flags are generally related to vg_read
behavior.)
Running "vgremove -f VG & pvs" results in the pvs
command reporting that the VG is not found or is
inconsistent. If the VG is gone or being removed,
the pvs command should just skip it and not print
errors about it.
"Not found" is because the pvs command created the
list of VGs to process, including VG, then vgremove
removed the VG, then the pvs command came to read
the VG to process it and did not find it.
An "inconsistent" error could be reported if vgremove
had only partially completed removing VG when pvs did
vg_read on the VG to process it, causing pvs to find
the VG in a partially-removed state.
This fix adds a flag that pvs uses to ignore a VG
that can't be read or is inconsistent.
When lvmetad is used and the lvmcache update function (lvmcache_update_vgname_and_id)
was called to update existing lvmcache records, a condition was met
which caused an immediate return from the update function, effectively
making it a NOOP.
It seems there's no reason for such a condition and lvmcache should be
updated appropriately even when lvmetad is used, as lvmcache may be reused,
most notably in lvm shell.
It's possible this is a remnant of the lvmetad development code which
didn't get removed for some reason and the bug didn't get spotted
because lvm shell is not used often (the condition dates back to 2012
or so).
Example, lvmetad and lvm shell used:
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 124.00m
Before this patch:
==================
lvm> vgremove vg
Volume group "vg" successfully removed
lvm> pvs
With this patch applied:
========================
lvm> vgremove vg
Volume group "vg" successfully removed
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
The lvmcache info might be reused, most notably in lvm shell.
We need to be sure that even lvmcache_info marked as invalid
is removed from the lvmcache so it does not confuse any subsequent
code/commands executed later on.
Problematic example with the lvm shell:
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
Before this patch (/dev/sda still displayed in a way):
======================================================
lvm> pvremove /dev/sda
Labels on physical volume "/dev/sda" successfully wiped
(without lvmetad)
lvm> pvs
No physical volume label read from /dev/sda
(with lvmetad)
lvm> pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
With this patch applied:
========================
lvm> pvremove /dev/sda
Labels on physical volume "/dev/sda" successfully wiped
(without lvmetad)
lvm> pvs
(with lvmetad)
lvm> pvs
An older pthread library was missing a 'trick'
in pthread_cleanup_pop() which led to
a compilation error:
error: label at end of compound statement
Use explicit ';' to fix it.
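The language rule behind this error can be shown without pthreads: before C23 a label must be followed by a statement, so a label placed directly before a closing brace needs an explicit empty statement. The function below is a hypothetical illustration only, not the code touched by this commit:

```c
/* Sums the positive values of v, skipping the rest via a label
 * placed at the end of the loop body. */
static int sum_positive(const int *v, int n)
{
	int sum = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (v[i] <= 0)
			goto next;
		sum += v[i];
next:
		;	/* explicit empty statement; without it older compilers
			 * report "label at end of compound statement" */
	}

	return sum;
}
```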
Make lvm2_disable_dmeventd_monitoring() more explicit.
As memlock_inc_daemon() is also used by clvmd, which
does change the dmeventd and suspend ignore state at
some stages - make updates of these 2 variables
tied to the call of lvm2_disable_dmeventd_monitoring().
Once this call is made dmeventd monitoring
and suspended devices are ignored.
TODO: all lvm-global settings should really be moved
to command context.
Implementing exit when 'dmeventd' is idle.
Default idle timeout set to 1 hour - after this time period
dmeventd will cleanly exit.
On systems with 'systemd' - service is automatically started with
next contact on dmeventd communication socket/fifo.
On other systems - a new dmeventd starts again when an lvm2 command detects
it's missing and monitoring is needed.
Add support to unmonitor device when monitor recognizes there is
nothing to monitor anymore.
TODO: possibly API change with return value could be also used.
Redesign threading code:
- plugin registration runs within its new created thread for
improved parallel usage.
- wait task is created just once and used during whole plugin lifetime.
- event thread is based over 'events' filter being set - when
filter is 0, such thread is 'unused'.
- event loop is simplified.
- timeout thread is never signaling 'processing' thread.
- pending events filter change is properly reported and
running event thread is signalled when possible.
- helgrind is not reporting problems.
We need to keep the control device opened while there is 'any' dso
plugin loaded - otherwise there would be a race closing controlfd
inside the lvm2 plugin while some other monitoring thread
tried to execute another WAITEVENT task.
Move all DSO related functions to the front, so they can be easily
referenced from the rest of the code.
Add proper error paths with logging and error reporting.
Drop mutex locking when releasing DSO - since DSO is always
allocated and released in main 'event' processing thread.
CONVERTING status flag is a tricky one. It's not set when converting
a non-mirror LV type to the mirror type, e.g.: linear -> two leg mirror.
Also the conversion itself is instant and doesn't require to be polled.
When mirror reaches sync state there's no final update on VG metadata
for lvmpolld to be made thereby report_progress in fact doesn't report
percentage of mirror being converted but percentage of mirror
being in sync. Perhaps we should reword the lvconvert output here.
On the other hand, CONVERTING is set while we upconvert the mirror
from e.g. a two leg mirror to a four leg mirror. In such a case the operation
must be polled so that lvmpolld can clean up the temporary
conversion log when the conversion is over.
Ignore CONVERTING lv_type for the moment and match LVs only by uuids
during 'mirror conversion'/'waiting for a sync to finish'.
The old code made two loops through the PVs: in the first
loop it found the max PV and VG name lengths, and in the
second loop it printed each PV using the name lengths as
field widths for aligning columns.
The new code uses process_each_pv() which makes one loop
through the PVs. In the *first* call to pvscan_single(),
the max name lengths are found by looping through the
lvmcache entries which have been populated by the generic
process_each code prior to calling any _single functions.
Subsequent calls to pvscan_single() reuse the max lengths
that were found by the first call.
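The "compute widths on the first call, reuse them afterwards" idea can be sketched as below. All names here (struct pv_entry, struct scan_params, format_pv_single) are hypothetical and the cache is modelled as a plain array, so this only illustrates the control flow, not the real pvscan_single() code:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one cached PV record. */
struct pv_entry {
	const char *pv_name;
	const char *vg_name;
};

/* State shared by all per-PV calls. */
struct scan_params {
	int widths_set;
	int pv_max_len;
	int vg_max_len;
};

/* Formats one PV line; the first call walks the cache to find the widths. */
static int format_pv_single(const struct pv_entry *cache, size_t cache_count,
			    const struct pv_entry *pv, struct scan_params *p,
			    char *buf, size_t len)
{
	size_t i;

	if (!p->widths_set) {
		for (i = 0; i < cache_count; i++) {
			int pl = (int) strlen(cache[i].pv_name);
			int vl = (int) strlen(cache[i].vg_name);

			if (pl > p->pv_max_len)
				p->pv_max_len = pl;
			if (vl > p->vg_max_len)
				p->vg_max_len = vl;
		}
		p->widths_set = 1;	/* later calls skip the cache walk */
	}

	return snprintf(buf, len, "PV %-*s VG %-*s",
			p->pv_max_len, pv->pv_name,
			p->vg_max_len, pv->vg_name);
}
```

This keeps a single pass over the PVs while still producing aligned columns, at the cost of one extra walk through the already populated cache.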
The new report/compact_output_cols setting has exactly the same effect
as report/compact_output setting. The difference is that with the new
setting it's possible to define which cols should be compacted exactly
in contrast to all cols in case of report/compact_output.
In case both compact_output and compact_output_cols is enabled/set,
the compact_output prevails.
For example:
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols=""
$ lvs vg
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=1
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize
lvol0 vg -wi-a----- 4.00m
dm_report_compact_given_fields is the same as dm_report_compact_fields,
but it processes only given fields, not all the fields in the report
like dm_report_compact_fields does.
If a host failed while holding a sanlock lease,
sanlock_acquire will by default block and wait
for the lease to expire before returning. We
want it to return with an error so we can retry
instead of blocking, which allows us to process
other lock operations.
(Enclose this in an ifdef until the new flag
appears in a sanlock release.)
Respect lvm2_log_fn prototype. The idea of 'reusing' print_log with
plain cast is causing very strange crashes with some older 'gcc' compilers.
So just do it cleanly...
This reverts commit 1b1c01a27b.
This caused messages to get dropped instead of logged into the log file.
(The log file and log function are independent at the moment.)
Rework thread creation code to better use resources.
New code will not leak 'timeout' registered thread on error path.
Also if the thread already exists - avoid creation of the thread
object and its later destruction.
If the race is noticed during adding new monitoring thread,
such thread is put on cleanup list and -EEXIST is reported.
As we now use 'unified' logging macro system - we no longer need
to protect from change of logging function pointer - it's set
once at the start of dmeventd and does not change anymore
(as the lvm2 library no longer interferes here).
Some signatures are spread around the disk in several copies, mainly for
backup. Make libblkid detect these extra copies - the
"blkid_probe_step_back" fn call was missing after a successful wipe of the previous signature
copy.
An example with FAT table which has copies:
$ mkfs.vfat /dev/sda1
Before this patch:
$ pvcreate /dev/sda1
WARNING: vfat signature detected on /dev/sda1 at offset 54. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
With this patch applied:
$ pvcreate /dev/sda1
WARNING: vfat signature detected on /dev/sda1 at offset 54. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
WARNING: vfat signature detected on /dev/sda1 at offset 0. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
WARNING: vfat signature detected on /dev/sda1 at offset 510. Wipe it? [y/n]: y
Wiping vfat signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
Make sure log/prefix is set to "" when getting the list of VG names.
We need this for the format to be correct so it's properly searched
through later on.
$ vgcreate vgA /dev/sda
Volume group "vgA" successfully created
$ dd if=/dev/sda of=/dev/sdb bs=1M
$ dd if=/dev/sda of=/dev/sdc bs=1M
(the new VG name is prefix of existing VG name)
$ vgimportclone -n vg /dev/sdb
(the new VG name is suffix of existing VG name)
$ vgimportclone -n gA /dev/sdc
Before this patch:
------------------
(we end up with "vg1" and "gA1" names with the "1" suffix which is not needed)
$ vgs -o vg_name
VG
gA1
vg1
vgA
With this patch applied:
------------------------
(we end up with "vg" and "gA" names as they're unique already and no extra suffix is added)
$ vgs -o vg_name
VG
gA
vg
vgA
Of course, if the name supplied is not unique, the number is added correctly:
$ dd if=/dev/sda of=/dev/sdb bs=1M
$ vgimportclone -n vgA /dev/sdb
$ vgs -o vg_name
VG
vgA
vgA1
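The intended naming rule from the examples above (exact-name comparison, numeric suffix only on a real clash) can be sketched in C. vgimportclone itself is a script, so this is only an illustration of the rule with hypothetical function names, not the patched code:

```c
#include <stdio.h>
#include <string.h>

/* Exact match only: "vg" must not match an existing "vgA". */
static int name_in_use(const char *const *names, size_t count,
		       const char *name)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (!strcmp(names[i], name))
			return 1;

	return 0;
}

/* Writes base, or base plus the first free numeric suffix, into out.
 * Returns 0 on success, -1 if the name does not fit into out. */
static int unique_vg_name(const char *const *names, size_t count,
			  const char *base, char *out, size_t len)
{
	unsigned i;

	if (snprintf(out, len, "%s", base) >= (int) len)
		return -1;

	for (i = 1; name_in_use(names, count, out); i++)
		if (snprintf(out, len, "%s%u", base, i) >= (int) len)
			return -1;

	return 0;
}
```

With only "vgA" present, the requested names "vg" and "gA" are kept unchanged, while a requested "vgA" becomes "vgA1".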
If lvmlockd is running, lvmetad is configured (use_lvmetad=1),
but lvmetad is not running, then commands will seg fault
when trying to send a message to lvmetad.
The difference is lvmetad being "active", not just "used".
We can replace the expressions with awk/grep/cut/tr with --select now and
more suitable reporting options and modes. Also, we don't need to check
the temporary lvm.conf generated within vgimportclone script since we're
generating it ourselves now using lvmconfig, not using sed anymore like
it was before (so we can be pretty sure it's correct - we use lvmconfig
now even for generating the lvm.conf itself).
We already have pv_count to report number of PVs that a VG has based
on metadata.
This patch exposes the information about how many of these PVs are
missing which is also useful information for a VG. We could count
the sum of pv_missing reporting fields for each PV in the VG before,
but the new field is practical when reporting VG as a whole and there's
no need to process each PV from VG alone.
If 'vgcreate --shared' finds both sanlock and dlm are running,
print a more accurate error message:
"Found multiple lock managers, select one with --lock-type."
When neither is running, we still print:
"Failed to detect a running lock manager to select lock type."
Using --lock-type sanlock|dlm implies --shared.
Using --shared selects lock type sanlock|dlm
(by choosing the one that's running.)
Using both --shared and --lock-type sanlock|dlm should
also be allowed (--shared is just redundant information.)
Also, leave out the note about "circular buffer" which is
an internal implementation detail anyway and not quite
informational for users:
Before this patch:
$ vgcreate vg1 /dev/sda
VG vg1 metadata too large for circular buffer
Failed to write VG vg1.
With this patch applied:
$ vgcreate vg1 /dev/sda
VG vg1 metadata too large: size of metadata to write is 691 bytes while PV metadata area size on /dev/sda is 512 bytes.
Failed to write VG vg1.
Before this patch:
$ lvs -a -o name,layout,role test/lvmlock
LV Layout Role
[lvmlock] linear public
With this patch applied:
$ lvs -a -o name,layout,role test/lvmlock
LV Layout Role
[lvmlock] linear private,lockd,sanlock
Avoid running tests when the prefix already exists in the system.
As the prefix just uses the PID number, we may hit a case for long
running tests, where devices from some previous runs were not
properly cleared away - detect this and fail early.
(Such machine should be inspected and fixed).
Add metadata_devices and seg_metadata_le_ranges report fields.
Currently only defined for raid, but should probably be extended
to all other segment types that don't report all their device
usage in the 'devices' field.
Correct some things, e.g. set mode and policy on
the cache lv, not the pool, lvm.conf field for
mode changed.
Add smq which was missing.
Make the sections on cache mode and cache policy
consistent in structure and style.
When lvmetad_pvscan_vg() reads VG metadata from each PV,
it compares it to the last one to verify it matches.
If the VG metadata does not match on the PVs, an error
is printed and it fails to read the VG. In this error
case, use log_debug to show the differences between
the two unmatching copies of the metadata.
One host changes a VG, making the cached VG on another
host invalid. The other host then rereads the VG from
disk to get the latest copy. If the first host removed
a PV from the VG, the second host attempts to reread the
VG from old PV when rescanning. Reading the VG from the
removed PV fails, causing vg_read to return "VG not found".
The fix is to simply not fail when a VG is not found while
rereading a PV and continue without it.
(This doesn't happen if the second host happens to first
run a command like 'vgs' that triggers a global revalidation
of metadata.)
vgchange --lock-type iterates through LVs to ensure
no LVs are active before changing the lock type of
the VG, but the loop was not checking that an LV
actually has a lock before trying it, so it would
fail if the VG had any LVs that don't use locks,
e.g. it would fail on a tmeta LV from a pool.
We want most of our units to be started before any local/remote mount
points are mounted - we used {local,remote}-fs.target for this purpose
before, but it was not 100% correct as there's even {local,remote}-fs-pre.target
special systemd unit reserved for this exact purpose.
See also man 7 systemd.special and "local-fs-pre.target"/"remote-fs-pre.target"
description.
When user specifies '--force' with remove/remove_all/wipe_table
use '--noflush --nolockfs' resume flags, so the operation
will not block when device underneath is blocked.
Also make error messages more consistent:
Before this patch:
(/run/lock exists and is not a directory)
$ pvs
/run/lock/lvm: mkdir failed: Not a directory
File-based locking initialisation failed.
(/run/lock/lvm exists and is not a directory)
$ pvs
Directory "/run/lock/lvm" not found
File-based locking initialisation failed.
With this patch applied:
(/run/lock exists and is not a directory)
$ pvs
Existing path /run/lock is not a directory.
Failed to create directory /run/lock/lvm.
File-based locking initialisation failed
(/run/lock/lvm exists and is not a directory)
$ pvs
Existing path /run/lock/lvm is not a directory.
Failed to create directory /run/lock/lvm.
File-based locking initialisation failed.
When using udev, the /dev/mapper entries are symlinks - fix the code
to account for this.
This patch also fixes the dmsetup mknodes and vgmknodes to properly
repair /dev/mapper content if it sees dangling symlink in /dev/mapper.
$ lvs -o name,tags vg
LV LV Tags
lvol0
lvol1 mytag
Before this patch:
$ lvs -o name,tags vg -S 'tags=""'
Failed to parse string list value for selection field lv_tags.
Selection syntax error at 'tags=""'.
Use 'help' for selection to get more help.
(and the same for -S 'tags={}' and -S 'tags=[]')
With this patch applied:
$ lvs -o name,tags vg -S 'tags=""'
LV LV Tags
lvol0
(and the same for -S 'tags={}' and -S 'tags=[]')
If lvmlockd acquires an lv lock for a command, but the
command exits before the reply, then the command has
not activated the lv and lvmlockd should unlock it.
This only applies when the lv was not already locked.
(There will always be a chance that the lv lock is held
while the lv is not active, i.e. if the command fails in
the small window between getting the lv lock and before
doing the activation. In that case, rerunning the
activation command corrects the inconsistency.)
This commit helps by automatically clearing the
inconsistency (lv locked but not activated) in the most
common case when the lv lock operation is slow to
complete and the command is canceled by the user.
This commit also adds and cleans up references to the
client id in a bunch of log messages, which is useful
to follow processing on each independent lock request.
When using lvm shell, some structures which are cached in memory may be
reused. This happens for the struct label (a part of lvmcache_info
structure) when lvmetad is used in which case the PV scan is not
done that would normally overwrite these label structures in memory
and making them up-to-date.
This is all consequence of the fact that struct lvmcache_info and
struct label are not always assigned in the same part of the code.
For example, if lvmetad *is not* used, parts of the struct label are
reassigned in label_read fn while struct lvmcache_info is created
elsewhere. No part of the code reused struct label (and its "dev"
field) before calling label_read fn. That's why the real bug is
hidden when using lvm shell without lvmetad.
However, with lvmetad and lvm shell, the situation is a bit different.
The label_read fn is not called if lvmetad *is* used, hence the
struct label may have ended up not initialized properly.
There was missing assignment for the dev field in struct label
in _text_pv_write fn which caused this problem to appear in
lvm shell with lvmetad, for example:
Before this patch:
lvm> pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
lvm> pvs /dev/sda
PV VG Fmt Attr PSize PFree
unknown device lvm2 --- 128.00m 128.00m
With this patch applied:
lvm> pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
lvm> pvs /dev/sda
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
Also, this problem had not appeared before changes introduced
by commits e1a63905d1 through
3a6f91d713 which, among other
things, added proper label field type reporting. Before, label
reporting was the same as using struct physical_volume which
has its own dev field assigned and so this problem was not exposed.
When a command does a sequence of
vg_write + vg_commit + vg_write + vg_commit,
initialization of non-PV devices happens during the
first vg_write, and does not need to be repeated by
the second vg_write.
When creating a lockd VG, this sequence occurs because
the VG is first created, then the lockd data is created,
then the lockd data is written to the VG metadata.
Avoid validation of free space in pool, when no messages are passed.
Patch a3c7e326c3 added a new check for
pool overload - but this check should not be made if there are
no messages and transaction_id is still within 'bounds' (bigger by 1).
Commit 00b36ef06a had a typo
and missed '{' for a shell variable, thus the command used a slightly
different 'tmp' dir name for the cache dir (with an extra '}').
This change went unnoticed until a recent fix in the persistent
filter; before it, lvm2 failed to update the cache file when --config
was specified.
The result was that the /tmp dir was accumulating snap.XXXXX} dirs when
running the vgimportclone script.
Since we may want to swap names when LVs are complex types, we cannot
avoid doing full renames on both LV stacks.
Temporarily use 'pvmove_tmeta' as unused name to prevent validation troubles.
Commit 9403edbb93 moved the location of
configure.h and lvm-version.h.
Let's try an even better place than the /conf dir, which should be left
for user configurable files.
Put these files right into the include dir.
This applies the same rule/logic to dlm VGs that has always
existed for sanlock VGs. Allowing a dlm VG to be removed
while its lockspace was still running on other hosts largely
worked, but there were difficult problems if another VG with
the same name was recreated. Forcing the VG lockspace to
be stopped, gives both sanlock and dlm VGs the same behavior.
This shortcut was added for an odd case that I do not
believe is relevant any more. Having an alternate
path for lockspace thread cleanup is a complication
that could lead to problems.
The code was expecting the wrong return value from
compare_config, which returns 0 when equal.
This is a problem for a lockd VG using multiple PVs
when the VG needs to be rescanned.
Somehow raid tests landed in plain cache - separate them out
so they properly check for have_raid.
Check we do not support the stripe option with cache-pool creation.
ATM allocation can't handle striping and cache pool allocation.
It's not yet even clear what the result should actually be.
Until resolved, disable this option (it's been coredumping
inside allocation anyway).
Certain stacks of cached LVs may have unexpected consequences.
So add a warning function called when an LV is cached to detect
such cases and WARN the user about them - the best we could do ATM.
When we insert layer we also move status flag-bits for certain LV types,
so internal volume_group structure remains consistent.
(Perhaps it's misuse of 'insert_layer' function and we should have
another similar function for this.)
Basically we aim to maintain the same state as after reading fresh
metadata out of volume group.
Currently, when we cache e.g. a 'raid' LV, this should transfer the 'raidLV' flag
to the _corigin LV, and the cache LV is no longer a raid.
TODO: bits for stacked devices need more exact rules.
The dlm will often lose the lvb content, so we need to
check quite a few possibilities for lvb values that
were not being checked before.
Refactoring was required to pass the entire lvb value
back to the core code instead of the single value.
The only functional change should be detecting new
lvb states where metadata is now invalidated where
it wasn't before.
When an action is created by lvmlockd for itself,
there is no client to send the result to. Add
the NO_CLIENT flag to the action to skip sending
the result to a client.
Add a new arg to lockd_start_vg() that indicates
it is being called for a new lockd VG, so that
lvmlockd knows the lockspace being started is new.
(Will be used by a following commit.)
Commit f6473baffc introduced a new
cmd->initialized variable to keep info about which parts of the
cmd_context have been initialized.
A part of this patch was also a change in refresh_filters fn
which checks for cmd->initialized.filters variable and it does
the filter refresh *only* if the filter has already been initialized
before otherwise it's a NOOP (before, the refresh_filters also
initialized filters as a side effect in case it had not been
initialized before which was not quite correct).
However, the commit f6473baffc
did not handle the case in which configuration changes
either via --config argument or when configuration file changed
and its timestamp was higher than the timestamp of the persistent
cache file - the /etc/lvm/cache/.cache.
This patch fixes this issue and it causes the init_filters fn
in lvm_run_command fn to be called with proper value of
"load_persistent_cache" switch even if the configuration changes,
hence causing the persistent cache file to be ignored in this
case.
Ensure make clean cleans any left-over file from their previous
location so they are not in conflict with new ones.
Also hide error message when .commands file is not present.
The regex filter (controlled by devices/filter lvm.conf setting) was
evaluated as the very last filter. However, this is not optimal when
it comes to restricting disk access - users define devices/filter
as well as devices/global_filter to avoid this.
The devices/global_filter is already positioned at the beginning of the
filter chain. We need to do the same for devices/filter.
Filter chains before this patch:
A: when lvmetad is not used:
persistent_filter -> sysfs_filter -> global_regex_filter ->
type_filter -> usable_filter -> mpath_component_filter ->
partition_filter -> md_component_filter -> fw_raid_filter ->
regex_filter
B: when lvmetad is used:
B1: to update lvmetad:
sysfs_filter -> global_regex_filter -> type_filter ->
usable_filter -> mpath_component_filter -> partition_filter ->
md_component_filter -> fw_raid_filter
B2: to retrieve info from lvmetad:
persistent_filter -> usable_filter -> regex_filter
From the chain list above we can see that particularly in case when
lvmetad is not used, the regex filter is the very last one that is
processed. If lvmetad is used, it doesn't matter much as there's
the global_regex_filter which is used instead when updating lvmetad
and when retrieving info from lvmetad, putting regex_filter in front
of usable_filter wouldn't change much since usable_filter is not
reading disks directly.
This patch puts the regex filter to the front even in case lvmetad
is not used, hence reinstating the state as it was before commit
a7be3b12df (which moved the regex_filter
position in the chain). Still, the arguments for the commit
a7be3b12df still apply and they're
still satisfied since component filters (MD, mpath...) are evaluated
first just before updating lvmetad.
So with this patch, we end up with:
A: when lvmetad is not used:
persistent_filter -> sysfs_filter -> global_regex_filter ->
regex_filter -> type_filter -> usable_filter ->
mpath_component_filter -> partition_filter ->
md_component_filter -> fw_raid_filter
B: when lvmetad is used:
B1: to update lvmetad:
sysfs_filter -> global_regex_filter -> type_filter ->
usable_filter -> mpath_component_filter -> partition_filter ->
md_component_filter -> fw_raid_filter
B2: to retrieve info from lvmetad:
persistent_filter -> regex_filter -> usable_filter
This way, specifying the regex_filter in non-lvmetad case causes
the devices to be filtered based on regex first before processing
any other filters which can access disks (like md_component_filter).
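The effect of the ordering can be sketched with a toy chain of function pointers. This is only an illustration: the filter bodies and the disk_reads counter are made up for the example and do not reflect the lvm2 filter API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: a filter returns 1 to accept a device, 0 to reject. */
typedef int (*filter_fn)(const char *dev);

/* Counts how often a disk-reading filter ran (a stand-in for real I/O). */
int disk_reads;

/* Stand-in for a regex filter: accepts only names under /dev/sd. */
int regex_filter_sketch(const char *dev)
{
	return strncmp(dev, "/dev/sd", 7) == 0;
}

/* Stand-in for a component filter that has to open the device. */
int md_component_filter_sketch(const char *dev)
{
	(void) dev;
	disk_reads++;
	return 1;
}

/* Run the chain in order; stop at the first filter that rejects. */
int passes_chain(const filter_fn *chain, size_t n, const char *dev)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!chain[i](dev))
			return 0;
	return 1;
}

/* Return how many disk reads evaluating dev costs with a given order. */
int reads_for_order(int regex_first, const char *dev)
{
	filter_fn chain[2];

	chain[0] = regex_first ? regex_filter_sketch : md_component_filter_sketch;
	chain[1] = regex_first ? md_component_filter_sketch : regex_filter_sketch;
	disk_reads = 0;
	(void) passes_chain(chain, 2, dev);
	return disk_reads;
}
```

A device rejected by the regex costs no disk access when the regex filter runs first, and one access when it runs last.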
This patch also streamlines the code for better readability.
Hopefully 4.3 will be fixed and the test will be updated to let
the raid tests run again.
Meanwhile using md-raid may effectively kill kernel,
so leave at least other tests running.
Split up _build_histogram_arg() into separate functions to allocate
and fill the histogram arg string and remove nested local variable
declarations from the parent function.
Coverity flags a use-after-free in _stats_histograms_destroy():
>>> Calling "dm_pool_free" frees pointer "mem->chunk" which has
>>> already been freed.
This should not be possible since the histograms are destroyed in
reverse order of allocation:
for (n = _nr_areas_region(region) - 1; n; n--)
	if (region->counters[n].histogram)
		dm_pool_free(mem, region->counters[n].histogram);
It appears that Coverity is unaware that pool->chunk is updated
during the call to dm_pool_free() and valgrind flags no errors in
this function when called with multiple allocated histograms.
Since there is no actual need to free the histograms individually
in this way simplify the code and just free the first allocated
object (which will also free all later allocated histograms in a
single call).
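Why one free is enough can be shown with a toy bump allocator. This is not the libdevmapper pool implementation, only a sketch of the same "free this object and everything allocated after it" behaviour; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bump allocator, NOT the libdevmapper pool: objects are carved
 * sequentially out of one buffer, so "freeing" an object just rewinds
 * the allocation offset to where that object started. */
struct toy_pool {
	unsigned char buf[256];
	size_t used;
};

void *toy_pool_alloc(struct toy_pool *p, size_t len)
{
	void *obj;

	if (len > sizeof(p->buf) - p->used)
		return NULL;
	obj = p->buf + p->used;
	p->used += len;
	return obj;
}

/* Frees obj and everything allocated after it. */
void toy_pool_free(struct toy_pool *p, void *obj)
{
	unsigned char *o = obj;

	assert(o >= p->buf && o <= p->buf + p->used);
	p->used = (size_t) (o - p->buf);
}

/* Allocate three objects, free the first, report bytes still in use. */
size_t toy_pool_demo(void)
{
	struct toy_pool p = { .used = 0 };
	void *first = toy_pool_alloc(&p, 16);

	(void) toy_pool_alloc(&p, 32);
	(void) toy_pool_alloc(&p, 64);
	toy_pool_free(&p, first);
	return p.used;
}
```

Freeing the first object leaves the toy pool empty, so no per-object loop is needed.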
Put include/.symlinks_created as a prerequisite for dep calc.
Otherwise if these are not generated and user enters tests subdir and
runs 'make' he just gets endless loop of dep calculation.
Relocate generated configure.h and lvm-version.h outside
of compilable .c source tree.
The reason: when compiling with builddir != srcdir, the generated
file lib/misc/configure.h was used for all compiled source files
except the ones located in the lib/misc dir. Those would have used
a configure.h file located in that dir, if one existed (e.g.
from some other build).
This problem was only visible when srcdir == builddir had been used before
trying to use srcdir != builddir (as configure.h then appeared in
srcdir).
The histogram changes add a new error path to dm_stats_create().
Make sure that the dm_stats handle is properly destroyed if we fail
to create the histogram pool and check for failures setting the
program_id.
Since we are growing an object in the histogram pool the return
value of dm_pool_grow_object() must be checked and error paths need
to abandon the object before returning.
Undo the part of the recent EREMOVED change which
automatically stopped the lockspace for a remotely
removed VG. It didn't always work (would not work
when lvb content was rebuilt in the dlm). This will
be handled better when the lvb content is controlled
more strictly.
As part of fix that came with cf700151eb,
I forgot to add the check whether the result of stat was successful or
not. This bug caused uninitialized buffer to be used for entries
from .cache file which are no longer valid.
This bug may have caused these uninitialized values to be used further,
for example (see the unreal (2567,590944) representing major:minor
pair):
$ pvs
/dev/abc: stat failed: No such file or directory
Path /dev/abc no longer valid for device(2567,590944)
PV VG Fmt Attr PSize PFree
/dev/mapper/test lvm2 --- 104.00m 104.00m
/dev/vda2 rhel lvm2 a-- 9.51g 0
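The missing check amounts to never reading the stat buffer unless stat() succeeded. A minimal sketch with hypothetical helper names (not the lvm2 dev-cache code):

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: fetch the device number recorded for path.
 * Returns 0 and fills *rdev on success, -1 if stat() fails. The stat
 * buffer is only read after stat() has succeeded, so a vanished path
 * can never leak uninitialised major:minor values to the caller. */
int get_rdev_checked(const char *path, dev_t *rdev)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;

	*rdev = st.st_rdev;
	return 0;
}

/* Convenience wrapper: 1 if the path can still be stat()ed, else 0. */
int path_is_valid(const char *path)
{
	dev_t rdev;

	return get_rdev_checked(path, &rdev) == 0;
}
```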
Older versions of gcc aren't able to track the assignments of
local variables as well as the latest versions leading to spurious
warnings like:
libdm-stats.c:2183: warning: "len" may be used uninitialized in this
function
libdm-stats.c:2177: warning: "minwidth" may be used uninitialized in
this function
Both of these variables are in fact assigned in all possible paths
through the function and later compilers do not produce these
warnings.
There's no reason to not initialize these variables though and
it makes the function slightly easier to follow.
Also fix one use of 'unsigned' for a nr_bins value.
Replace the histogram stats subcommand with a --histogram switch
to enable histogram related fields for both list and report output.
To avoid overloading the existing --histogram rename it to --bounds:
this is also a better description of the option.
Remove the optimization/shortcut for starting the dlm global
lockspace when it was already running.
Reenable automatically starting the dlm global lockspace
when a command attempts to use it and it's not yet started.
This had become disabled at some point.
Since we may easily get blocked when checking for percentage
of thin-pool - do not flush and just show current values.
This avoids holding VG locked when pool is overfilled.
Try to detect a thin-pool which may block an lvm2 command from further
processing (e.g. lvextend).
Check if the pool is read-only or out-of-space; in this case thins
will be skipped from being scanned (so the user may miss some PVs located
on thin volumes).
Fix regression introduced with commit:
2fc126b00d
This commit moved the pv_min_size() test in front
of device_is_usable(). However, pv_min_size needs to open the device,
so it may actually get blocked.
So restore the original order and first validate
that the dm device is usable for open.
It's worth noting that such a check is not 'race-free',
but it usually eliminates 99.99% of problems ;).
Print [source:handler] in filters' debug messages only if external
device info source other than "none" is used.
$ lvmconfig --type full devices/external_device_info_source
external_device_info_source="none"
Before this patch (from the -vvvv log):
filters/filter-usable.c:47 /dev/mapper/test: Skipping: Too small to hold a PV [none:(nil)]
filters/filter-md.c:33 /dev/sdb: Skipping md component device [none:(nil)]
filters/filter-partitioned.c:25 /dev/vda: Skipping: Partition table signature found [none:(nil)]
With this patch applied:
filters/filter-usable.c:44 /dev/mapper/test: Skipping: Too small to hold a PV
filters/filter-md.c:35 /dev/sdb: Skipping md component device
filters/filter-partitioned.c:27 /dev/vda: Skipping: Partition table signature found
librt doesn't have a pkgconfig file so use Libs.private: -lrt instead
to declare the dependency directly.
The same applies for -lm which is also used and which hasn't been
defined in the devmapper.pc file yet.
Make sure that correct 'dmstats create' messages are shown for all
examples and fix LV examples to use correct dmsetup output name
format (vg/lv -> vg-lv).
Improve the names and labels of stats reports columns, ensure that
the minimum field widths allow unambiguous labels to be shown and
update the man page descriptions of these fields.
Add support to dmstats to create and report histograms.
Add a --histogram switch to 'create' that accepts a string
description of bin boundaries and DR_STATS and DR_STATS_META fields
to report bin configuration and absolute and relative histogram
values:
hist_bins
hist_bounds
hist_ranges
hist_count
hist_count_bounds
hist_count_ranges
hist_percent
hist_percent_bounds
hist_percent_ranges
A new 'histogram' subcommand displays a report that emphasizes
histogram data as either counters or percentage values.
Add support for creating, parsing, and reporting dm-stats latency
histograms on kernels that support precise_timestamps.
Histograms are specified as a series of time values that give the
boundaries of the bins into which I/O counts accumulate (with
implicit lower and upper bounds on the first and last bins).
A new type, struct dm_histogram, is introduced to represent
histogram values and bin boundaries.
The boundary values may be given as either a string of values (with
optional unit suffixes) or as a zero terminated array of uint64_t
values expressing boundary times in nanoseconds.
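How a zero-terminated bounds array maps a latency to a bin can be sketched as follows. The bin convention used here (n boundaries give n + 1 bins, each bin closed at its lower boundary) is an assumption for illustration, and the function names are not part of libdm-stats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the entries of a zero-terminated bounds array. */
size_t nr_bounds(const uint64_t *bounds)
{
	size_t n = 0;

	while (bounds[n])
		n++;
	return n;
}

/* Map a latency (ns) to a bin index. Assumption for this sketch:
 * n boundaries give n + 1 bins; bin 0 holds values below bounds[0]
 * and bin n holds values at or above bounds[n - 1]. */
size_t bin_for_latency(const uint64_t *bounds, uint64_t latency_ns)
{
	size_t i, n = nr_bounds(bounds);

	for (i = 0; i < n; i++)
		if (latency_ns < bounds[i])
			return i;
	return n;
}

/* Example bounds: 2ms, 4ms, 6ms, then the terminating zero. */
const uint64_t example_bounds[] = { 2000000, 4000000, 6000000, 0 };
```

The first and last bins have no explicit boundary of their own, which is what "implicit lower and upper bounds" refers to.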
A new bounds argument is added to dm_stats_create_region() which
accepts a pointer to a struct dm_histogram initialised with bounds
values.
Histogram data associated with a region is parsed during a call to
dm_stats_populate() and used to build a table of histogram values
that are pointed to from the containing area's counter set. The
histogram for a specified area may then be obtained and interrogated
for values and properties.
This relies on kernel support to provide the boundary values in
a @stats_list response: this will be present in 4.3 and 4.2-stable. A
check for a minimum driver version of 4.33.0 is implemented to ensure
that this is present (4.32.0 has the necessary precise_timestamps and
histogram features but is unable to report these via @stats_list).
Access methods are provided to retrieve histogram values and bounds
as well as simple string representations of the counts and bin
boundaries. Methods are also available to return the total count
for a histogram and the relative value (as a dm_percent_t) of a
specified bin.
For repeating reports field widths should be re-calculated for
each report interval. Not doing so will cause a single row with
wide field data to cause all subsequent rows to share the width:
Name RgID ArID R/s W/s Histogram Bounds
vg_hex-lv_home 0 0 4522.00 834.00 0s: 991, 2ms: 152, 4ms: 161, 6ms: 4052 0s, 2ms, 4ms, 6ms
vg_hex-lv_swap 0 0 0.00 0.00 0s: 0, 2ms: 0, 4ms: 0, 6ms: 0 0s, 2ms, 4ms, 6ms
vg_hex-lv_root 0 0 1754.00 683.00 0s: 369, 2ms: 65, 4ms: 90, 6ms: 1913 0s, 2ms, 4ms, 6ms
luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e 0 0 4522.00 868.00 0s: 985, 2ms: 152, 4ms: 161, 6ms: 4092 0s, 2ms, 4ms, 6ms
vg_hex-lv_images 0 0 0.00 0.00 0s: 0, 2ms: 0, 4ms: 0, 6ms: 0 0s, 2ms, 4ms, 6ms
Name RgID ArID R/s W/s Histogram Bounds
vg_hex-lv_home 0 0 0.00 0.00 0s: 0, 2ms: 0, 4ms: 0, 6ms: 0 0s, 2ms, 4ms, 6ms
vg_hex-lv_swap 0 0 0.00 0.00 0s: 0, 2ms: 0, 4ms: 0, 6ms: 0 0s, 2ms, 4ms, 6ms
vg_hex-lv_root 0 0 0.00 2.00 0s: 1, 2ms: 0, 4ms: 0, 6ms: 1 0s, 2ms, 4ms, 6ms
luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e 0 0 0.00 0.00 0s: 0, 2ms: 0, 4ms: 0, 6ms: 0 0s, 2ms, 4ms, 6ms
vg_hex-lv_images 0 0 0.00 0.00 0s: 0, 2ms: 0, 4ms: 0, 6ms: 0 0s, 2ms, 4ms, 6ms
^^^^^^^^^^^^^^^^^
This is especially significant for the current histogram fields:
depending on the time since the last clear operation the first
report iteration may contain very large values leading to a very
large minimum field width. Without resetting field widths this
large minimum field width value is used for all subsequent rows.
Previously all stderr messages issued by spawned lvpoll command were reported
as INFO only. This made all such messages invisible in syslog or lvmpolld log
while running default configuration.
All lvpoll stderr messages are logged with WARN priority now and an lvpoll
command exiting with retcode != 0 is logged with ERROR priority in
syslog and lvmpolld log.
Include both the VG uuid and name in the lvmetad
set_vg_info message. This works around an obscure
problem where the VG uuid in lvmlockd is wrong
when one host removes a dlm VG, then creates a new
VG with the same name. If the dlm lockspace for
the initial VG was never stopped on another host,
that other host will be using the old uuid in its
lvmetad set_vg_info message. (That can be
corrected with a larger change, but this is an
effective workaround.)
set_vg_info previously accepted only vg uuid,
now accept both vg uuid and vg name. If the
uuid is provided, it's used just as before,
but if the uuid is not provided, or if it's
not found, then fall back to using the vg
name if that is provided.
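The lookup order described above can be sketched as follows. The structure and function names are hypothetical and much simpler than the real lvmetad data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry; the real lvmetad structures differ. */
struct vg_entry {
	const char *uuid;
	const char *name;
};

/* Look up by uuid first; if no uuid was given or it is not found,
 * fall back to the name when one was given. Returns the index of the
 * matching entry, or -1 if nothing matches. */
int find_vg(const struct vg_entry *vgs, size_t n,
	    const char *uuid, const char *name)
{
	size_t i;

	if (uuid) {
		for (i = 0; i < n; i++)
			if (!strcmp(vgs[i].uuid, uuid))
				return (int) i;
	}
	if (name) {
		for (i = 0; i < n; i++)
			if (!strcmp(vgs[i].name, name))
				return (int) i;
	}
	return -1;
}

/* Sample cache: "vg0" was recreated and now carries a new uuid. */
const struct vg_entry example_vgs[] = {
	{ "uuid-new", "vg0" },
	{ "uuid-other", "vg1" },
};
```

A host still sending the stale uuid for "vg0" reaches the right entry through the name fallback.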
lvmlockd would fail to recognize that the global lockspace
failed to start if the dlm wasn't running, so future attempts
to start the dlm global lockspace would do nothing, thinking
it was already running.
This was only used to return two flags indicating specific
reasons for a lock failure so that a more specific error
message could be printed by the command (lockspace had been
stopped, or lockspace had an error starting.)
Remove the list, given its limited usefulness, the fact it
would easily become inaccurate, and the fact it was causing
misleading error messages. The error conditions it was meant
to help could be reported differently.
Previously, a command would only rescan a lockd VG
when lvmetad returned the "vg_invalid" flag indicating
that the cached copy was invalid (which is done by
lvmlockd.) This is still the only usual reason for
rescanning a lockd VG, but two new special cases are
added where we also do the rescan:
. When the --shared option is used to display lockd VGs
from hosts not using lvmlockd. This is the same case
as using --foreign to display foreign VGs, but --shared
was missing the corresponding bits to rescan the VGs.
. When a lockd VG is allowed to be read for displaying
after failing to acquire the lock from lvmlockd. In
this case, the usual mechanism for validating the
cache is missed, so assume the cache would have been
invalidated. (This had been a previous todo item
that was lost during other cleanup.)
These were long-standing todos that were lost track of.
This makes lvmlockd removal steps for dlm VGs closely match
sanlock VGs. Because dlm lockspaces are not required to be
stopped on all hosts before vgremove, there is an extra bit
for dlm lockspaces, where a flag is set in the VG lock lvb
indicating that the VG was removed. If other hosts happen
to use the VG lock they will see this flag and stop their
lockspace.
Remove the existing lock type using the same functions
used to remove the lockd components during vgremove.
This results in a "clean" VG and lvmlockd state after
the vgchange, i.e. no bits left over from previous
lock type.
Originally when vgdisplay encountered an exported VG it issued a
WARNING. Commit d6b1de30 replaced this with an error message
but still exited with success (incorrect). A backtrace was recently
added in commit b193809987.
As vgdisplay already states that the VG is exported in its output,
just drop these messages completely.
dm_stats_create_region is now assigned to DM_1_02_106 by default:
the DM_1_02_104 .exported_symbols file entry was moved into
libdm-stats.c as:
DM_EXPORT_SYMBOL(dm_stats_create_region, 1_02_104)
so delete it from .exported_symbols.DM_1_02_104.
Since commit 797c18d543 some internal symbols
have been exported in shared libraries by mistake because 'local: *' got
lost. Fix the shell script not to compare the whole filename with
'Base'
All cache args could be specified when caching LV
(means converting LV to cached).
When --cachemode arg is given during cache-pool conversion,
store it in the metadata.
https://bugzilla.redhat.com/show_bug.cgi?id=1255184
Since cache-pool actually keeps info about caching,
display this info for cache-pool LV as well
(matches info for cache LV when cache-pool is associated with it).
Commit 82a27a8 introduced a change to the symbol versioning macros
that allows a new version of a function to be introduced while
keeping the old behaviour via a versioned symbol export. The new
symbol is listed in the current .exported_symbols.DM_* file and a
default (@@VERSION) binding is created during linking.
This broke the build on RHEL5, RHEL6 and Debian Lenny. This is
because the make version in these distros returns results from the
$(wildcard *) command in a different order to the RHEL7 and F22
versions: this affects the ordering of the generated .export.sym
version script:
RHEL7/F22:
for i in ./.exported_symbols.Base ./.exported_symbols.DM_1_02_99
./.exported_symbols.DM_1_02_98 ./.exported_symbols.DM_1_02_97
./.exported_symbols.DM_1_02_106 ./.exported_symbols.DM_1_02_105
./.exported_symbols.DM_1_02_103 ./.exported_symbols.DM_1_02_101
./.exported_symbols.DM_1_02_104 ./.exported_symbols.DM_1_02_100
290: 000000000003d101 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region_v1_02_104
*388: 000000000003cfc7 314 FUNC GLOBAL DEFAULT 12 dm_stats_create_region@@DM_1_02_106
391: 000000000003d101 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region@DM_1_02_104
*552: 000000000003cfc7 314 FUNC GLOBAL DEFAULT 12 dm_stats_create_region
944: 000000000003d101 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region_v1_02_104
992: 000000000003d101 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region@DM_1_02_104
RHEL6:
for i in ./.exported_symbols.Base ./.exported_symbols.DM_1_02_100
./.exported_symbols.DM_1_02_101 ./.exported_symbols.DM_1_02_103
./.exported_symbols.DM_1_02_104 ./.exported_symbols.DM_1_02_105
./.exported_symbols.DM_1_02_106 ./.exported_symbols.DM_1_02_97
./.exported_symbols.DM_1_02_98 ./.exported_symbols.DM_1_02_99; do\
290: 000000000003d0e1 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region_v1_02_104
390: 000000000003d0e1 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region@DM_1_02_104
*479: 000000000003cfa7 314 FUNC LOCAL DEFAULT 12 dm_stats_create_region
944: 000000000003d0e1 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region_v1_02_104
992: 000000000003d0e1 106 FUNC GLOBAL DEFAULT 12 dm_stats_create_region@DM_1_02_104
The F22 build has the correct behaviour (although the sort order is
inconsistent) but on RHEL6 the 1_02_106 symbol file appears after
version 1_02_104 which introduced the original symbol. This causes
the later version of the symbol to lose its version binding and be
reduced to local scope.
If using un-versioned exports of the current version of a symbol
(i.e. exported with the plain symbol name and no macro) and using
the linker script to set the symbol version, the current version
node must appear first in the version script: the un-versioned
symbol will be bound to the first version node found that contains
it.
On RHEL6 and the other older distros the original version of the
dm_stats_create_region() call sorted before the current version
(DM_1_02_104 vs. DM_1_02_106) leading to a subsequent link error for
the later symbol version:
dmsetup.o: In function `_do_stats_create_regions':
/root/src/git/lvm2/tools/dmsetup.c:4658: undefined reference to
`dm_stats_create_region'
Ensure that the ordering of entries in the version script is
consistent to avoid an old implementation shadowing a newer one by
sorting the list of file names before the loop:
$$(echo $(EXPORTED_SYMBOLS) | tr ' ' '\n' | sort -rnt_ -k5 )
This only sorts by patch level but this is sufficient to maintain
the correct order for current version files.
Tested on RHEL5, 6, 7 and F22.
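The ordering produced by the sort invocation above can be mirrored in C for illustration. This sketch only reimplements the key extraction (fifth '_'-separated field, numeric, descending); it is not part of the build system and the function names are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Numeric value of the fifth '_'-separated field, or 0 if absent,
 * mirroring what "sort -n -t_ -k5" uses as its key. */
long patch_level(const char *name)
{
	int field = 1;

	for (; *name; name++)
		if (*name == '_' && ++field == 5)
			return strtol(name + 1, NULL, 10);
	return 0;
}

/* qsort comparator: highest patch level first (the -r flag). */
int cmp_patch_desc(const void *a, const void *b)
{
	long pa = patch_level(*(const char *const *) a);
	long pb = patch_level(*(const char *const *) b);

	return (pa < pb) - (pa > pb);
}

/* Sort a sample list and return the name that ends up first. */
const char *first_after_sort(void)
{
	const char *files[] = {
		"./.exported_symbols.Base",
		"./.exported_symbols.DM_1_02_104",
		"./.exported_symbols.DM_1_02_106",
		"./.exported_symbols.DM_1_02_97",
	};

	qsort(files, sizeof(files) / sizeof(files[0]), sizeof(files[0]),
	      cmp_patch_desc);
	return files[0];
}
```

The newest version file sorts first and the Base file, which has no fifth field, gets key 0 and sorts last.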
Add support for the kernel precise_timestamps feature. This allows
regions to be created using counters with nanosecond precision.
A new dm_stats method, dm_stats_set_precise_timestamps() causes all
future regions created with this handle to attempt to enable precise
counters.
Fix the version export macros to make it possible to export two
different DM_* versions of a symbol: currently it is only possible for a
DM_* symbol to override a symbol in Base. Attempting to export two
symbols at different DM_* version levels (e.g. DM_1_02_104 and
DM_1_02_106) leads to a linker error due to a duplicate symbol
definition.
This is because the DM_EXPORTED_SYMBOL macro makes each exported symbol
the default (@@VERSION):
__asm__(".symver " #func "_v" #ver ", " #func "@@DM_" #ver )
Fix the macro to use a single '@' for symbols exported in multiple
versions and rename the macros to DM_EXPORT_*:
DM_EXPORT_SYMBOL(func,ver)
DM_EXPORT_SYMBOL_BASE(func,ver)
For functions that have multiple implementations these macros control
symbol export and versioning.
Function definitions that exist in only one version never need to use
these macros.
Backwards compatible implementations must include a version tag of
the form "_v1_02_104" as a suffix to the function name and use the
macro DM_EXPORT_SYMBOL to export the function and bind it to the
specified version string.
Since versioning is only available when compiling with GCC the entire
compatibility version should be enclosed in '#if defined(__GNUC__)',
for example:
int dm_foo(int bar)
{
return bar;
}
#if defined(__GNUC__)
// Backward compatible dm_foo() version 1.02.104
int dm_foo_v1_02_104(void);
int dm_foo_v1_02_104(void)
{
return 0;
}
DM_EXPORT_SYMBOL(dm_foo,1_02_104)
#endif
A prototype for the compatibility version is required as these
functions must not be declared static.
The DM_EXPORT_SYMBOL_BASE macro is only used to export the base
versions of library symbols prior to the introduction of symbol
versioning: it must never be used for new symbols.
Single messages sent over unix sockets are limited in
size to /proc/sys/net/core/wmem_max, so send the 1MB
debug buffer in smaller chunks to avoid EMSGSIZE.
Also look for EAGAIN and retry sending for a limited
time when the reader is slower than the writer.
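A minimal sketch of chunked sending with a bounded EAGAIN retry follows. The chunk size, the 100ms poll timeout and the retry limit are illustrative assumptions, and send_chunked() is a hypothetical helper, not the lvmlockd code.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send len bytes in pieces of at most chunk bytes. On EAGAIN, wait up
 * to 100ms for the socket to become writable and retry, giving up
 * after max_retries waits. Returns 0 on success, -1 on failure.
 * The chunk size, timeout and retry count are illustrative choices. */
int send_chunked(int fd, const char *buf, size_t len, size_t chunk,
		 int max_retries)
{
	size_t off = 0;
	int retries = 0;

	while (off < len) {
		size_t want = len - off < chunk ? len - off : chunk;
		ssize_t n = send(fd, buf + off, want, MSG_DONTWAIT);

		if (n >= 0) {
			off += (size_t) n;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN && retries++ < max_retries) {
			struct pollfd pfd = { .fd = fd, .events = POLLOUT };

			(void) poll(&pfd, 1, 100);
			continue;
		}
		return -1;
	}
	return 0;
}

/* Round-trip 4096 bytes over a socketpair in 512-byte chunks.
 * Returns 0 if the received data matches what was sent. */
int send_chunked_demo(void)
{
	char out[4096], in[4096];
	size_t got = 0;
	int sv[2], rc;

	memset(out, 'x', sizeof(out));
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	rc = send_chunked(sv[0], out, sizeof(out), 512, 10);
	while (!rc && got < sizeof(in)) {
		ssize_t n = read(sv[1], in + got, sizeof(in) - got);

		if (n <= 0) {
			rc = -1;
			break;
		}
		got += (size_t) n;
	}
	close(sv[0]);
	close(sv[1]);
	return rc ? -1 : memcmp(in, out, sizeof(out));
}
```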
Also shift the location of that code so it's the same
as other requests.
With clusters larger than 3 nodes, the 32-byte debug buffer in
cpg_join_callback() is too small to contain all the node IDs, because
32-bit identifiers are generally rendered in 10 decimal digits. No fixed
size is good in all cases, but this is conditionally logged debug info,
so we can simply truncate it. Double the size, nevertheless.
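Truncating at an ID boundary rather than mid-number can be sketched as below. The separator and the helper names are assumptions for the example, so the number of IDs that fit in a given buffer will differ from the real callback.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Append node IDs to a fixed-size buffer, stopping cleanly once the
 * next ID no longer fits. Returns how many IDs were written in full.
 * Illustrative helper, not the clvmd code. */
size_t format_node_ids(char *buf, size_t size, const uint32_t *ids, size_t n)
{
	size_t i, used = 0;

	if (!size)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		int len = snprintf(buf + used, size - used, "%s%u",
				   i ? " " : "", (unsigned) ids[i]);

		if (len < 0 || (size_t) len >= size - used) {
			buf[used] = '\0';	/* drop the partial ID */
			break;
		}
		used += (size_t) len;
	}
	return i;
}

/* How many of four 10-digit IDs fit in a buffer of the given size? */
size_t ids_fitting(size_t size)
{
	const uint32_t ids[4] = { 4294967295u, 4294967295u,
				  4294967295u, 4294967295u };
	char buf[64];

	assert(size <= sizeof(buf));
	return format_node_ids(buf, size, ids, 4);
}
```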
Add a function to test whether the kernel precise_timestamps
feature is available in the current device-mapper driver version.
Presence of precise_timestamps also implies the availability of
latency histograms.
This mainly makes the description text use 80 columns.
There are a few minor adjustments to wording to help
the text layout, and a couple minor improvements to
descriptions.
The unlock call will fail in expected and normal cases,
and should not cause the command to fail. (An actual
unlock in the lock manager should never fail.)
We already use -lm functions in a couple of places (these are
satisfied by gcc built-ins for most builds): add a configure.in
check and explicitly link to -lm.
This reverts commit 70db1d523d.
Since we use 'strncpy' even for case where it exactly matches
the buffer size and \0 is not expected to be added there.
Before printing a commented automatic config value,
print a line describing what it is. Otherwise, the
commented value can look like it's a part of an
example preceding it.
The timerfd guarantees that it will return 8 bytes when a read(2)
is issued (a uint64_t giving the number of timer events during the
call). Check that it does so and log a non-fatal error if the byte
count is not 8.
Several interfaces in libdm-stats return a uint64_t when it is
only used to signal success/failure: change all these uses to
return a simple int instead.
Move code which runtime detects settings for cache_policy
out of config dir to cache seg handling code.
Also mark cache_mode as command profilable setting.
Revert back to already existing behavior which has been slightly
modified by a900d150e4.
In the end, however, it seems to be equivalent to changing the TID right with the first
metadata write.
Existing code missed handling for 'unused' thin-pool which would
require to also check empty message list for TID==0.
So with the fix we now again preserve 'active' thin-pool volume
when first thin volume is created - this property was lost and caused
problems in cluster, where the lock was held, but volume was no longer
active on the node.
Another missing part was the proper support for already increased,
but unfinished TID change.
So going back here with existing logic -
TID is increased with first MDA update.
Code allows start with either same TID or (TID-1).
If there are messages, TID must be lower by 1 for sending,
otherwise messages were already posted.
As cache_policy is evaluated in runtime, we no longer should use
CFG_COMMENTED, but have to switch to CFG_UNDEFINED.
So as long as the value is undefined, it's runtime evaluated.
Once it's set - it's always respected (no runtime fallback).
Also fix version of introduced settings to 2.2.128.
Commit f10ad95 introduced a regression causing the size of regions
passed in on the command line to be truncated to zero. Initialise
the 'this_len' variable to the supplied length to correct this.
Commit f10ad95 introduced a regression in the calculation of the
number of areas in a region created with the --areasize switch:
vg_hex-lv_home: Created new region with 0 area(s) as region ID 1
vg_hex-lv_swap: Created new region with 0 area(s) as region ID 1
Fix this by using the correct region size when calculating the
value.
When dmstats is run with -v or higher enable a per-area reporting
mode for statistics regions. This will output one row per area
(rather than one row per region) and adds additional fields of use
when viewing areas:
area_id - index within the region assigned by libdm-stats
area_start - the start location of the area in the containing
device.
Add a method to retrieve the offset of an area within the
containing region (rather than the offset within the containing
device returned by dm_stats_get_area_start()).
Although users of the library can calculate this themselves it is
better to provide this through a method call to avoid users making
assumptions about the structure of regions and areas.
The dm_stats_get_area_start (and its '_current_' variant) methods
are expected to return the start sector of the area in the
containing device.
Make sure the call adds region->start to the returned value.
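The relationship between the two values can be sketched as below. The structure is hypothetical (it is not the libdm-stats region type), and the round-up rule for the area count is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region description, in 512-byte sectors; not the
 * libdm-stats structures. */
struct region_sketch {
	uint64_t start;	/* first sector of the region in the device */
	uint64_t len;	/* region length */
	uint64_t step;	/* area size */
};

/* Number of areas, assuming the last area may be shorter than step. */
uint64_t nr_areas(const struct region_sketch *r)
{
	return (r->len + r->step - 1) / r->step;
}

/* Offset of an area from the start of its region. */
uint64_t area_offset(const struct region_sketch *r, uint64_t area_id)
{
	return area_id * r->step;
}

/* Start of an area in the containing device. */
uint64_t area_start(const struct region_sketch *r, uint64_t area_id)
{
	return r->start + area_offset(r, area_id);
}

/* Sample region: starts at sector 2048, 1000 sectors, 256-sector areas. */
const struct region_sketch example_region = { 2048, 1000, 256 };
```

The offset is relative to the region, while the start is relative to the device, so the two differ by the region start.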
Add a '--raw' switch to stats reports that causes us to report the
basic counter values rather than derived metrics for each visible
statistics region.
Add prefixes to all dmsetup report types to allow the 'group_all'
option to be effective:
DR_TASK name_
DR_INFO info_
DR_DEPS deps_
DR_TREE tree_
DR_NAME splitname_
When run with full verbosity dmsetup or dmstats reports will
output a figure that tracks a moving average over a window of the
last two intervals:
Interval #3 time delta: 999991087ns
Interval #3 mean duration: 999907064ns, current err: -8913ns
End interval #3 duration: 999991087ns
Adjusted sample interval duration: 999991087ns
Due to the narrow window this is a very crude estimate and is only
of use to someone debugging or modifying the stats clock: remove
the value and the global variables used to track it.
Anyone with a particular use for this information can construct a
better mean by calculating the value of a greater number of
intervals.
Unlike 'info -c' and 'stats report' the 'dmstats list' subcommand
does its own report processing. This complicates the handling of
the DR_STATS and DR_STATS_META fields and leads to inconsistent
behaviour between the different commands. In particular it causes
'stats list' to segfault when using 'all' field options:
Segmentation fault (core dumped)
Delete _stats_list() entirely and adapt _stats_report so that it
can correctly format a DR_STATS_META-only report request.
This requires passing the subcommand into _report_init() where it
is used in addition to the command name to select the default set
of report fields for the 'list' and 'report' stats subcommands.
With this change both 'list' and 'report' dmstats report will use
the correct report object type and ensure that it is initialised
appropriately for the field selection in use.
Although statistics and meta fields (region and area properties) share
the same object type the state of the handle they expect differs: meta
only expects a dm_stats_list() operation to have been performed whereas
statistics require a fully populated handle.
Distinguish between these requirements by separating the fields into
two distinct report types:
DR_STATS = 32,
DR_STATS_META = 64
The new category is described as "Mapped Device Statistics Region
Information" in the help text.
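How the two report type bits could drive handle preparation is sketched below. Only the DR_STATS and DR_STATS_META values come from the text; the handle_prep enum and prep_for_report() are illustrative names, not taken from dmsetup.c.

```c
#include <assert.h>

/* Report type bits quoted in the text above. */
enum {
	DR_STATS = 32,
	DR_STATS_META = 64
};

/* How much work the stats handle needs before reporting (names are
 * illustrative, not taken from dmsetup.c). */
enum handle_prep {
	PREP_NONE,	/* no stats fields requested */
	PREP_LIST,	/* region list is enough */
	PREP_POPULATE	/* counters must be read too */
};

/* Statistics fields need a fully populated handle; meta-only reports
 * get by with a list operation. */
enum handle_prep prep_for_report(unsigned report_types)
{
	if (report_types & DR_STATS)
		return PREP_POPULATE;
	if (report_types & DR_STATS_META)
		return PREP_LIST;
	return PREP_NONE;
}
```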
Make the use of the this_start and this_len variables easier to
follow and clarify the use of zero start and len arguments to
request a whole-device region.
Add a pair of fields to expose the current per-interval duration
estimate. The 'interval' field provides a real value in units of
seconds and the 'interval_ns' field provides the same quantity
expressed as a whole number of nanoseconds.
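The relationship between the two fields is a plain unit conversion. A minimal sketch, assuming the nanosecond count is held in a 64-bit unsigned integer; the helper name is hypothetical and not taken from dmsetup.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: express a whole-nanosecond interval as real seconds. */
double interval_ns_to_seconds(uint64_t interval_ns)
{
	return (double) interval_ns / (double) NSEC_PER_SEC;
}
```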
Do not include bits/time.h as it is an internal libc header file.
A comment at the top of the glibc specific bits/time.h says:
"Never include this file directly; use <time.h> instead."
This fixes the following build error with musl libc:
libdm-timestamp.c:37:23: fatal error: bits/time.h: No such file or directory
---
Compile tested with Alpine Linux (musl libc) and Ubuntu 15.04
libdm/libdm-timestamp.c | 1 -
1 file changed, 1 deletion(-)
Introduce enums and global variables to record cleanly which command we
are processing and eliminate the historically inconsistent use of the
shifted argv[0] and fix assorted bugs discovered along the way.
Add dm_report_is_empty() to indicate there is no data awaiting output
and use this to suppress dmsetup report headings when no data is output
so we don't get a stray line saying 'Help' at the end of reporting help.
Define a report type (as the interface requires) so -o all selects
the right fields in splitname. (A fix for stats list will follow.)
Exit immediately if no device is supplied to dmsetup wipe_table instead
of hitting errors later and failing.
Adjust the command name printed in usage/help output to match command
invoked (most of the time).
The '--force' switch is only used by dmstats to allow either
creation or deletion of one or more regions on all devices.
These operations do not carry any risk: just a possible mess of
region IDs to be cleaned up.
Remove the use of '--force' for stats commands and change current
uses to a new '--alldevices' switch.
The region creation message just outputs the new region_id, e.g.:
Created region: 0
This is fine when the device is unambiguous (as above) but produces
unhelpful output when creating multiple regions, or regions on
multiple devices:
Created region: 0
Created region: 0
Created region: 1
Created region: 2
Created region: 0
To address this refactor _stats_create_segments() (previously only
used when creating one-region-per-target for --segments) into a
more general _do_stats_create_regions() that can create regions
for each segment, or a single region spanning either the entire
device or a specified start/len range.
This allows us to output all region creation messages from a
single point where both the device name and all information needed
to derive the number of areas is available.
This allows us to log all these facts in the resulting messages:
vg_hex-lv_home: Created new region with 13 area(s) as region ID 0
vg_hex-lv_home: Created new region with 4 area(s) as region ID 1
vg_hex-lv_home: Created new region with 1 area(s) as region ID 2
vg_hex-lv_swap: Created new region with 1 area(s) as region ID 0
vg_hex-lv_root: Created new region with 10 area(s) as region ID 0
luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e: Created new region with 17 area(s) as region ID 0
vg_hex-lv_images: Created new region with 20 area(s) as region ID 0
vg_hex-lv_images: Created new region with 4 area(s) as region ID 1
Don't use cryptic abbreviations and make sure that all values can
be understood by someone not familiar with the clock internals.
Include the current interval number (inverse of the _count) in all
interval update messages and attempt to align interval timestamp
logs for interval counts < 99,999.
If _stats_report fails (e.g. due to an invalid device on the
command line) destroy the _report to prevent stats columns headings
from being displayed.
This also requires a change in main to test the return from
_perform_command_for_all_repeatable_args inside the interval loop
and exit immediately in case of error.
The _update_interval_times() function is called once per reported
object: when shutting down at the end of a run only the first call
should free timestamps. Clear the timestamp pointers after free
and use this to signal to other callers that the clock is already
shut down.
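The free-then-clear pattern described above can be sketched as follows. The structure and function names are hypothetical stand-ins, not the ones used in dmsetup; the point is only that the NULL pointers double as the "already shut down" signal for later callers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the timestamps kept between intervals. */
struct interval_clock {
	long *start;      /* start of the current interval, NULL once shut down */
	long *cycle_end;  /* end of the current interval, NULL once shut down */
};

/* Allocate both timestamps; returns 1 on success, 0 on failure. */
int interval_clock_init(struct interval_clock *c)
{
	assert(c);
	c->start = calloc(1, sizeof(*c->start));
	c->cycle_end = calloc(1, sizeof(*c->cycle_end));
	if (!c->start || !c->cycle_end) {
		free(c->start);
		free(c->cycle_end);
		c->start = c->cycle_end = NULL;
		return 0;
	}
	return 1;
}

/* Returns 1 if this call released the timestamps, 0 if already shut down. */
int interval_clock_shutdown(struct interval_clock *c)
{
	assert(c);
	if (!c->start && !c->cycle_end)
		return 0;
	free(c->start);
	free(c->cycle_end);
	/* Clearing the pointers tells every later caller the clock is gone. */
	c->start = c->cycle_end = NULL;
	return 1;
}
```

Calling the shutdown function once per reported object is then safe: only the first call frees anything.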
If the Linux timerfd interface to POSIX timers is available at compile
time use it for all report interval timekeeping. This gives more
accurate interval timing when the per-interval processing time is less
than the configured interval and simplifies the timestamp bookkeeping
required to keep accurate time.
For systems without timerfd support fall back to the simple usleep based
timer.
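A periodic timerfd timer can be sketched as below. This is a Linux-only illustration under stated assumptions: the function names are hypothetical, CLOCK_MONOTONIC is an assumed clock choice, and interrupted reads (EINTR) are not retried. The value read back is the number of intervals that expired since the last read, so slow per-interval processing does not shift later wake-ups.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Arm a periodic timer; returns a file descriptor, or -1 on error. */
int interval_timer_start(uint64_t interval_ns)
{
	struct itimerspec its;
	int fd;

	if (!interval_ns)
		return -1; /* a zero it_value would disarm the timer */

	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd < 0)
		return -1;

	its.it_value.tv_sec = (time_t) (interval_ns / 1000000000ULL);
	its.it_value.tv_nsec = (long) (interval_ns % 1000000000ULL);
	its.it_interval = its.it_value; /* repeat with the same period */

	if (timerfd_settime(fd, 0, &its, NULL) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* Block until at least one interval has elapsed; returns 0 on success. */
int interval_timer_wait(int fd, uint64_t *expirations)
{
	ssize_t n;

	assert(expirations);
	n = read(fd, expirations, sizeof(*expirations));

	return n == (ssize_t) sizeof(*expirations) ? 0 : -1;
}
```

A caller would start the timer once, then call the wait function at the end of each report cycle and close the descriptor when the run finishes.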
Change logic and naming of some internal API functions.
cache_set_mode() and cache_set_policy() both take segment.
cache mode is now correctly 'masked-in'.
If the passed segment is 'cache' segment - it will automatically
try to find 'defaults' according to profiles if they are NOT
specified on command line or they are NOT already set for cache-pool.
These defaults are never set for cache-pool.
Add code to detect available cache features.
Support policy_mq & policy_smq features which might be disabled.
Introduce global_cache_disabled_features_CFG.
Add new profilable configurables:
allocation/cache_policy
allocation/cache_settings
and mark allocation/cache_pool_chunk_size as profilable as well.
Obsolete allocation/cache_pool_cachemode and
introduce new allocation/cache_mode instead.
Rename DEFAULT_CACHE_POOL_POLICY to DEFAULT_CACHE_POLICY.
Request a transient LV lock from lvmlockd when
converting an LV. If the LV is inactive when
lvconvert is run, the LV lock will be acquired
and then released when the command is done.
If the LV is active, a persistent lock exists
already and the transient lock request does nothing.
This fixes the issue that had been mentioned in the
comment previously.
lvrename should not be done if the LV is active on another host.
This check was mistakenly removed when the code was changed to
use LV uuids in locks rather than LV names.
Since libdm-stats only uses fmemopen'd FILE objects the only way
that a close can fail is corruption of the memory containing the
FILE: check for this case and emit a backtrace if it occurs.
libdm/libdm-stats.c: 338 in _stats_parse_list()
libdm/libdm-stats.c: 341 in _stats_parse_list()
libdm/libdm-stats.c: 481 in _stats_parse_region()
libdm/libdm-stats.c: 487 in _stats_parse_region()
libdm/libdm-stats.c: 487 in _stats_parse_region()
- Calling "fclose" without checking return value
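Checking the fclose() result on an in-memory stream can be sketched as follows. The helper name and the input format are hypothetical, and the sketch prints a message to stderr where the commit above emits a backtrace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse the leading unsigned integer of an in-memory
 * response. Returns 1 on success, 0 on a parse failure or if fclose()
 * reports an error.
 */
int parse_leading_id(char *response, unsigned long long *id)
{
	FILE *f;
	int ok;

	assert(response && id);

	f = fmemopen(response, strlen(response) + 1, "r");
	if (!f)
		return 0;

	ok = (fscanf(f, "%llu", id) == 1);

	/* No I/O backs the stream, so a failure here points at corruption. */
	if (fclose(f)) {
		fprintf(stderr, "fclose failed on in-memory stream\n");
		ok = 0;
	}

	return ok;
}
```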
Remove an unnecessary conditional operator and simplify the logic
in _nr_areas:
libdm/libdm-stats.c: 501 in _nr_areas() - Control flow issues (DEADCODE)
The error path of _stats_list frees the task and stats objects:
don't try to branch to it before they have been allocated.
tools/dmsetup.c: 4589 in _stats_help() - Null pointer dereferences (FORWARD_NULL)
Make sure the newly created handle is freed if we are unable to
also create the pool for it.
tools/dmsetup.c: 4255 in _stats_list() - Variable "dms" going out of scope leaks the storage it points to.
Make sure comm is closed in the error path of _program_id_from_proc().
libdm/libdm-stats.c: 98 in dm_stats_create() - Variable "comm" going out of scope leaks the storage it points to.
There's no point testing _report here in _stats_report: it's always
initialised before the function is called and if the check did fail
we'd end up freeing an uninitialized dm_task in the error path.
tools/dmsetup.c: 4389 in _stats_report() - Declaring variable "dmt" without initializer.
The check for other sanlock lockspaces was not checking
that the lockspace type was sanlock, so if dlm lockspaces
were visible, they were wrongly included.
When vgremove is used to remove multiple VGs in one command,
e.g. vgremove foo bar, the first VG (foo) that is removed
may have held the sanlock global lock. In this case,
do not continue removing further VGs (bar) without the
global lock.
Add the libdm-stats module to libdm: this implements a simple interface
for creating, managing and interrogating I/O statistics regions and
areas on device-mapper devices.
The library interface is documented in libdevmapper.h and provides a
'dm_stats' handle that is used to perform statistics operations and
obtain data.
Public methods are provided to create and destroy handles and to list,
create, and destroy statistics regions as well as to obtain and parse
counter data and calculate rate-based metrics.
This commit also adds a 'dmsetup stats' (aka 'dmstats') command with
'clear', 'create', 'delete', 'list', 'print', and 'report' sub-commands.
See the library documentation and the dmstats.8 manual page for detailed
API and command descriptions.
Don't do interval management and external timekeeping for stats in
dm_report: let applications handle this on their own.
Since this has not been included in a release remove it from the
library entirely and handle report timing directly inside dmsetup.
Add a function to print column headings regardless of whether they
have already been output. This will be used by dmstats to issue
periodic reminders of the column headings.
This patch removes a check for RH_HEADINGS_PRINTED from
_report_headings that prevents headings being displayed if the flag
is already set; this check is redundant since the only existing
caller (_output_as_columns()) already tests the flag before
calling the function.
Not releasing objects back to the pool is fine for short-lived
pools since the memory will be freed when dm_pool_destroy() is
called.
Any pool that may be long-lived needs to be more careful to free
objects back to the pool to avoid leaking memory that will not be
reclaimed until the pool is destroyed at process exit time.
The report pool currently leaks each headings line and some row
data.
Although dm_report_output() tries to free the first allocated row
this may end up freeing a later row due to sorting of the row list
while reporting. Store a pointer to the first allocated row from
_do_report_object() instead and free this at the end of
_output_as_columns(), _output_as_rows(), and dm_report_clear().
Also make sure to call dm_pool_free() for the headings line built
in _report_headings().
When dmstats is introduced it will maintain dm_report objects for
the whole lifetime of the process: without these changes a stats
report could leak around 600k in 10m (exact rate depends on field
selection and data values):
top - 12:11:32 up 4 days, 3:16, 15 users, load average: 0.01, 0.12, 0.14
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6473 root 20 0 130196 3124 2792 S 0.0 0.0 0:00.00 dmstats
top - 12:22:04 up 4 days, 3:26, 15 users, load average: 0.06, 0.11, 0.13
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6498 root 20 0 130836 3712 2752 S 0.0 0.0 0:00.60 dmstats
With this patch no increase in RSS is seen:
top - 13:54:58 up 4 days, 4:59, 15 users, load average: 0.12, 0.14, 0.14
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13962 root 20 0 130196 2996 2688 S 0.0 0.0 0:00.00 dmstats
top - 14:04:31 up 4 days, 5:09, 15 users, load average: 1.02, 0.67, 0.36
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13962 root 20 0 130196 2996 2688 S 0.3 0.0 0:00.32 dmstats
This also affects report output for repeating reports in the
DM_REPORT_OUTPUT_COLUMNS_AS_ROWS case; row state is not fully cleared for
the next iteration leading to progressive growth of the heading width:
vg_hex-lv_home:vg_hex-lv_swap:vg_hex-lv_root:luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:vg_hex-lv_images
253:253:253:253:253
2:0:1:4:3
L--w:L--w:L--w:L--w:L--w
1:2:1:1:1
3:1:1:1:2
0:0:0:0:0
LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiv08BCGvF4WsJSqWUDUt7qtf2hEmjtVvo:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiKf7XIiwdAYOJfaGhQe9fu26cTEICGgFS:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiEZj7ZXbmrWDuGhd7vvi88VF0NdTMG8iA:CRYPT-LUKS1-797339213f684c929eb7d0aca4c6ba3e-luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOi2rKredlBPnw2X7v1BiCuEpFo6gaE7BRw
:::::vg_hex-lv_home:vg_hex-lv_swap:vg_hex-lv_root:luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:vg_hex-lv_images
:::::253:253:253:253:253
:::::2:0:1:4:3
:::::L--w:L--w:L--w:L--w:L--w
:::::1:2:1:1:1
:::::3:1:1:1:2
:::::0:0:0:0:0
:::::LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiv08BCGvF4WsJSqWUDUt7qtf2hEmjtVvo:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiKf7XIiwdAYOJfaGhQe9fu26cTEICGgFS:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiEZj7ZXbmrWDuGhd7vvi88VF0NdTMG8iA:CRYPT-LUKS1-797339213f684c929eb7d0aca4c6ba3e-luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOi2rKredlBPnw2X7v1BiCuEpFo6gaE7BRw
This adds the infrastructure, code paths, error reporting,
etc. to handle storage errors, or storage loss, under the
sanlock leases in a VG that is being used. The loss of
storage means sanlock cannot renew its leases, which means
that the host needs to stop using the shared VG before its
leases expire.
This still requires manually shutting down a VG that has
lost lease storage, e.g. unmounting file systems,
deactivating LVs in the VG. The next step is to
automatically use a command like blkdeactivate to do that.
tools/polldaemon.c:465: uninit_use_in_call: Using uninitialized value "id.vg_name" when calling "print_log".
tools/polldaemon.c:465: uninit_use_in_call: Using uninitialized value "id.lv_name" when calling "print_log".
/lib/log/log.c:88: warning[invalidScanfArgType_int]: %llu in format string (no. 2) requires 'unsigned long long *' but the argument type is 'long long *'.
daemons/lvmlockd/lvmlockd-core.c:791: error[uninitstring]: Dangerous usage of 'version' (strncpy doesn't always null-terminate it).
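The strncpy warning above refers to the fact that strncpy() leaves the destination unterminated when the source fills it completely. A generic sketch of a bounded copy that always terminates; the helper name is hypothetical and this is not the change made in lvmlockd.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src into dst (capacity dst_size), always NUL-terminating.
 * Returns 1 if the whole string fitted, 0 if it was truncated.
 */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
	assert(dst && src);

	if (!dst_size)
		return 0;

	strncpy(dst, src, dst_size - 1);
	dst[dst_size - 1] = '\0'; /* strncpy does not add this on truncation */

	return strlen(src) < dst_size;
}
```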
The dlm global lockspace is automatically added when the
first dlm VG lockspace is added. Reverse this by removing
the dlm global lockspace after the last dlm VG lockspace
is removed. (Remove old non-working code that did this
based on an old command that could explicitly add/remove
the dlm global lockspace.)
Whenever a reporting field name is registered with libdevmapper and
the field name contains any number of underscores ('_'), libdm
can now automatically recognize any variant of it written without
the underscores.
For example:
..for underscores in prefixes:
pvs -o pv_name
pvs -o name
pvs -o pvname (newly recognized besides pvname)
..for underscores in the name:
lvs -o cache_mode
lvs -o cachemode
..or even multiple underscores:
pvs -o pv___na___me
These are all variants of the same field name.
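The underscore handling can be sketched as a comparison that skips '_' on both sides. This is an illustration only: the function names are hypothetical, the comparison is case-sensitive, and it does not cover the separate prefix shortcut (pv_name versus name) shown in the examples.

```c
#include <assert.h>

/* Skip any underscores at the current position. */
static const char *skip_underscores(const char *s)
{
	while (*s == '_')
		s++;

	return s;
}

/* Return 1 if the two field names are equal once underscores are ignored. */
int field_names_match(const char *a, const char *b)
{
	assert(a && b);

	for (;;) {
		a = skip_underscores(a);
		b = skip_underscores(b);
		if (*a != *b)
			return 0;
		if (!*a)
			return 1; /* both strings ended together */
		a++;
		b++;
	}
}
```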
No commands set has_subcommands yet.
Move multiple device loop to separate function because we'll
soon want to call it repeatedly.
(Based on patch from bmr.)
Use refresh_filters instead of destroy_filters and init_filters
in refresh_toolcontext fn which deals with cmd->initialized.filters
correctly on refresh.
Just shuffle the items and put them into logical groups so it's
visible at first sight what each group contains - it makes it a bit
easier to make heads and tails of the whole cmd_context monster.
When a command is flagged with NO_METADATA_PROCESSING flag, it means
such command does not process any metadata and hence it doesn't require
lvmetad, lvmpolld and it can get away with no locking too. These are
mostly simple commands (like lvmconfig/dumpconfig, version, types,
segtypes and other builtin commands that do not process metadata
in any way).
At first, when lvm command is executed, create toolcontext without
initializing connections (lvmetad,lvmpolld) and without initializing
filters (which depend on connections init). Instead, delay this
initialization until we know we need this. That is, until the
lvm_run_command fn is called in which we know what the actual
command to run is and hence we can avoid any connection, filter
or locking initialization for commands that would not make use
of it anyway.
For all the other create_toolcontext calls, we keep the original
behaviour - the filters and connections are initialized together
with the toolcontext.
Make it possible to decide whether we want to initialize connections and
filters together with toolcontext creation.
Add "filters" and "connections" fields to struct
cmd_context_initialized_parts and set these in cmd_context.initialized
instance accordingly.
(For now, all create_toolcontext calls do initialize connections and
filters, we'll change that in subsequent patch appropriately.)
Move original lvmetad and lvmpolld initialization code from
_process_config fn to their own functions _init_lvmetad and
_init_lvmpolld (both covered with single _init_connections fn).
Add struct cmd_context_initialized_parts to wrap up information
about which cmd context pieces are initialized and add variable
of this struct type into struct cmd_context.
Also, move existing "config_initialized" variable that was directly
part of cmd_context into the new cmd_context.initialized wrapper.
We'll be adding more items into the struct cmd_context_initialized_parts
with subsequent patches...
This tries harder to avoid creating duplicate global locks in
sanlock VGs by refusing to create a new sanlock VG with a
global lock if other sanlock VGs exist that may have a gl.
vgsummary information contains provisional VG information
that is obtained without holding the VG lock. This info
can be used to lock the VG, and then read it with vg_read().
After the VG is read properly, the vgsummary info should
be verified.
Add the VG lock_type to the vgsummary. It needs to be
known before the VG can be locked and read.
This is a regression introduced by commit
6c0e44d5a2 which changed
the way dev_cache_get fn works - before this patch, when a
device was not found, it fired a full rescan to correct the
cache. However, the change coming with that commit missed
this full_rescan call, causing the lvmcache to still contain
info about PVs which should be filtered now.
Such a situation could arise from a combination of three things: an
old persistent cache (/etc/lvm/cache/.cache) which no longer reflects
the actual state, a device name/symlink which now points to a device
which should be filtered, and the fact that we keep info about usable
DM devices in .cache no matter what the filter setting is.
This bug could be hidden though by changes introduced in
commit f1a000a477 as it
calls full_rescan earlier before this problem is hit.
But we need to fix this anyway for the dev_cache_get
to be correct if we happen to use the same code path
again somewhere sometime.
For example, simple reproducer was (before commit
f1a000a477558e157532d5f2cd2f9c9139d4f87c):
- /dev/sda contains a PV header with UUID y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M
- lvm.conf: filter = [ "r|.*|" ]
- rm -f .cache (to start with clean state)
- dmsetup create test --table "0 8388608 linear /dev/sda 0" (8388608 is
just the size of the /dev/sda device I use in the reproducer)
- pvs (this will create .cache file which contains
"/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M"
as well as "/dev/mapper/test" and the target node "/dev/dm-1" - all the
usable DM mappings (and their symlinks) get into the .cache file even
though the filter is set to "ignore all" - we do this - so far it's OK)
- dmsetup remove test (so we end up with /dev/disk/by-id/lvm-pv-uuid-...
pointing to the /dev/sda now since it's the underlying device
containing the actual PV header)
- now calling "pvs" with such .cache file and we get:
$ pvs
PV VG Fmt Attr PSize PFree
/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M vg lvm2 a-- 4.00g 0
Even though we have set filter = [ "r|.*|" ] in the lvm.conf file!
Moved out from lib/display and a little documentation added.
It's tuned to LVM's requirements historically and its behaviour
might not always be what you would expect.
There is no longer an "enable" option for the global lock,
so remove the bit of code that was checking for it. It
was an optional variation anyway, and not one that was likely
to be used.
Also update the corresponding comment describing global lock
creation.
Stop removing hyphens when = is seen. With an option
like --profile=thin-performance, the hyphen removal
will stop at = and will not remove - after thin.
Stop removing hyphens altogether when a stand alone arg
of -- appears.
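The two stopping rules can be sketched as below. This is an illustration under assumptions, not the code in the lvm command line parser: the function names are hypothetical, and only arguments starting with two hyphens are rewritten.

```c
#include <assert.h>
#include <string.h>

/*
 * Remove hyphens from a long option name in place, leaving any
 * "=value" part untouched.
 */
void squash_option_hyphens(char *arg)
{
	char *src, *dst;

	assert(arg);

	/* Only long options ("--name...") are rewritten. */
	if (strncmp(arg, "--", 2) || !arg[2])
		return;

	for (src = dst = arg + 2; *src && *src != '='; src++)
		if (*src != '-')
			*dst++ = *src;

	/* Copy the rest ("=value" or just the terminator) unchanged. */
	memmove(dst, src, strlen(src) + 1);
}

/* Apply the rewrite to each argument until a standalone "--" is seen. */
void squash_all_option_hyphens(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "--"))
			break;
		squash_option_hyphens(argv[i]);
	}
}
```

With this sketch --profile=thin-performance is left unchanged, because no hyphen appears in the option name before the = sign.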
Simply running concurrent copies of 'pvscan | true' is enough to make
clvmd freeze: pvscan exits on the EPIPE without first releasing the
global lock.
clvmd notices the client disappear but because the cleanup code that
releases the locks is triggered from within some processing after the
next select() returns, and that processing can 'break' after doing just
one action, it sometimes never releases the locks to other clients.
Move the cleanup code before the select.
Check all fds after select().
Improve some debug messages and warn in the unlikely event that
select() capacity could soon be exceeded.
When there are duplicate global locks, check if the gl
is still enabled each time a gl or vg lock is acquired
in the lockspace. Once one of the duplicates is disabled,
then other hosts will recognize that the issue is resolved
without needing to restart the lockspaces.
Move the DEBUG_MEM decision inside libdevmapper.so instead of exposing
it in libdevmapper.h which causes failures if the binary and library
were compiled with opposite debugging settings.
pvscan autoactivation does not work for lockd VGs because
lock start is needed on a lockd VG before locking can be
done for it. Add a check to skip the attempt at autoactivate
rather than calling it, knowing it will fail.
Add a comment explaining why pvscan --cache works fine for
lockd VGs without locks, and why autoactivate is not done.
. the poll check will eventually call finish which will
write the VG, so an ex VG lock is needed from lvmlockd.
. fix missing unlock on poll error path
. remove the lockd locking while monitoring the progress
of the command, as suggested by the earlier FIXME comment,
as it's not needed.
The CFG_SECTION_NO_CHECK flag can be used to mark a section
and its whole subtree as containing settings where checks
won't be made (lvmconfig --validate).
These are settings where we don't know the names and types
in advance and they're recognized at runtime. As we don't know
the type and name in advance, we can't do any checks here
of course.
Use this flag with great care as it disables config checks
for the whole config subtree found under such section.
This flag is going to be used by subsequent patches from
Zdenek to support some cache settings...
Recent change to move the polling outside of core lvconvert
code was wrongly using 'lv' and 'vg' structs which can't be
used outside of the core code, which caused seg fault.
Properly isolate all use of lv structs within the core of
the lvconvert code, saving any information necessary,
(esp lvid). After the core of lvconvert is done, use
the saved information to do polling.
FIXME: the need for is_merging_origin and is_merging_origin_thin
in this patch is ugly, and a cleaner way should be found to deal
with that than what is done here.
Also it effectively removed all hacks in _lvconvert_merge_single
performing ugly: VG reread, unlock, polling, lock sequence.
Moreover all polling operations are postponed after all conversions
are finished.
lvm2 (while locking via lvmlockd) should now be able to run with
or without lvmpolld while performing poll operations originating
in lvconvert command.
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Comply with the rules we have for log_error and log_warn...
$ pvcreate /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
1 existing signature left on the device.
Aborting pvcreate on /dev/sda1.
$ pvcreate /dev/sda1 --force
WARNING: Failed to get offset of the xfs_external_log signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
libblkid may return the list of signatures found, but it may not
provide offset and size for each signature detected. This may
happen in case signatures are mixed up or there are more, possibly
overlapping, signatures found.
Make lvm commands pass if such a situation happens and we're using
--force (or any stronger force method).
For example:
$ pvcreate /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
1 existing signature left on the device.
Aborting pvcreate on /dev/sda1.
$ pvcreate --force /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
The vgchange/lvchange activation commands read the VG, and
don't write it, so they acquire a shared VG lock from lvmlockd.
When other commands fail to acquire a shared VG lock from
lvmlockd, a warning is printed and they continue without it.
(Without it, the VG metadata they display from lvmetad may
not be up to date.)
vgchange/lvchange -a shouldn't continue without the shared
lock for a couple reasons:
. Usually they will just continue on and fail to acquire the
LV locks for activation, so continuing is pointless.
. More importantly, without the sh VG lock, the VG metadata
used by the command may be stale, and the LV locks shown
in the VG metadata may no longer be current. In the
case of sanlock, this would result in odd, unpredictable
errors when lvmlockd doesn't find the expected lock on
disk. In the case of dlm, the invalid LV lock could be
granted for the non-existing LV.
The solution is to not continue after the shared lock fails,
in the same way that a command fails if an exclusive lock fails.
When lvmlockd is compiled without support for one of the
lock managers (sanlock or dlm), and a command tries to use
one of them, explain that in the error message.
Don't abort test when clvmd processes two VGs concurrently.
CLVMD: ioctl/libdm-iface.c:1940 Internal error: Performing unsafe table load while 3 device(s) are known to be suspended: (253:19)
When lvm is built without lvmlockd support, vgcreate using a
shared lock type would succeed and create a local VG (the
--shared option was effectively ignored). Make it fail.
Fix the same issue when using vgchange to change a VG to a
shared lock type.
Make the error messages consistent.
A segfault was reported when extending an LV with a smaller number of
stripes than originally used. Under unusual circumstances, the cling
detection code could successfully find a match against the excess
stripe positions and think it had finished prematurely leading to an
allocation being pursued with a length of zero.
Rename ix_offset to num_positional_areas and move it to struct
alloc_state so that _is_condition() can obtain access to it.
In _is_condition(), areas_size can no longer be assumed to match the
number of positional slots being filled so check this newly-exposed
num_positional_areas directly instead. If the slot is outside the
range we are trying to fill, just ignore the match for now.
(Also note that the code still only performs cling detection against
the first segment of the LV.)
Replace misleading "not found" in the log message when
devices/preferred_names is set to empty array:
Really not found:
device/dev-cache.c:689 devices/preferred_names not found in config: using built-in preferences
Found, but empty:
config/config.c:1431 Setting devices/preferred_names to preferred_names = [ ]
device/dev-cache.c:689 devices/preferred_names is empty: using built-in preferences
Commit 7e728fe1a1 added a log call
directly in find_config_tree_array when defaults are used.
This patch also adds the log for the value which is found in
existing configuration and for which defaults are not used.
For example:
Defaults used:
config/config.c:1428 devices/scan not found in config: defaulting to scan = [ "/dev" ]
Value defined in configuration used:
config/config.c:1431 Setting devices/scan to scan = [ "/dev", "/mydev", "/abc" ]
This makes the logging consistent with the other find_config_tree_* functions.
Policy name has to be always defined.
Capture it as an internal error before write.
When reading metadata without defined policy name, use default defined policy.
TODO: Unsure, but it might have to be actually always 'mq' in this case.
Keep policy name separate from policy settings and avoid
mangling and demangling this string from the same config tree.
Ensure policy_name is always defined.
Use find_config_tree_array for all config arrays. Also, add
INTERNAL_ERROR in case there should have been at least default
value defined for a setting but it was not returned for some
reason (either config_settings.h misconfiguration or other config
tree error printed by functions called by find_config_tree_array).
Both lock_start filters were being skipped when any lock-opt
values were used. The "auto" lock-opt should cause the
auto_lock_start_list to be used. The lock_start_list should
always be used.
The behavior of lock_start_list/auto_lock_start_list are tested
and verified to behave like volume_list/auto_activation_volume_list.
Since the default was changed to wait for lock-start to finish,
the "wait" and "autowait" lock-opt values are not needed, but a
new "autonowait" is added to the existing "nowait" to avoid the
default waiting.
There are two different failure conditions detected in
access_vg_lock_type() that should have different error
messages. This adds another failure flag so the two
cases can be distinguished to avoid printing a misleading
error message.
Require global/{thin,cache}_{check,repair}_options to be always defined.
If not defined directly by user in the configuration and if there's no
concrete default option to use, make "" (empty string) the default one -
it's then clearly visible in the "lvmconfig --type default" (and
generated lvm.conf) and also it makes its handling in the code more
straightforward so we don't need to handle undefined values.
This means, if there are no default values for these settings defined,
we end up with this generated now:
{thin,cache}_{check,repair}_options = [ "" ]
So the value is never undefined and if it is, it's an error.
(The cache_repair_options is actually not used in the code at the moment,
but once the code using this setting is in, it will follow the same logic
as used for thin_repair_options.)
The "exported" state of the VG can be useful with lockd VGs
because the exported state keeps a VG from being used in general.
It's a way to keep a VG protected and out of the way.
Also fix the command flags: ALL_VGS_IS_DEFAULT is not true for
vgimport/vgexport, since they both return errors immediately if
no VG args are specified. LOCKD_VG_SH is not true for vgexport
because it must use an ex lock to write the VG.
When --nolocking is used (by vgs, lvs, pvs):
. don't use lvmlockd at all (set use_lvmlockd to 0)
. allow lockd VGs to be read
When --readonly is used (by vgs, lvs, pvs, vgdisplay, lvdisplay,
pvdisplay, lvmdiskscan, lvscan, pvscan, vgcfgbackup):
. skip actual lvmlockd locking calls
. allow lockd VGs to be read
. check that only shared gl/vg locks are being requested
(even though the actual locking is being skipped)
. check that no LV locks are requested, because no LVs
should be activated or used in readonly mode
. disable using lvmetad so VGs are read from disk
It is important to note the limited commands that accept
the --nolocking and --readonly options, i.e. no commands
that change/write a VG or change/activate LVs accept these
options, only commands that read VGs.
A new lockd lock needs to be created for the new LV
created by split mirror and split snapshot. Disallow
these options in lockd VGs until that is implemented.
There are at least a couple instances where
the lock_args check does not work correctly,
(listed in the comment), so disable the
NULL check for lock_args until those are
resolved.
log_warn was added recently because no known code used
the given condition, but running pvcreate on an existing
PV uses this case, and should not produce a warning.
Put the change from commit #10d27998b3d2f6100e9e29e83d1d99948c55875f
back so we have working tree again for now. This code needs a bit of
a cleanup to return proper return value to check...
lib/format1/import-export.c:167: var_deref_op: Dereferencing null pointer "vg->lvm1_system_id"
lib/cache/lvmetad.c:1023: var_deref_op: Dereferencing null pointer "this"
daemons/lvmlockd/lvmlockd-core.c:2659: check_after_deref: Null-checking "act" suggests that it may be null, but it has already been dereferenced on all paths leading to the check
/daemons/lvmetad/lvmetad-core.c:1024: check_after_deref: Null-checking "pvmeta" suggests that it may be null, but it has already been dereferenced on all paths leading to the check
If running lvmconf ... --startstopservice --mirrorservice in a systemd
environment, handle lvm2-cmirrord accordingly. A typo in the script
caused lvm2-cmirrord not to be started/stopped immediately; it was
only enabled/disabled (so --startstopservice was ignored in this
case).
This prevents 'lvremove vgname' from attempting to remove the
hidden sanlock LV. Only vgremove should remove the hidden
sanlock LV holding the sanlock locks.
tools/polldaemon.c:457: array_null: Comparing an array to null is not useful: "lv->lvid.s"
The lv->lvid.s is never NULL. The check was supposed to be *lv->lvid.s
to check if the string is not empty.
... Using uninitialized value "lockd_state" when calling "lockd_vg"
(even though lockd_vg assigns 0 to the lockd_state, but it looks at
previous state of lockd_state just before that so we need to have
that properly initialized!)
libdm/libdm-report.c:2934: uninit_use_in_call: Using uninitialized value "tm". Field "tm.tm_gmtoff" is uninitialized when calling "_get_final_time".
daemons/lvmlockd/lvmlockctl.c:273: uninit_use_in_call: Using uninitialized element of array "r_name" when calling "format_info_r_action". (just added FIXME as this looks unfinished?)
lib/log/log.c:115: leaked_storage: Variable "st" going out of scope leaks the storage it points to
daemons/lvmpolld/lvmpolld-core.c:573: leaked_storage: Variable "cmdargv" going out of scope leaks the storage it points to
daemons/lvmlockd/lvmlockd-core.c:5341: leaked_handle: Handle variable "fd" going out of scope leaks the handle
daemons/lvmlockd/lvmlockctl.c:575: overwrite_var: Overwriting "able_vg_name" in "able_vg_name = strdup(optarg)" leaks the storage that "able_vg_name" points to
daemons/lvmlockd/lvmlockctl.c:571: overwrite_var: Overwriting "able_vg_name" in "able_vg_name = strdup(optarg)" leaks the storage that "able_vg_name" points to
daemons/lvmlockd/lvmlockctl.c:385: leaked_handle: Handle variable "s" going out of scope leaks the handle
Before, we used general find_config_tree_node function to retrieve
array values. This had a downside where if the node was not found,
we had to insert default values directly in-situ after the
find_config_tree_node call. This way, we had two copies of default
values - one in config_settings.h and the other one directly in the
code where we found out that find_config_tree_node returned NULL and
hence we needed to fall back to defaults.
With separate find_config_tree_array used for array config values,
we keep all the defaults centrally in config_settings.h because
the new find_config_tree_array automatically returns these defaults
if it can't find any value set in the configuration.
This patch just makes the behaviour exactly the same for arrays as
for any other non-array type where we call find_config_tree_<type>
already, hence making the internal interface for handling array
values consistent with the rest of the config types.
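A minimal Python sketch of the idea, not lvm2 code: the setting names,
the DEFAULTS table and the find_config_array helper are hypothetical.
Defaults live in one table and the lookup falls back to it, so callers
never carry a second copy.

```python
# Hypothetical stand-in for the central defaults table
# (the role config_settings.h plays in lvm2).
DEFAULTS = {
    "example/dirs": ["/dev"],
    "example/filters": [],
}

def find_config_array(config, path):
    """Return the configured array, or the central default if the path is unset."""
    if path in config:
        return list(config[path])
    # Single fallback point: callers no longer hard-code defaults in-situ.
    return list(DEFAULTS[path])

config = {"example/dirs": ["/dev/mapper"]}
configured = find_config_array(config, "example/dirs")
fallback = find_config_array(config, "example/filters")
```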
If lvm2 is built with debug memory options, dm_free() is not
mapped directly to the standard library's free(). This may cause memory
corruption, as a line buffer may get reallocated in getline() with realloc().
This is a temporary hotfix. Other debug memory failures need to
be investigated and explained.
including the allow_override_lock_modes setting.
It was not possible to override default lock modes any longer,
since the command line options had already been removed.
A mechanism will probably be required later that puts part of
this back.
The existing messaging interface for thin-pool has a few 'weak' points:
* Messages were posted with each 'resume' operation, thus not allowing
activation of a thin-pool with its existing state.
* The acceleration that skipped the suspend step did not work in a cluster,
since clvmd resumes only nodes which are suspended (have proper lock
state).
* Resume may fail, and the code is not really designed to 'fail' in this
phase (the generic rule here is that resume DOES NOT fail unless something
serious is wrong, and the lvm2 tool usually doesn't handle a recovery path
in this case).
* A full thin-pool suspend happened when taking a thin-volume snapshot.
With this patch the new method relocates message passing into the suspend
state.
This has a few drawbacks with the current API, but overall it performs
better and gives more possibilities to deal with errors.
The patch introduces new logic for 'origin-only' suspend of a thin-pool, and
this also relates to a thin-volume when taking a snapshot.
When the suspend_origin_only operation is invoked on a pool with
queued messages, only those messages are posted to the thin-pool and the
actual suspend of the thin pool and its data and metadata volumes is skipped.
This makes taking a snapshot of a thin-volume a lighter operation and
avoids blocking other unrelated active thin volumes.
Also, failure now happens in the 'suspend' state, where a failure is more
expected and is better handled through error paths.
Activation of a thin-pool no longer sends any message and leaves it up to a
tool to decide later how to finish an unfinished double-commit transaction.
A problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add a target table line
into the tree; only a device is inserted into the tree.
The current mechanism to attach messages for a thin-pool requires libdm
to know about the thin-pool target, so lvm2 currently assumes the node
is really a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it's a misuse of the internal
API). We would possibly need to be able to attach a message to 'any' node.
Another thing to notice: the current messaging interface in the thin-pool
target requires suspending the thin volume origin first and then sending
a create message, but this could not have any 'nice' solution on the lvm2
side, and IMHO we should introduce something like a 'create_after_resume'
message.
The patch also changes the moment at which the lvm2 transaction id is
increased. Now it happens only after the kernel transaction id change has
finished successfully. This change was needed to properly handle activation
of a pool which is in the middle of an unfinished transaction, and it also
corrects usage of thin-pool by external apps like Docker.
Add support for sending messages in the suspend tree for thin-pools.
When this operation is requested, the whole subtree suspend is skipped.
This is experimental support for new lvm2 code for sending messages
in the suspend phase, where 'thin-pool origin-only suspend' will send
messages instead of really suspending the thin-pool tree.
When suspending a thin volume origin-only, only the thin volume is suspended,
then messages are posted and the thin-pool suspend is skipped.
Recognize date and time specifications within selection criteria
that are formulated in a more free-form way, in addition to the original
basic YYYY-MM-DD HH:MM format that libdevmapper supports.
Currently, this free-form format is recognized for lv_time field.
Users are able to use expressions from this set:
- weekday names ("Sunday" - "Saturday" or abbreviated as "Sun" - "Sat")
- labels for points in time ("noon", "midnight")
- labels for a day relative to current day ("today", "yesterday")
- points back in time with relative offset from today (N is a number)
( "N" "seconds"/"minutes"/"hours"/"days"/"weeks"/"years" "ago")
( "N" "secs"/"mins"/"hrs" ... "ago")
( "N" "s"/"m"/"h" ... "ago")
- time specification either in hh:mm:ss format or with AM/PM suffixes
- month names ("January" - "December" or abbreviated as "Jan" - "Dec")
For example:
$ date
Fri Jul 3 10:11:13 CEST 2015
$ lvmconfig --type full report/time_format
time_format="%a %Y-%m-%d %T %z %Z [%s]"
$ lvs
LV VG Time
lvol0 vg Fri 2014-08-22 21:25:41 +0200 CEST [1408735541]
lvol2 vg Sun 2015-04-26 14:52:20 +0200 CEST [1430052740]
root fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
swap fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time=yesterday'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "June 30"'
LV VG Time
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "noon June 30"'
LV VG Time
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 9AM"'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 1PM"'
LV VG Time
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
...and so on.
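A rough Python sketch of two of the expression kinds listed above
("N units ago" and day labels). This is not the lvm2 parser: the
function names are made up, year offsets and month/weekday names are
left out, and treating a day label as a whole-day range is an
assumption based on the 'time=yesterday' output shown above.

```python
import re
from datetime import datetime, timedelta

# Seconds per unit; the abbreviations follow the forms listed above.
_UNITS = {}
for seconds, names in (
    (1, "s sec secs second seconds"),
    (60, "m min mins minute minutes"),
    (3600, "h hr hrs hour hours"),
    (86400, "d day days"),
    (604800, "w week weeks"),
):
    for name in names.split():
        _UNITS[name] = seconds

def parse_ago(expr, now):
    """Turn 'N <unit> ago' into a point in time, or None if not recognized."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s+ago\s*", expr.lower())
    if not m or m.group(2) not in _UNITS:
        return None
    return now - timedelta(seconds=int(m.group(1)) * _UNITS[m.group(2)])

def day_range(label, now):
    """Turn 'today' or 'yesterday' into an inclusive (start, end) range."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if label == "today":
        start = midnight
    elif label == "yesterday":
        start = midnight - timedelta(days=1)
    else:
        return None
    return start, start + timedelta(days=1) - timedelta(seconds=1)

now = datetime(2015, 7, 3, 10, 11, 13)   # the "date" output from the example
two_days_ago = parse_ago("2 days ago", now)
yesterday = day_range("yesterday", now)
```

Passing "now" in explicitly keeps the sketch deterministic; both LVs
created on 2015-07-02 in the example fall inside the computed range.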
Wire the dm_report_reserved_handler instance call in reporting/selection
infrastructure to handle reserved value actions (currently only
DM_REPORT_RESERVED_PARSE_FUZZY_NAME and DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
actions).
By fuzzy names we mean names for which it's hard or even impossible
to enumerate all possible variations - the name needs to
be evaluated. An example of a fuzzy name is a name with a base
(substring) which matches, surrounded by arbitrary variations.
We can cover human language better with fuzzy
names, as people may use several different names (or sentences) to
denote the same thing.
By dynamic values we mean values which are not constants
and need to be evaluated at runtime. An example of a dynamic
value is a value which depends on current system state (e.g. time,
current configuration or any other state which may change and
needs runtime evaluation).
There's a handler that can be registered with reporting/selection
using dm_report_reserved_handler instance. This is a central point
in which the computation/evaluation happens when processing reserved
values. Currently, there are two actions declared:
DM_REPORT_RESERVED_PARSE_FUZZY_NAME
(translates fuzzy name into canonical name)
DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
(gets value for canonical name)
The handler is then registered as a value in struct
dm_report_reserved_value (see the explanatory comments beside
struct dm_report_reserved_value in libdevmapper.h).
Also, this patch provides support for simple caching of values
used during report/selection via dm_report_value_cache_{set,get}.
This is supposed to be used mainly in the dm_report_reserved_handler
instances to save values among calls so all the handler calls work
with the same base value used in computation/evaluation and/or
possibly to save resources if the evaluation is more time-consuming.
The cache is attached to the dm_report handle and so the cache is
dropped once the dm_report is dropped.
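A toy Python model of the handler and cache described above. It only
mirrors the shape of the mechanism: the Report class, the action
constants and time_handler are hypothetical stand-ins, not the
libdevmapper API, and "yesterday" is reduced to a single offset.

```python
import time

# Stand-ins for the two DM_REPORT_RESERVED_* actions.
PARSE_FUZZY_NAME = "parse_fuzzy_name"
GET_DYNAMIC_VALUE = "get_dynamic_value"

class Report:
    """Toy report handle that owns the value cache."""
    def __init__(self):
        self._cache = {}

    def cache_set(self, name, value):
        self._cache[name] = value

    def cache_get(self, name):
        return self._cache.get(name)

def time_handler(report, action, data, clock=time.time):
    # Cache "now" so every handler call within one report
    # works with the same base value.
    now = report.cache_get("now")
    if now is None:
        now = int(clock())
        report.cache_set("now", now)
    if action == PARSE_FUZZY_NAME:
        # Map any variation containing the base to the canonical name.
        return "yesterday" if "yesterday" in data.lower() else None
    if action == GET_DYNAMIC_VALUE:
        return now - 86400 if data == "yesterday" else None
    raise ValueError(action)

report = Report()
ticks = iter([1435911073, 1435999999])
first = time_handler(report, GET_DYNAMIC_VALUE, "yesterday", clock=lambda: next(ticks))
second = time_handler(report, GET_DYNAMIC_VALUE, "yesterday", clock=lambda: next(ticks))
```

The second call does not consult the clock again, because the base
value is already in the report's cache.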
Generic numbers and time values share some operators so make sure
we have the flags correctly adjusted based on expected type if
we're using reserved values.
dm_snprintf() returns, upon success, the number of characters printed
(excluding the null byte used to end output to strings).
So add an extra byte to preserve the \0.
This fixes a regression when displaying more than a single LV name.
_node_name() prepares the device name and its (major:minor) in an
internal dm_tree buffer for easy use in debug messages.
To avoid any allocation, a small buffer is preallocated in struct dm_tree
to store this message.
This patch adds support for time values used in reporting fields.
The raw values are always stored as number of seconds since epoch.
The support that comes with this patch is the basic one which allows
only for recognition of strictly formatted date and time in selection
criteria (the format follows a subset of formats defined by ISO 8601):
date time timezone
date:
YYYY-MM-DD (or shortly YYYYMMDD)
YYYY-MM (shortly YYYYMM), auto DD=1
YYYY, auto MM=01 and DD=01
time:
hh:mm:ss (or shortly hhmmss)
hh:mm (or shortly hhmm), auto ss=0
hh (or shortly hh), auto mm=0, auto ss=0
timezone (always with + or - sign):
+hh:mm or -hh:mm (or shortly +hhmm or -hhmm)
+hh or -hh
Or directly the time (number of seconds) since Epoch (1970-01-01 00:00:00 UTC)
when the number value is prefixed by "@":
@number_of_seconds_since_epoch
This patch also adds aliases for comparison operators
used together with time values which are more intuitive
to use:
since (as alias for >=)
after (as alias for >)
until (as alias for <=)
before (as alias for <)
For example:
$ lvmconfig --type full report/time_format
time_format="%Y-%m-%d %T %z %Z [%s]"
$ lvs -o name,time vg
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol2 2015-04-26 14:52:20 +0200 CEST [1430052740]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30 6:00"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
$ lvs vg -o name,time -S 'time since @1435519541'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
This is basic time recognition support that is directly a part of
libdevmapper. Recognition of more free-form expressions will be a
part of subsequent patches.
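The operator aliases and the "@" form can be sketched in Python as
follows. This is an illustration, not libdevmapper code: parse_time and
matches are made-up names, only a few of the listed formats are
handled, UTC is assumed when no timezone is given, and a date-only
value resolves to midnight, which is a simplification (in the example
above, 'until "2015-06-30"' also matches LVs created later that day).

```python
import operator
from datetime import datetime, timezone

ALIASES = {"since": ">=", "after": ">", "until": "<=", "before": "<"}
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}

def parse_time(value):
    """Parse '@<seconds>' or a strict 'YYYY-MM-DD[ hh:mm[:ss]]' string."""
    if value.startswith("@"):
        return int(value[1:])
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"unrecognized time: {value!r}")

def matches(field_seconds, op, value):
    """Evaluate '<field> <op> <value>' where op may be an alias."""
    op = ALIASES.get(op, op)
    return OPS[op](field_seconds, parse_time(value))

lvol0 = 1435519541   # raw value stored for lvol0 in the example
since_match = matches(lvol0, "since", "@1435519541")
after_match = matches(lvol0, "after", "@1435519541")
```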
Just like we have DEFAULT_USE_LVMETAD (or DEFAULT_USE_LVMPOLLD), use
the fallback_to_lvm1=1 lvm.conf setting if we configured lvm2 with
--enable-lvm1-fallback and use fallback_to_lvm1=0 otherwise.
Also, generate a proper lvm.conf.in with the unconfigured value.
This patch allows for registration and recognition of reserved
values which are ranges, so they're composed of two values actually
to denote the lower and upper bound for the range (stored as an array
with exactly two items to define the boundaries).
Also, this patch allows for flagging reserved values as named-only
which means that such values are not strictly reserved. The strictly
reserved values are reserved values as used before this patch.
Distinction between strictly-reserved and named-only values
is clearly visible with comparisons. Normally, strictly reserved
value is not accounted for if we do "greater than" or "lower than"
comparisons, for example:
1 2 3 ....
|
abc
- we have "abc" as reserved value for field with value "2"
- the value reported for the field is "abc" (or "2", it doesn't matter here)
- the selection we're processing is -S 'field < abc'
- the result of the selection gives nothing as "abc" is strictly
reserved value (bound to "2") and there's no order defined for
it and it would only match if we directly compared the value
(so -S 'field = abc' would match)
With named-only values, the "abc" is named-only value for "2",
so selection -S 'field < abc' is the same as using -S 'field < 2'.
The "abc" is just an alias for some value so the value or its
assigned name can be used equally in selection criteria.
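The difference can be sketched in a few lines of Python. The select
function and its arguments are hypothetical and only model the
comparison rule described above.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def select(field_value, op, operand, reserved, named_only):
    """reserved: {name: value}; named_only: names that are plain aliases."""
    if operand in reserved:
        if operand not in named_only and op != "=":
            # A strictly reserved value has no order: only equality can match.
            return False
        return OPS[op](field_value, reserved[operand])
    return OPS[op](field_value, int(operand))

reserved = {"abc": 2}
strict_lt = select(1, "<", "abc", reserved, named_only=set())
strict_eq = select(2, "=", "abc", reserved, named_only=set())
alias_lt = select(1, "<", "abc", reserved, named_only={"abc"})
```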
Make it possible to define format for time that is displayed.
The way the format is defined is equal to the way that is used
for strftime function, although not all formatting options as
used in strftime are available for LVM2 - the set is restricted
(e.g. we do not allow newline to be printed). The lvm.conf
comments contain the whole list that LVM2 accepts for time format
together with brief description (copied from strftime man page).
For example:
(defaults used - the format is the same as used before this patch)
$ lvs -o+time vg/lvol0 vg/lvol1
LV VG Attr LSize Time
lvol0 vg -wi-a----- 4.00m 2015-06-25 16:18:34 +0200
lvol1 vg -wi-a----- 4.00m 2015-06-29 09:17:11 +0200
(using 'time_format = "@%s"' in lvm.conf - number of seconds
since the Epoch)
$ lvs -o+time vg/lvol0 vg/lvol1
LV VG Attr LSize Time
lvol0 vg -wi-a----- 4.00m @1435241914
lvol1 vg -wi-a----- 4.00m @1435562231
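For reference, the two outputs above can be reproduced from the raw
value with Python's strftime. This is only an illustration of the
format strings; format_lv_time is a made-up helper and the fixed +0200
offset is taken from the example. "%s" is handled by hand because it is
not a portable strftime directive in Python.

```python
from datetime import datetime, timedelta, timezone

def format_lv_time(epoch_seconds, time_format, utc_offset_hours=2):
    """Format a raw seconds-since-epoch value the way the examples show."""
    if time_format == "@%s":
        return f"@{epoch_seconds}"
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz).strftime(time_format)

default_style = format_lv_time(1435241914, "%Y-%m-%d %H:%M:%S %z")
epoch_style = format_lv_time(1435241914, "@%s")
```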
Commit e587b0677b broke the build on
systems where /bin/sh is Dash, for example.
Original patch by Ferenc Wágner <wferi@niif.hu>, changed later to
avoid using a shell call, so the makefile adds the 'server' target when
one of the metad or polld daemons is requested.
Do not display settings with undefined default values, but do display
these settings in case the value is defined directly in any part of
the existing config cascade.
For example, lvmconfig --type current always displays these settings
(as they are defined somewhere in the "current" configuration cascade).
lvmconfig --type full displays these settings only if they are defined
somewhere in the cascade, but not if the default value is used instead.
lvmconfig --type default never displays these settings...
More concrete example - let's have activation/volume_list directly
set in lvm.conf and activation/read_only_volume_list not set.
Both of these settings have *undefined default* values.
$lvmconfig --type full activation/volume_list activation/read_only_volume_list
volume_list="/dev/vg/lv"
(...only volume_list is defined, hence it's printed)
However, the comments will display more info (see also previous commit):
$lvmconfig --type full activation/volume_list activation/read_only_volume_list --withsummary
# Configuration option activation/volume_list.
# Only LVs selected by this list are activated.
# This configuration option does not have a default value defined.
# Value defined in existing configuration has been used for this setting.
volume_list="/dev/vg/lv"
# Configuration option activation/read_only_volume_list.
# LVs in this list are activated in read-only mode.
# This configuration option does not have a default value defined.
Display a comment about the value from existing config being used. For example:
$ lvmconfig --type full --withsummary report/compact_output report/buffered
# Configuration option report/compact_output.
# Do not print empty report fields.
# Value defined in existing configuration has been used for this setting.
compact_output=1
# Configuration option report/buffered.
# Buffer report output.
buffered=1
The lvmconfig --type full is actually a combination of --type current
and --type missing together with --mergedconfig options used.
The overall outcome is a configuration tree with settings as LVM sees
it when it looks for the values - that means, if the setting is defined
in some config source (lvm.conf, --config, lvmlocal.conf or any profile
that is used), the setting is used. Otherwise, if the setting is not
defined in any part of the config cascade, the defaults are used.
The --type full displays exactly this final tree with all the values
defined, either coming from configuration tree or from defaults.
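A small Python sketch of how such a final tree could be assembled. It
is not the lvmconfig implementation: full_config is a made-up name, the
order of sources is left to the caller, and None is used to model a
setting whose default is undefined (such a setting is left out unless
some source defines it).

```python
def full_config(sources, defaults):
    """sources: dicts, highest priority first; defaults: {key: value or None}."""
    tree = {}
    for key, default in defaults.items():
        for source in sources:
            if key in source:
                tree[key] = source[key]   # defined somewhere in the cascade
                break
        else:
            # Fall back to the default, unless the default is undefined.
            if default is not None:
                tree[key] = default
    return tree

defaults = {
    "report/compact_output": 0,
    "report/buffered": 1,
    "activation/volume_list": None,
    "activation/read_only_volume_list": None,
}
lvm_conf = {"report/compact_output": 1, "activation/volume_list": ["/dev/vg/lv"]}
tree = full_config([lvm_conf], defaults)
```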
Synchronize with udev logic before reusing a device as a snapshot.
This patch tries to fix the problem with udev, where we manage
to activate an LV for clearing, then deactivate that device and
activate it again as a member of the 'origin&snapshot' tree, all in 1 step.
There needs to be a sync point where udev has time to remove all links,
otherwise we race with scans and we may end up with mysterious 'free'
links in the system pointing to wrong dm names.
This patch tries to fix failing topology cluster tests.
We shouldn't be adding spaces by default in output as that
may already be used in scripts, especially for eval
in shell scripts where spaces are not allowed between key
and value!
Add a --withspaces option to lvmconfig for pretty output with
more spaces for readability.
It's not an error to define empty values for
{thin,cache}_{check,repair}_options - such empty value means no
options are passed when these external commands are called.
If blkid wiping is possible, then set use_blkid_wiping=1, and
use_blkid_wiping=0 otherwise, for its default value. If blkid wiping
is disabled during configure and use_blkid_wiping=1 is set by chance,
it's simply ignored - this patch is just a cleanup that makes it more
obvious for the user (we use similar logic for use_lvmetad and
use_lvmpolld settings).
Default value for lvmetad and lvmpolld has hooks in configure script,
the "lvmconfig --type default --unconfigured" should display:
use_lvmetad = @DEFAULT_USE_LVMETAD@
use_lvmpolld = @DEFAULT_USE_LVMPOLLD@
Note that these settings are not of string type. Recent change (the
DM_CONFIG_VALUE_FMT_STRING_NO_QUOTES formatting flag) makes it
possible to recognize that the setting is not of string type and if
there's unconfigured value defined for it, the enclosing " " is
automatically removed on output.
Do not use "#S" (blank string) as default value as that ends up with
'key = [ "" ]' to be generated which is not what we want in most cases.
Also, fix default values for global/{thin,cache}_{check,repair}_options
and avoid assigning blank values. For example, the thin_check_options
had this set as default value previously:
"#S" DEFAULT_THIN_CHECK_OPTION1 "#S" DEFAULT_THIN_CHECK_OPTION2
If any (or both) of DEFAULT_THIN_CHECK_OPTION* variables was set
to "", we ended up with clumsy default value generated like:
thin_check_options = [ "-q", "" ]
With this patch, we end up with correct:
thin_check_options = [ "-q" ]
or, if all options are undefined:
thin_check_options = [ ]
Which is the correct way to express this.
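The intended result can be mimicked in a few lines of Python. The
helper names are made up, and the sketch only shows the filtering of
blank options, not how lvm2 builds its default strings.

```python
def default_options(*options):
    """Build a default option array, dropping options set to an empty string."""
    return [opt for opt in options if opt]

def format_array(key, values):
    """Render 'key = [ "a", "b" ]', or 'key = [ ]' for an empty array."""
    if not values:
        return f"{key} = [ ]"
    inner = ", ".join(f'"{v}"' for v in values)
    return f"{key} = [ {inner} ]"

one_option = format_array("thin_check_options", default_options("-q", ""))
no_options = format_array("thin_check_options", default_options("", ""))
```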
There are two basic groups of formatting flags (32 bits):
- common ones applicable for all config value types (lower 16 bits)
- type-related formatting flags (higher 16 bits)
With this patch, we initially support four new flags that
modify the way the config value is displayed:
Common flags:
=============
DM_CONFIG_VALUE_FMT_COMMON_ARRAY - causes array config values
to be enclosed in "[ ]" even if there's only one item
(previously, there was no way to recognize an array with one
item and scalar value, hence array values with one member
were always displayed without "[ ]" which libdm accepted
when reading, but it may have been misleading for users)
DM_CONFIG_VALUE_FMT_COMMON_EXTRA_SPACE - causes extra spaces to
be inserted in "key = value" (or key = [ value, value, ... ] in
case of arrays), compared to "key=value" seen on output before.
This makes the output more readable for users.
Type-related flags:
===================
DM_CONFIG_VALUE_FMT_INT_OCTAL - prints integers in octal form with
"0" as a prefix (libdm's config reading code can handle this via
strtol just fine so it's properly recognized as number in octal
form already if there's "0" used as prefix)
DM_CONFIG_VALUE_FMT_STRING_NO_QUOTES - makes it possible to print
strings without enclosing " "
This patch also adds dm_config_value_set_format_flags and
dm_config_value_get_format_flags functions to set and get
these formatting flags.
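A Python sketch of how such flags could drive the output. The bit
values and the format_value helper are hypothetical (the real constants
are defined in libdevmapper.h); the sketch only keeps the split between
the lower and higher 16 bits described above.

```python
# Hypothetical bit values, chosen only to respect the 16/16 split.
FMT_COMMON_ARRAY = 0x00000001
FMT_COMMON_EXTRA_SPACE = 0x00000002
FMT_INT_OCTAL = 0x00010000
FMT_STRING_NO_QUOTES = 0x00020000

COMMON_MASK = 0x0000FFFF   # flags valid for every value type
TYPE_MASK = 0xFFFF0000     # flags tied to one value type

def format_value(key, value, flags):
    spaced = bool(flags & FMT_COMMON_EXTRA_SPACE)

    def one(item):
        if isinstance(item, int):
            return f"0{item:o}" if flags & FMT_INT_OCTAL else str(item)
        return item if flags & FMT_STRING_NO_QUOTES else f'"{item}"'

    items = value if isinstance(value, list) else [value]
    body = (", " if spaced else ",").join(one(item) for item in items)
    # A one-item array gets brackets only when FMT_COMMON_ARRAY is set.
    if isinstance(value, list) and (len(items) != 1 or flags & FMT_COMMON_ARRAY):
        body = f"[ {body} ]" if spaced else f"[{body}]"
    return f"{key}{' = ' if spaced else '='}{body}"

octal = format_value("umask", 0o077, FMT_INT_OCTAL)
bare = format_value("dirs", ["/dev"], 0)
pretty = format_value("dirs", ["/dev"], FMT_COMMON_ARRAY | FMT_COMMON_EXTRA_SPACE)
unquoted = format_value("use_lvmetad", "@DEFAULT_USE_LVMETAD@",
                        FMT_STRING_NO_QUOTES | FMT_COMMON_EXTRA_SPACE)
```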
This is the client side handling of the global_invalid state
added to lvmetad in commit c595b50cec8a6b95c6ac4988912d1412f3cc0237.
The function added here:
. checks if the global state in lvmetad is invalid
. if so, scans disks to update the state in lvmetad
. clears the global_invalid flag in lvmetad
. updates the local udev db to reflect any changes
and update the lvmetad copy after it is reread from disk.
This is the client side handling of the vg_invalid state
added to lvmetad in commit c595b50cec8a6b95c6ac4988912d1412f3cc0237.
Add the ability to invalidate global or individual VG metadata.
The invalid state is returned to lvm commands along with the metadata.
This allows lvm commands to detect stale metadata from the cache and
reread the latest metadata from disk (in a subsequent patch.)
These changes do not change the protocol or compatibility between
lvm commands and lvmetad.
Global information
------------------
Global information refers to metadata that is not isolated
to a single VG, e.g. the list of vg names, or the list of pvs.
When an external system, e.g. a locking system, detects that global
information has been changed from another host (e.g. a new vg has been
created) it sends lvmetad the message: set_global_info: global_invalid=1.
lvmetad sets the global invalid flag to indicate that its cached data is
stale.
When lvm commands request information from lvmetad, lvmetad returns the
cached information, along with an additional top-level config node called
"global_invalid". This new info tells the lvm command that the cached
information is stale.
When an lvm command sees global_invalid from lvmetad, it knows it should
rescan devices and update lvmetad with the latest information. When this
is complete, it sends lvmetad the message: set_global_info:
global_invalid=0, and lvmetad clears the global invalid flag. Further lvm
commands will use the lvmetad cache until it is invalidated again.
The most common commands that cause global invalidation are vgcreate and
vgextend. These are uncommon compared to commands that report global
information, e.g. vgs. So, the percentage of lvmetad replies containing
global_invalid should be very small.
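The global_invalid exchange can be modelled with a toy Python class.
ToyLvmetad and command_vg_list are illustrative names only; the real
daemon speaks its own request/response protocol, which this sketch does
not reproduce.

```python
class ToyLvmetad:
    """Toy model of the cache daemon's global invalid flag."""
    def __init__(self):
        self.vg_names = []
        self.global_invalid = False

    def set_global_info(self, global_invalid):
        self.global_invalid = bool(global_invalid)

    def vg_list(self):
        reply = {"vg_names": list(self.vg_names)}
        if self.global_invalid:
            reply["global_invalid"] = 1   # extra top-level node in the reply
        return reply

def command_vg_list(daemon, scan_disks):
    """What a command does with the reply: rescan only when told to."""
    reply = daemon.vg_list()
    if reply.get("global_invalid"):
        daemon.vg_names = scan_disks()    # rescan devices, update the cache
        daemon.set_global_info(0)         # then clear the flag
        reply = daemon.vg_list()
    return reply["vg_names"]

daemon = ToyLvmetad()
daemon.vg_names = ["vg0"]
daemon.set_global_info(1)                 # e.g. a vgcreate ran on another host
scans = []
def scan_disks():
    scans.append(1)
    return ["vg0", "vg1"]
after_invalidation = command_vg_list(daemon, scan_disks)
cached_again = command_vg_list(daemon, scan_disks)
```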
VG information
--------------
VG information refers to metadata that is isolated to a single VG,
e.g. an LV or the size of an LV.
When an external system determines that VG information has been changed
from another host (e.g. an lvcreate or lvresize), it sends lvmetad the
message: set_vg_info: uuid=X version=N. X is the VG uuid, and N is the
latest VG seqno that was written. lvmetad checks the seqno of its cached
VG, and if the version from the message is newer, it sets an invalid flag
for the cached VG. The invalid flag, along with the newer seqno are saved
in a new vg_info struct.
When lvm commands request VG metadata from lvmetad, lvmetad includes the
invalid flag along with the VG metadata. The lvm command checks for this
flag, and rereads the VG from disk if set. The VG read from disk is sent
to lvmetad. lvmetad sees that the seqno in the new version matches the
seqno from the last set_vg_info message, and clears the vg invalid flag.
Further lvm commands will use the VG metadata from lvmetad until it is
next invalidated.
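The per-VG variant, again as a toy Python model with made-up names
(ToyVgCache, read_vg). It follows the seqno comparison described above
and nothing more.

```python
class ToyVgCache:
    """Toy model of per-VG invalidation keyed by seqno."""
    def __init__(self):
        self.vgs = {}       # uuid -> {"seqno": int, "metadata": object}
        self.vg_info = {}   # uuid -> {"invalid": bool, "external_version": int}

    def set_vg_info(self, uuid, version):
        cached = self.vgs.get(uuid)
        if cached is not None and version > cached["seqno"]:
            self.vg_info[uuid] = {"invalid": True, "external_version": version}

    def vg_update(self, uuid, seqno, metadata):
        self.vgs[uuid] = {"seqno": seqno, "metadata": metadata}
        info = self.vg_info.get(uuid)
        if info and info["external_version"] == seqno:
            info["invalid"] = False   # cache caught up with the external seqno

    def vg_lookup(self, uuid):
        info = self.vg_info.get(uuid, {})
        return self.vgs[uuid], bool(info.get("invalid"))

def read_vg(cache, uuid, read_from_disk):
    """Command side: reread from disk and refresh the cache when flagged."""
    vg, invalid = cache.vg_lookup(uuid)
    if invalid:
        seqno, metadata = read_from_disk(uuid)
        cache.vg_update(uuid, seqno, metadata)
        vg, _ = cache.vg_lookup(uuid)
    return vg

cache = ToyVgCache()
cache.vg_update("X", 5, "old metadata")
cache.set_vg_info("X", 5)                 # not newer: nothing is invalidated
still_valid = cache.vg_lookup("X")[1]
cache.set_vg_info("X", 6)                 # another host wrote seqno 6
vg = read_vg(cache, "X", lambda uuid: (6, "new metadata"))
```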
Since our test environment also runs in a non-real-udev world,
it uses the /etc/.cache file with scanned files.
So in this case it is mandatory that the user runs 'vgscan'
after a device reappears in the system.
This 'first' lvm2 command then fixes metadata (just like vgs did).
Use of display_lvname() in plain log_debug() may accumulate memory in
the command context mempool. Use instead a small ring buffer which can
store a couple of (10 ATM) names, so up to 10 full names can be used at once.
We are not keeping full VG/LV names, as this may eventually consume a larger
amount of RAM resources if the vgname is longer and lots of LVs are in use.
Note: if there is ever a need for displaying more names at once,
the limit should be raised (e.g. log_debug() would need to print more
than 10 LVs on a single line).
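The ring buffer behaviour, including why the limit matters, can be
shown with a small Python sketch. NameRing is a made-up class; the lvm2
code works on C string buffers instead.

```python
class NameRing:
    """Fixed ring of display-name slots; the oldest slot is reused first."""
    def __init__(self, slots=10):
        self._slots = [None] * slots
        self._next = 0

    def display_name(self, vg_name, lv_name):
        """Store 'vg/lv' in the next slot and return that slot's index."""
        index = self._next
        self._slots[index] = f"{vg_name}/{lv_name}"
        self._next = (index + 1) % len(self._slots)
        return index

    def get(self, index):
        return self._slots[index]

ring = NameRing(slots=3)                  # 3 instead of 10 to keep it short
first = ring.display_name("vg", "lvol0")
ring.display_name("vg", "lvol1")
ring.display_name("vg", "lvol2")
fourth = ring.display_name("vg", "lvol3") # wraps around and reuses slot 0
```

After the fourth call the first name is gone, which is the situation
the note above warns about when more names than slots are needed at once.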
The thin-pool kernel target module 1.13 now supports usage of an
external origin with sizes which are not aligned with the chunk size
of the thin-pool.
Enable lvm2 support for this and also fix reporting of data_percent
usage for the case where sizes are not aligned.
Note that this is just a quick fix and it needs a more robust fix to
encompass any combination, not just the (old) snapshot one!
This started with this report:
https://bugzilla.redhat.com/show_bug.cgi?id=1219222
If we have devices/ignore_suspended_devices=1 set, based on which we
filter out suspended devices as unusable (or if we ignore suspended
devices by force, e.g. during lvconvert called from dmeventd), and
when snapshot and snapshot origin devices are in play, we
need to look at their components underneath (*-real and *-cow) to
check whether they are suspended. If they are, the snapshot/snapshot
origin is not usable either and hence it needs to be filtered out
by the filter-usable.c code which does suspended device filtering.
Not going into much detail here; more details are in the bugzilla
mentioned above. However, this is a quick fix, since snapshot
and this exact situation is not the only one. So this is
something that needs to be revisited and fixed properly
with a full dm tree and a check of the whole stack to state
whether the device at the very top is usable or not.
This patch fixes segfault which was caused by incorrectly marking some
settings CFG_DEFAULT_COMMENTED instead of CFG_DEFAULT_UNDEFINED - the
ones which have NULL default value, hence they're really undefined.
A regression caused by a98ceceb1d.
For example:
$ lvmconfig log/file
file="/a"
Before this patch:
$ lvmconfig --type diff
Segmentation fault (core dumped)
With this patch applied:
$ lvmconfig --type diff
log {
file="/a"
}
The same applies for these settings:
log/activate_file
global/library_dir
global/system_id_file
<disk_area>/disk_area_id
There were also other settings with NULL default value and marked as
CFG_DEFAULT_COMMENTED instead of CFG_DEFAULT_UNDEFINED, but they were
cfg_array config settings where the NULL value was not causing segfault
(NULL == empty array).
Just as 'e' means activation with an exclusive lock,
add an 's' to mean activation with a shared lock.
This allows the existing but implicit behavior of '-ay'
of clvm LVs to be specified explicitly. For local VGs,
asy simply means ay, just like aey means ay.
For local VGs, ay == aey == asy
For clvm VGs, ay == asy, aey == aey, asy == asy
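The two equivalence lines above amount to the following mapping,
written here as a hypothetical Python helper (the returned labels are
illustrative, not lvm2 identifiers).

```python
def effective_activation(mode, clustered):
    """mode is 'ay', 'aey' or 'asy'; returns the kind of lock used."""
    if not clustered:
        return "local"       # ay == aey == asy for local VGs
    if mode == "aey":
        return "exclusive"
    return "shared"          # ay and asy behave the same for clvm VGs

local_modes = {effective_activation(m, clustered=False) for m in ("ay", "aey", "asy")}
clvm_modes = [effective_activation(m, clustered=True) for m in ("ay", "aey", "asy")]
```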
The hyphens are removed from long option names before
being read. This means that:
- Option name specifications in args.h must not include hyphens.
(The hyphen in 'use-policies' is removed.)
- A user can include hyphens anywhere in the option name.
All the following are equivalent:
--vgmetadatacopies,
--vg-metadata-copies,
--v-g-m-e-t-a-d-a-t-a-c-o-p-i-e-s-
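A minimal Python sketch of the normalization, assuming hyphens are
stripped only from the name part of a long option and any "=value" part
is left alone (normalize_long_option is a made-up name).

```python
def normalize_long_option(arg):
    """Strip hyphens from the name part of a '--name[=value]' argument."""
    if not arg.startswith("--"):
        return arg           # short options and plain arguments are untouched
    name, sep, value = arg[2:].partition("=")
    return "--" + name.replace("-", "") + sep + value

variants = [
    "--vgmetadatacopies",
    "--vg-metadata-copies",
    "--v-g-m-e-t-a-d-a-t-a-c-o-p-i-e-s-",
]
normalized = {normalize_long_option(v) for v in variants}
```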
lvmetad_init() should not be called with an open connection to the daemon.
Doing so is considered to be an internal error within lvm2 code.
Such a coincidence can't occur within current code. Let's make sure it
never happens.
Some of the descriptions were misleading at least. Some were completely
detached from reality.
lvmetad_init doesn't re-establish or initialise a connection;
lvmetad_active and lvmetad_connect_or_warn can do so.
There are reports of unexplained ioctl failures when using dmeventd.
An explanation might be that the wrong value of errno is being used.
Change libdevmapper to store an errno set by the dm ioctl() directly
and provide it to the caller through a new dm_task_get_errno() function.
[Replaced f9510548667754d9209b232348ccd2d806c0f1d8]
Commit b00711e312 improperly
converted the _area_missing() replacement and moved the check for
AREA_PV seg_type() into the same if() section.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
If lvmetad is not used, we generate lvm2-activation{-early,-net}.service
systemd services to activate any VGs found on the system. So far we used
--sysinit during this activation as polling was still forked off of the
lvm activation command.
This has changed with lvmpolld - we have proper lvmpolld systemd
service now (activated via its socket unit). As such, we don't need
to use --sysinit anymore during activation in systemd environment
as polling was the only barrier to remove the need for --sysinit.
There's a race when asking lvmpolld about progress_status and
actually reading the progress info from kernel:
Even with lvmpolld being used we read status info from
LVM2 command issued by a user (client side from lvmpolld perspective).
The whole cycle may look like the following:
1) set up an operation that requires polling (i.e. pvmove /dev/sda)
2) notify lvmpolld about such operation (lvmpolld_poll_init())
3) in case 1) was not called with --background it would continue with:
4) Ask lvmpolld about progress status. it may respond with one of:
a) in_progress
b) not_found
c) finished
d) any low level error
5) provided the answer was 4a), try to read progress info from the polling LV
(e.g. vg00/pvmove1). Repeat steps 4) and 5) until the answer is != 4a).
And here is the race: lvmpolld answered with in_progress,
but between 4) and 5) the operation may have already
finished and the polling LV may already be gone, or there's nothing to ask for.
Up to now, 5) would report a warning, and it could print that warning many
times if --interval was set to 0.
We don't want to scare users with warnings in this situation, so let's just
print these messages in verbose mode. Error messages caused by errors while
reading kernel status info (on an existing, active and locked LV) remain
the same.
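The cycle in steps 4) and 5) can be sketched as a small scripted loop. This is a minimal model, not the polldaemon code: the enum, struct poll_step and run_poll_cycle() are hypothetical names, and the daemon answers and kernel reads are supplied as data rather than obtained from lvmpolld.

```c
#include <assert.h>
#include <stddef.h>

/* The four kinds of answer listed in step 4). */
typedef enum {
	POLL_IN_PROGRESS,
	POLL_NOT_FOUND,
	POLL_FINISHED,
	POLL_ERROR
} poll_answer_t;

/* One scripted round: the daemon's answer plus the outcome of step 5). */
struct poll_step {
	poll_answer_t answer;	/* what lvmpolld said in step 4) */
	int progress_readable;	/* 1 if the polling LV could still be read */
};

/*
 * Walk the scripted rounds. While the answer is in_progress, a failed
 * read of the polling LV is only counted as a quiet (verbose) note
 * instead of a warning, because the operation may have finished in
 * between. Returns the answer that ended the loop, or POLL_IN_PROGRESS
 * if the script ran out.
 */
poll_answer_t run_poll_cycle(const struct poll_step *steps, size_t n,
			     unsigned *quiet_notes)
{
	size_t i;

	assert(steps != NULL && quiet_notes != NULL);
	*quiet_notes = 0;

	for (i = 0; i < n; i++) {
		if (steps[i].answer != POLL_IN_PROGRESS)
			return steps[i].answer;
		if (!steps[i].progress_readable)
			(*quiet_notes)++;	/* verbose message, not a warning */
	}

	return POLL_IN_PROGRESS;
}
```

With --interval 0 the racy round can repeat many times, which is why the note is counted quietly rather than reported on every pass.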
Avoid using make's $(shell invocation since the evaluation order is
then somewhat different; use a $$( subshell instead.
This also fixes a problem when more than one symbol is found,
since the target shell is given a separate word list,
so the 'R' assignment would need "" around it.
Currently in the wait_for_single_lv() fn, trying to poll a missing pvmove LV
is considered a success. It may have already been finished by another
instance of polldaemon, either by another forked-off polldaemon
or by lvmpolld.
Let's try to handle mirror conversion and snapshot merge the same
way.
These wrappers have been replaced by direct calls
to vg_read() and find_lv() in previous commits.
This commit should have no functional impact since
all bits were already unreachable.
Let's call dev_close_all() only when we're about to 'sleep'
for at least one second during the polling.
(It's questionable whether to call dev_close_all() at all in
polldaemon code. A natural extension would be to drop it completely.)
Previous patch incorrectly skipped replacement of @LOCALEDIR@.
The standard option is --localedir, so use --with-localedir
as a backward-compatible option and set localedir if it has not
yet been set (if that could ever happen).
Use double-eval to translate $datarootdir to $prefix to real dir.
More exact clean of library exported symbols files.
Also use $(firstword) test to check for empty string
so 'make clean' has now cleaner condensed look.
Clean also created include links.
Possibly easier to follow - to have just a single dependency line
and use if() within rule.
Also replace $(words) with $(firstword) which is more commonly used.
Set LVM_TEST_THIN_REPAIR_CMD to /bin/false for a test which
doesn't need it.
This way, even if no such tool is present on the system,
the test will not result in a warning about a missing tool.
Also remove from the Makefile the settings of TEST vars which are set
through /lib/paths - this also allows overriding them in a test.
As of now, lvmpolld works as a client utility for
querying a running instance of the lvmpolld server
about metadata, state, etc.
Currently the only request implemented is '--dump'.
It prints out the full lvmpolld state (mimics the lvmdump -p command).
We don't want to fail a properly set up pvmove after the metadata
update. Failure to copy id components could end with dangling
mirror moving PV segments but no monitoring from lvmpolld or
the classical polldaemon.
lvpoll now processes the passed LV name properly. It respects
the LVM_VG_NAME env. variable and is able to process an LV name
passed in various formats:
- VG/LV
- LV name only (with LVM_VG_NAME set)
- /dev/mapper/VG-LV
- /dev/VG/LV
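The four accepted forms could be split into VG and LV names roughly as below. This is a sketch under stated assumptions, not the lvpoll code: parse_lv_name() and its helpers are hypothetical, the caller is assumed to pass getenv("LVM_VG_NAME") as default_vg, and the /dev/mapper form assumes the device-mapper convention that a hyphen inside a VG or LV name is doubled.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 128	/* buffer size for vg and lv, chosen for this sketch */

/* Append one character, keeping the buffer NUL-terminated. */
static int append_char(char *out, size_t *len, char c)
{
	if (*len + 1 >= NAME_LEN)
		return 0;
	out[(*len)++] = c;
	out[*len] = '\0';
	return 1;
}

/* Split "VG-LV"; a doubled hyphen encodes a literal hyphen in a name. */
static int split_dm_name(const char *dm, char *vg, char *lv)
{
	char *out = vg;
	size_t len = 0;
	int in_lv = 0;

	vg[0] = lv[0] = '\0';
	while (*dm) {
		if (dm[0] == '-' && dm[1] == '-') {
			if (!append_char(out, &len, '-'))
				return 0;
			dm += 2;
		} else if (dm[0] == '-') {
			if (in_lv)
				return 0;	/* second separator: not a plain VG-LV name */
			in_lv = 1;
			out = lv;
			len = 0;
			dm++;
		} else {
			if (!append_char(out, &len, *dm))
				return 0;
			dm++;
		}
	}

	return in_lv && vg[0] && lv[0];
}

/* Returns 1 and fills vg/lv (each NAME_LEN bytes) on success, 0 otherwise. */
int parse_lv_name(const char *arg, const char *default_vg, char *vg, char *lv)
{
	const char *slash;
	int had_dev_prefix = 0;
	size_t vg_len;

	assert(arg && vg && lv);

	if (!strncmp(arg, "/dev/mapper/", 12))
		return split_dm_name(arg + 12, vg, lv);

	if (!strncmp(arg, "/dev/", 5)) {
		arg += 5;
		had_dev_prefix = 1;
	}

	slash = strchr(arg, '/');
	if (!slash) {
		/* Bare LV name: only valid with a default VG and no /dev/ prefix. */
		if (had_dev_prefix || !default_vg || !*default_vg || !*arg ||
		    strlen(default_vg) >= NAME_LEN || strlen(arg) >= NAME_LEN)
			return 0;
		strcpy(vg, default_vg);
		strcpy(lv, arg);
		return 1;
	}

	vg_len = (size_t) (slash - arg);
	if (!vg_len || vg_len >= NAME_LEN || !slash[1] ||
	    strchr(slash + 1, '/') || strlen(slash + 1) >= NAME_LEN)
		return 0;
	memcpy(vg, arg, vg_len);
	vg[vg_len] = '\0';
	strcpy(lv, slash + 1);
	return 1;
}
```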
Use CFG_DEFAULT_COMMENTED and CFG_DEFAULT_UNDEFINED to
replicate the existing comments in example.conf.
Fix host_list to be cfg_array.
UNDEFINED is only used if the value depends on other
system/kernel values outside of lvm. The most common
case is when dm-thin or dm-cache have built-in default
settings in the kernel, and lvm will use those built-in
default values unless the corresponding lvm config setting
is set.
COMMENTED is used to comment out the default setting in
lvm.conf. The effect is that if the LVM version is
upgraded, and the new version of LVM has new built-in
default values, the new defaults are used by LVM unless
the previous default value was set (uncommented) in lvm.conf.
Introduce a new implementation of the dm_task_get_info() function
with support for reading internal_suspend.
This time it is done in a 'versioned' way.
We keep the old-fashioned dm_task_get_info(Base) to implement
the old behavior of the 1.02.95 libdm code.
libdm version 1.02.96 introduced a 'macro' wrapper
dm_task_get_info_with_deferred_remove with a new implementation
of dm_task_get_info() - we cannot do anything other than
provide a compatible version of this symbol.
Now in version 1.02.97 we add a new versioned implementation of the
dm_task_get_info(DM_1_02_97) symbol.
This has the effect that e.g. an rpm build will finally resolve the proper
dependency on the new symbol - so it will no longer be possible
to build a new binary and use an old library
(rpm -q --provides will show libdevmapper.so.1.02(DM_1_02_97)(64bit)).
Also, the history is now tracked. If a new function is added (or
reimplemented), it needs to be placed in the proper file
so it can be exported with the right versioning symbol.
File .exported_symbols.Base and any existing older DM
files should be treated as read-only after a release.
Also, only libdm has currently been enhanced with a versioned .Base
file; as soon as other libs (liblvm, libdevmapper-event) need changes,
they should also get their exported symbol files - meanwhile
make.tmpl handles both cases.
Since we now enable those by default when compiled with those daemons,
explicitly disable them in tests when needed.
Alphabetically sort configurables.
Basic support for upstream 'build' of rpm packages.
Make spec file generated.
2 new simple targets:
make dist - create LVM2.MAJOR.MINOR.PATCHLEVEL.tgz from git files.
make rpm - some generic rpmbuilder using spec files.
Create packages in build/ subdir.
DO NOT USE created rpms in any distribution!
Configure provides proper settings for the
use_lvmetad and use_lvmpolld conf settings.
When lvmpolld and lvmetad are built, these settings
are enabled by default unless explicitly disabled
with --disable-use-lvmetad/--disable-use-lvmpolld.
This is an alternative/equivalent to commit
ca67cf84df
The problem (wrong label->dev after a new preferred
duplicate device is chosen) was isolated to the lvmetad
case (non-lvmetad worked fine), and this fixes the problem
by setting the new label->dev in the lvmetad-specific
code rather than in the general lvmcache code.
In process_each_{vg,lv,pv} when no vgname args are given,
the first step is to get a list of all vgid/vgname on the
system. This is exactly what lvmetad returns from a
vg_list request. The current code is doing a vg_lookup
on each VG after the vg_list and populating lvmcache with
the info for each VG. These preliminary vg_lookup's are
unnecessary, because they will be done again when the
processing functions call vg_read. This patch eliminates
the initial round of vg_lookup's, which can roughly cut in
half the number of lvmetad requests and save a lot of extra work.
Use 64bit arithmetic for PV size calculation (Coverity).
Also remove sector shift for compared PV size, since all
values are already held in sectors.
This fixes validation of PV size when restoring a PV
from a vg metadata backup file.
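Why 64bit arithmetic matters here can be shown with a small sketch. The function names and the pe_count/pe_size parameters are hypothetical and not taken from the patch; the point is only that a product of two 32bit values wraps unless one operand is widened first.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32 bits: wraps modulo 2^32 for large PVs. */
uint32_t pv_size_sectors_32(uint32_t pe_count, uint32_t pe_size)
{
	return pe_count * pe_size;
}

/* Widen one operand first so the multiplication is done in 64 bits. */
uint64_t pv_size_sectors_64(uint32_t pe_count, uint32_t pe_size)
{
	return (uint64_t) pe_count * pe_size;
}

/* Sanity check with 300000 extents of 16384 sectors each. */
void pv_size_selfcheck(void)
{
	assert(pv_size_sectors_64(300000, 16384) == UINT64_C(4915200000));
	assert(pv_size_sectors_32(300000, 16384) == UINT32_C(620232704));
}
```

The 32bit result is the true size minus 2^32, so a comparison against the real device size would fail for a perfectly valid PV.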
Improve the python unit test case to cover all of the properties of a LV and
the properties of a LV segment.
In addition we also add a 'tag' to the lv so that we can retrieve it
using the 'lv_tags' property to ensure that this works as expected.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Synopsis: STR_LIST needs to be treated as STR for properties.
For any lvm property that was internally 'typed' as a string list we were failing
to return a string in the property API. This was due to the fact that for the
properties to work the value needs to either be evaluated as a string or a
number. This change corrects the macro used to build the memory array of
structures so that the string bitfield is set as needed to ensure that the value
is a string.
https://bugzilla.redhat.com/show_bug.cgi?id=1139920
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When retrieving a property value that is a string, if the character pointer in C
was NULL, we would segfault. This change checks for non-null before creating a
python string representation. In the case where the character pointer is NULL
we will return a python 'None' for the value.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
With the lvm2app C API adding the ability to determine when a property is
signed we can then use this information to construct the correct representation
of the number for python which will maintain value and sign. Previously, we
only represented the numbers in python as positive integers.
The Python long type covers the range of both unsigned and signed integers, so we just
need to use the appropriate parsing code to build the value correctly.
Python part of the fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=838257
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Currently lvm2app properties have the following structure:
typedef struct lvm_property_value {
uint32_t is_settable:1;
uint32_t is_string:1;
uint32_t is_integer:1;
uint32_t is_valid:1;
uint32_t padding:28;
union {
const char *string;
uint64_t integer;
} value;
} lvm_property_value_t;
which assumes that numerical values were in the range of 0 to 2**64-1. However,
some of the properties were 'signed', like LV major/minor numbers and some
reserved values for properties that represent percentages. Thus when the
values were retrieved they were in two's complement notation. So for a -1
major number the API user would get a value of 18446744073709551615. The
API user could cast the returned value to an int64_t to handle this, but that
requires the API developer to look at the source code and determine when it
should be done.
This change modifies the return property structure to:
typedef struct lvm_property_value {
uint32_t is_settable:1;
uint32_t is_string:1;
uint32_t is_integer:1;
uint32_t is_valid:1;
uint32_t is_signed:1;
uint32_t padding:27;
union {
const char *string;
uint64_t integer;
int64_t signed_integer;
} value;
} lvm_property_value_t;
With this addition the API user can check whether the value is numerical
(is_integer = 1) and subsequently check if it's signed (is_signed = 1) too.
If signed, the API developer should use the union's signed_integer to
avoid casting.
This change maintains backwards compatibility as the structure size remains
unchanged and integer value remains unchanged. Only the additional bit
taken from the pad is utilized.
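A consumer of the extended structure might read a value as sketched below. The structure is copied from the definition above so the sketch compiles without lvm2app.h; format_property() is a hypothetical helper, not part of the lvm2app API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Copy of the structure quoted above, so the sketch is self-contained. */
typedef struct lvm_property_value {
	uint32_t is_settable:1;
	uint32_t is_string:1;
	uint32_t is_integer:1;
	uint32_t is_valid:1;
	uint32_t is_signed:1;
	uint32_t padding:27;
	union {
		const char *string;
		uint64_t integer;
		int64_t signed_integer;
	} value;
} lvm_property_value_t;

/* Format a property into buf; returns what snprintf() returns. */
int format_property(const lvm_property_value_t *v, char *buf, size_t len)
{
	assert(v != NULL && buf != NULL);

	if (!v->is_valid)
		return snprintf(buf, len, "(invalid)");
	if (v->is_string)
		return snprintf(buf, len, "%s",
				v->value.string ? v->value.string : "(null)");
	if (v->is_integer && v->is_signed)	/* no cast needed */
		return snprintf(buf, len, "%" PRId64, v->value.signed_integer);
	if (v->is_integer)
		return snprintf(buf, len, "%" PRIu64, v->value.integer);

	return snprintf(buf, len, "(unknown)");
}
```

A major number of -1 then prints as "-1" when is_signed is set, while a caller that ignores the flag and reads value.integer still sees 18446744073709551615 as before.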
Bugzilla reference:
https://bugzilla.redhat.com/show_bug.cgi?id=838257
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Querying the future lvmpolld with zero wait time is highly undesirable
and can cause a serious performance drop of the future daemon. The new
wrapper function may avoid an immediate return from the syscall by
introducing a minimal wait time on demand.
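Such a wrapper might look like the sketch below. The names effective_wait() and poll_sleep(), the enforce_min flag and the MIN_WAIT_NSEC floor are all assumptions made for illustration; the actual wrapper and its minimum are not shown in this message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define MIN_WAIT_NSEC 100000L	/* hypothetical floor for a zero interval */

/* Compute the wait to use; a zero interval gets the floor when requested. */
struct timespec effective_wait(unsigned interval_sec, int enforce_min)
{
	struct timespec ts;

	ts.tv_sec = (time_t) interval_sec;
	ts.tv_nsec = 0;
	if (enforce_min && interval_sec == 0)
		ts.tv_nsec = MIN_WAIT_NSEC;

	return ts;
}

/* Sleep for the effective wait, resuming if a signal interrupts the sleep. */
void poll_sleep(unsigned interval_sec, int enforce_min)
{
	struct timespec req = effective_wait(interval_sec, enforce_min);
	struct timespec rem;

	assert(req.tv_nsec >= 0 && req.tv_nsec < 1000000000L);
	while (nanosleep(&req, &rem) < 0 && errno == EINTR)
		req = rem;
}
```

Callers that really want no delay pass enforce_min as 0, which keeps the minimal wait strictly on demand.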
Routines responsible for polling of in-progress pvmove, snapshot merge
or mirror conversion each used custom lookup functions to find vg and
lv involved in polling.
In particular, pvmove used the pvname to look up the pvmove in progress. The future
lvmpolld will poll each operation by vg/lv name (internally by lvid).
There are also plans to make pvmove able to move non-overlapping ranges
of extents instead of single PVs as of now. This would also require
identifying the operation in a different manner.
The poll_operation_id structure together with the daemon_parms structure
unambiguously identify the polling task.
Waiting even after _check_lv_status returned success and
'finished' flag was set to true doesn't make much sense.
Note that while we skip the wait() we also skip the
init_full_scan_done(0) inside the routine. This should
have no impact as long as the code after _wait_for_single_lv
doesn't presume anything about the state of the cache.
As part of a bigger effort to unify polling interfaces,
poll_get_copy_lv should be able to look up LVs based
on their lv->status field.
Effective after pvmove starts using the poll_get_copy_lv
fn as well.
When kernel target reports sync status as 0% it might as well mean
it's 100% in sync, just the target is in some race inconsistent
state - so reread once again and take a more optimistic value ;)
Patch tries to work around:
https://bugzilla.redhat.com/show_bug.cgi?id=1210637
Reinstate config settings matching the last release until every
case where the generator produces different output has been reviewed
and fresh decisions made about which defaults to expose as protection
against changes in newer releases. We should be trying to reduce, not
increase, this number.
Introduce LVM_TEST_LVMETAD_DEBUG_OPTS to allow overriding the
default debug opts for lvmetad.
It can still be overridden on the command line:
make check_lvmetad LVM_TEST_LVMETAD_DEBUG_OPTS="-l all"...
Better name for aux function.
First use normal -TERM, and only after a while use -KILL
(leaving some time to correctly finish)
Print INFO about killed processes.
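The TERM-first, KILL-later sequence can be sketched in C for a direct child process. The test suite itself does this from shell scripts, so stop_child() and spawn_sleeper() are hypothetical names, and using waitpid() assumes the process is our own child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Start a child that just sleeps until it is signalled. */
pid_t spawn_sleeper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}

	return pid;
}

/*
 * Send SIGTERM, give the child up to grace_ms to exit, then SIGKILL.
 * Returns the signal that ended the child, 0 if it exited normally,
 * or -1 on error.
 */
int stop_child(pid_t pid, int grace_ms)
{
	const struct timespec tick = { 0, 10L * 1000 * 1000 };	/* 10 ms */
	int status = 0, waited_ms;
	pid_t r = 0;

	assert(pid > 0);
	if (kill(pid, SIGTERM) < 0)
		return -1;

	for (waited_ms = 0; waited_ms < grace_ms; waited_ms += 10) {
		r = waitpid(pid, &status, WNOHANG);
		if (r != 0)
			break;		/* child reaped, or waitpid failed */
		nanosleep(&tick, NULL);
	}

	if (r == 0) {			/* still alive after the grace period */
		if (kill(pid, SIGKILL) < 0)
			return -1;
		r = waitpid(pid, &status, 0);
	}

	if (r != pid)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The return value tells the caller which signal was actually needed, which is the information an INFO line about killed processes would report.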
This patch adds supporting code for handling deprecated settings.
Deprecated settings are not displayed by default in lvmconfig output
(except for --type current and --type diff). There's a new
"--showdeprecated" lvmconfig option to display them if needed.
Also, when using lvmconfig --withcomments, the comments with info
about deprecation are displayed for deprecated settings and with
lvmconfig --withversions, the version in which the setting was
deprecated is displayed in addition to the version of introduction.
If using --atversion with a version that is lower than the one
in which the setting was deprecated, the setting is then considered
as not deprecated (simply because at that version it was not
deprecated).
For example:
$ lvmconfig --type default activation
activation {
...
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated
activation {
...
mirror_region_size=512
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withversions
activation {
...
# Available since version 1.0.0.
# Deprecated since version 2.2.99.
mirror_region_size=512
# Available since version 2.2.99.
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withcomments
activation {
...
# Configuration option activation/mirror_region_size.
# This has been replaced by the activation/raid_region_size
# setting.
# Size (in KB) of each copy operation when mirroring.
# This configuration option is deprecated.
mirror_region_size=512
# Configuration option activation/raid_region_size.
# Size in KiB of each raid or mirror synchronization region.
# For raid or mirror segment types, this is the amount of
# data that is copied at once when initializing, or moved
# at once by pvmove.
raid_region_size=512
...
}
$ lvmconfig --type default activation --withcomments --atversion 2.2.98
activation {
...
# Configuration option activation/mirror_region_size.
# Size (in KB) of each copy operation when mirroring.
mirror_region_size=512
...
}
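The --atversion rule shown in the examples can be sketched as a version comparison. The struct layout and function names here are hypothetical and do not mirror struct cfg_def_item; treating a setting as present once the requested version reaches its "available since" version is an assumption inferred from the example output.

```c
#include <assert.h>
#include <stddef.h>

struct version {
	int major, minor, patch;
};

/* Hypothetical view of a setting's version metadata. */
struct setting {
	const char *name;
	struct version since;			/* "Available since version" */
	int has_deprecated;			/* 0 if never deprecated */
	struct version deprecated_since;	/* "Deprecated since version" */
};

/* Returns <0, 0 or >0 like strcmp(). */
static int version_cmp(struct version a, struct version b)
{
	if (a.major != b.major)
		return a.major - b.major;
	if (a.minor != b.minor)
		return a.minor - b.minor;
	return a.patch - b.patch;
}

/* A setting is listed once the requested version reaches 'since'. */
int setting_exists_at(const struct setting *s, struct version at)
{
	assert(s != NULL);
	return version_cmp(at, s->since) >= 0;
}

/* Below 'deprecated_since' the setting is treated as not deprecated. */
int setting_deprecated_at(const struct setting *s, struct version at)
{
	assert(s != NULL);
	return s->has_deprecated && version_cmp(at, s->deprecated_since) >= 0;
}
```

With the versions from the example, activation/mirror_region_size at 2.2.98 exists and is not deprecated, while activation/raid_region_size does not exist yet.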
A preparatory code for marking configuration nodes as deprecated:
- struct cfg_def_item gains 2 new fields ("deprecated_since_version" and "deprecation_comment")
- cfg* macros to handle new fields
- related config_settings.h edits to add new fields for each item (null for all at the moment)
Patch with implementation will follow...
Before this patch:
$ lvmconfig --type list --withversions --withsummary global/use_lvmetad
global/use_lvmetad - Use lvmetad to cache metadata and reduce disk scanning. [2.2.93]
$ lvmconfig --type list --withversions global/use_lvmetad
global/use_lvmetad
With this patch applied:
$ lvmconfig --type list --withversions --withsummary global/use_lvmetad
global/use_lvmetad - Use lvmetad to cache metadata and reduce disk scanning. [2.2.93]
$ lvmconfig --type list --withversions global/use_lvmetad
global/use_lvmetad - [2.2.93]
We're commenting out settings with undefined default values.
The comment character '#' was printed at the very beginning of
the line, it should be placed just at the beginning of the setting,
after the space/tab prefix is printed.
Before this patch:
$ lvmconfig --type default activation
activation {
...
#	volume_list=[]
...
}
With this patch applied:
$ lvmconfig --type default activation
activation {
...
	# volume_list=[]
...
}
The new lvmconf function uses bash associative arrays - however
older systems like RHEL5 don't provide this feature. In this case,
stay with the older variant.
Restore support for use case like this:
aux lvmconf 'tags/@foo {}'
Works with systemd-activated daemons only as of now.
Each daemon implementation may decide to signal its
internal idle state (i.e. all background tasks unrelated to
client threads are finished).
These settings are in the "unsupported" group:
devices/loopfiles
log/activate_file
metadata/disk_areas (section)
metadata/disk_areas/<disk_area> (section)
metadata/disk_areas/<disk_area>/size
metadata/disk_areas/<disk_area>/id
These settings are in the "advanced" group:
devices/dir
devices/scan
devices/types
global/proc
activation/missing_stripe_filler
activation/mlock_filter
metadata/pvmetadatacopies
metadata/pvmetadataignore
metadata/stripesize
metadata/dirs
Also, this patch causes the --ignoreunsupported and --ignoreadvanced
switches to be honoured for all config types (lvmconfig --type).
By default, the --type current and --type diff display unsupported
settings, the other types ignore them - this patch also introduces
--showunsupported switch for all these other types to display even
unsupported settings in their output if needed.
lvmconfig --type list displays plain list of configuration settings.
Some of the existing decorations can be used (--withsummary and
--withversions) as well as existing options/switches (--ignoreadvanced,
--ignoreunsupported, --ignorelocal, --atversion).
For example (displaying only "config" section so the list is not long):
$ lvmconfig --type list config
config/checks
config/abort_on_errors
config/profile_dir
$ lvmconfig --type list --withsummary config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig -l config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig --type list --withsummary --withversions config
config/checks - If enabled, any LVM configuration mismatch is reported. [2.2.99]
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found. [2.2.99]
config/profile_dir - Directory where LVM looks for configuration profiles. [2.2.99]
Example with --atversion (displaying global section):
$ lvmconfig --type list global
global/umask
global/test
global/units
global/si_unit_consistency
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/etc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/prioritise_write_locks
global/library_dir
global/locking_library
global/abort_on_internal_errors
global/detect_internal_vg_cache_corruption
global/metadata_read_only
global/mirror_segtype_default
global/raid10_segtype_default
global/sparse_segtype_default
global/lvdisplay_shows_full_device_path
global/use_lvmetad
global/thin_check_executable
global/thin_dump_executable
global/thin_repair_executable
global/thin_check_options
global/thin_repair_options
global/thin_disabled_features
global/cache_check_executable
global/cache_dump_executable
global/cache_repair_executable
global/cache_check_options
global/cache_repair_options
global/system_id_source
global/system_id_file
$ lvmconfig --type list global --atversion 2.2.50
global/umask
global/test
global/units
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/library_dir
global/locking_library
Some tests left dangling bg processes originating from
lvm2 commands able to spawn a bg polling process
(lvchange, vgchange, pvmove, lvconvert...).
The initial fn 'add_to_kill_list' was meant to collect processes with
specific parameters (the process's command line and parent process ID).
After testing finished, the fn kill_listed_processes was meant to remove those
listed by 'add_to_kill_list'.
Unfortunately it proved to be error-prone, especially in scenarios
where the cmd line of the initiating command contained characters that had to
be escaped before being passed to a shell script to make it work correctly
(or if the cmd spawned more than one bg process with the same cmd line, e.g.
vgchange or lvchange).
The new implementation is much simpler. It uses an env. variable (LVM_TEST_TAG)
for marking a process to be killed later or during test env. teardown
(e.g. LVM_TEST_TAG=kill_me_$PREFIX to kill only processes related to the
current test environment).
'lvm dumpconfig' now does a lot more than just dumping configuration
information and is no longer only a support tool. Users now need
to run it to find out about configuration information that has been
removed from the lvm.conf man page so we need to promote this to full
command line status as 'lvmconfig'. Also accept 'lvm config' and mention
it in the usage information of lvmconf (which should also get merged in
eventually).
Example:
/dev/loop0 and /dev/loop1 are duplicates,
created by copying one backing file to the
other.
'identity /dev/loopX' creates an identity
mapping for loopX named idmloopX, which
adds a duplicate for the named device.
The duplicate selection code for lvmetad is
incomplete, and lvmetad is disabled for this
example.
[~]# losetup -f loopfile0
[~]# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 foo lvm2 a-- 308.00m 296.00m
[~]# losetup -f loopfile1
[~]# pvs
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 which is more recent, replacing /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop1 foo lvm2 a-- 308.00m 308.00m
[~]# ./identity /dev/loop0
[~]# pvs
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 without holders, replacing /dev/loop0
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop0 not /dev/loop1
Using duplicate PV /dev/mapper/idmloop0 from subsystem DM, replacing /dev/loop1
PV VG Fmt Attr PSize PFree
/dev/mapper/idmloop0 foo lvm2 a-- 308.00m 296.00m
[~]# ./identity /dev/loop1
[~]# pvs
WARNING: duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV is being used from both devices /dev/loop0 and /dev/loop1
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 which is more recent, replacing /dev/loop0
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop0 not /dev/loop1
Using duplicate PV /dev/mapper/idmloop0 from subsystem DM, replacing /dev/loop1
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop1 not /dev/mapper/idmloop0
Using duplicate PV /dev/mapper/idmloop1 which is more recent, replacing /dev/mapper/idmloop0
PV VG Fmt Attr PSize PFree
/dev/mapper/idmloop1 foo lvm2 a-- 308.00m 308.00m
Describe
thin_check_options, thin_repair_options,
cache_check_options, cache_repair_options
as a "list of options", rather than a "string of options"
because a single string, e.g. "-q --clear-needs-check-flag"
does not work, and needs to be entered as a list,
e.g. ["-q", "--clear-needs-check-flag"]
The settings which have their default value evaluated in runtime should
have their 'unconfigured' counterparts also evaluated in runtime since
those values can be constructed by using other settings.
For example, before this patch:
$ lvm dumpconfig --type default --unconfigured devices/cache_dir devices/cache
cache_dir="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@"
cache="/etc/lvm/cache/.cache"
With this patch applied:
$ lvm dumpconfig --type default --unconfigured devices/cache_dir devices/cache
cache_dir="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@"
cache="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@/.cache"
The @something@ used for unconfigured default value is not bound to
CFG_TYPE_STRING settings defined in config_settings.h, it can be
used for any other config type too.