IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Add the libdm-stats module to libdm. This implements a simple interface
for creating, managing and interrogating I/O statistics regions and
areas on device-mapper devices.
The library interface is documented in libdevmapper.h and provides a
'dm_stats' handle that is used to perform statistics operations and
obtain data. Methods are provided to return baisc count values and to
derive time-based metrics when a suitable interval estimate is provided.
The dm_stats handle contains a pointer to a table of one or more
dm_stats_region objects representing the regions registered with the
@stats_create message. These in turn point to a table of one or more
dm_stats_counters objects containing the counter sets for each defined
area within the region:
dm_stats->dm_stats_region[nr_regions]->dm_stats_counters[nr_areas]
This structure is private to the library and may change in future
versions: all users should make use of the public interface and treat
the dm_stats type as an opaque handle.
Regions and counter sets are stored in order of increasing region_id.
Public methods are provided to create and destroy handles and to
list, create, and destroy, statistics regions as well as to obtain and
parse the actual counter data.
Linux iostat-style derived performance metrics are provided to return
higher-level performance metrics.
This commit also adds arguments, report types, and a 'stats' command to
dmsetup and implements 'clear', 'create', 'delete', 'list', 'print', and
'report' sub-commands.
The dmsetup _display_info_cols() function is adapted to allow reporting
of statistics with the DR_STATS report type: since a single object
(device) may have many rows of statistics to report the call to
dm_report_object() is placed inside a loop over each statistics area
present (for non-stats reports or for devices with a single region
spanning the entire device the body of the loop will be executed exactly
once).
Don't do interval management and external timekeeping for stats in
dm_report: let applications handle this on their own.
Since this has not been included in a release remove it from the
library entirely and handle report timing directly inside dmsetup.
Add a function to print column headings regardless of whether they
have already been output. This will be used by dmstats to issue
periodic reminders of the column headings.
This patch removes a check for RH_HEADINGS_PRINTED from
_report_headings that prevents headings being displayed if the flag
is already set; this check is redundant since the only existing
caller (_output_as_columns()) already tests the flag before
calling the function.
Not releasing objects back to the pool is fine for short-lived
pools since the memory will be freed when dm_pool_destroy() is
called.
Any pool that may be long-lived needs to be more careful to free
objects back to the pool to avoid leaking memory that will not be
reclaimed until the pool is destroyed at process exit time.
The report pool currently leaks each headings line and some row
data.
Although dm_report_output() tries to free the first allocated row
this may end up freeing a later row due to sorting of the row list
while reporting. Store a pointer to the first allocated row from
_do_report_obect() instead and free this at the end of
_output_as_columns(), _output_as_rows(), and dm_report_clear().
Also make sure to call dm_pool_free() for the headings line built
in _report_headings().
When dmstats is introduced it will maintain dm_report objects for
the whole lifetime of the process: without these changes a stats
report could leak around 600k in 10m (exact rate depends on field
selection and data values):
top - 12:11:32 up 4 days, 3:16, 15 users, load average: 0.01, 0.12, 0.14
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6473 root 20 0 130196 3124 2792 S 0.0 0.0 0:00.00 dmstats
top - 12:22:04 up 4 days, 3:26, 15 users, load average: 0.06, 0.11, 0.13
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6498 root 20 0 130836 3712 2752 S 0.0 0.0 0:00.60 dmstats
With this patch no increase in RSS is seen:
top - 13:54:58 up 4 days, 4:59, 15 users, load average: 0.12, 0.14, 0.14
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13962 root 20 0 130196 2996 2688 S 0.0 0.0 0:00.00 dmstats
top - 14:04:31 up 4 days, 5:09, 15 users, load average: 1.02, 0.67, 0.36
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13962 root 20 0 130196 2996 2688 S 0.3 0.0 0:00.32 dmstats
This also affects report output for repeating reports in the
DM_REPORT_OUTPUT_COLUMNS_AS_ROWS case; row state is not fully cleared for
the next iteration leading to progressive growth of the heading width:
vg_hex-lv_home:vg_hex-lv_swap:vg_hex-lv_root:luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:vg_hex-lv_images
253:253:253:253:253
2:0:1:4:3
L--w:L--w:L--w:L--w:L--w
1:2:1:1:1
3:1:1:1:2
0:0:0:0:0
LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiv08BCGvF4WsJSqWUDUt7qtf2hEmjtVvo:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiKf7XIiwdAYOJfaGhQe9fu26cTEICGgFS:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiEZj7ZXbmrWDuGhd7vvi88VF0NdTMG8iA:CRYPT-LUKS1-797339213f684c929eb7d0aca4c6ba3e-luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOi2rKredlBPnw2X7v1BiCuEpFo6gaE7BRw
:::::vg_hex-lv_home:vg_hex-lv_swap:vg_hex-lv_root:luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:vg_hex-lv_images
:::::253:253:253:253:253
:::::2:0:1:4:3
:::::L--w:L--w:L--w:L--w:L--w
:::::1:2:1:1:1
:::::3:1:1:1:2
:::::0:0:0:0:0
:::::LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiv08BCGvF4WsJSqWUDUt7qtf2hEmjtVvo:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiKf7XIiwdAYOJfaGhQe9fu26cTEICGgFS:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiEZj7ZXbmrWDuGhd7vvi88VF0NdTMG8iA:CRYPT-LUKS1-797339213f684c929eb7d0aca4c6ba3e-luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOi2rKredlBPnw2X7v1BiCuEpFo6gaE7BRw
This adds the infrastructure, code paths, error reporting,
etc. to handle storage errors, or storage loss, under the
sanlock leases in a VG that is being used. The loss of
storage means sanlock cannot renew its leases, which means
that the host needs to stop using the shared VG before its
leases expire.
This still requires manually shutting down a VG that has
lost lease storage, e.g. unmounting file systems,
deactivating LVs in the VG. The next step is to
automatically use a command like blkdeactivate to do that.
tools/polldaemon.c:465: uninit_use_in_call: Using uninitialized value "id.vg_name" when calling "print_log".
tools/polldaemon.c:465: uninit_use_in_call: Using uninitialized value "id.lv_name" when calling "print_log".
/lib/log/log.c:88: warning[invalidScanfArgType_int]: %llu in format string (no. 2) requires 'unsigned long long *' but the argument type is 'long long *'.
daemons/lvmlockd/lvmlockd-core.c:791: error[uninitstring]: Dangerous usage of 'version' (strncpy doesn't always null-terminate it).
The dlm global lockspace is automatically added when the
first dlm VG lockspace is added. Reverse this by removing
the dlm global lockspace after the last dlm VG lockspace
is removed. (Remove old non-working code that did this
based on an old command that could explicitly add/remove
the dlm global lockspace.)
Whenver reporting field name is registered with libdevmapper and if
the field name contains any number of underscores ('_'), libdm
can now automatically recognize any of its variant without any
underscores used.
For example:
..for underscores in prefixes:
pvs -o pv_name
pvs -o name
pvs -o pvname (newly recognized besides pvname)
..for underscores in the name:
lvs -o cache_mode
lvs -o cachemode
..or even multiple underscores:
pvs -o pv___na___me
It's all variant of the same field name.
No commands set has_subcommands yet.
Move multiple device loop to separate function because we'll
soon want to call it repeatedly.
(Based on patch from bmr.)
Use refresh_filters instead of destroy_filters and init_filters
in refresh_toolcontext fn which deals with cmd->initialized.filters
correctly on refresh.
Just shuffle the items and put them into logical groups so it's
visible at first sight what each group contains - it makes it a bit
easier to make heads and tails of the whole cmd_context monster.
When a command is flagged with NO_METADATA_PROCESSING flag, it means
such command does not process any metadata and hence it doens't require
lvmetad, lvmpolld and it can get away with no locking too. These are
mostly simple commands (like lvmconfig/dumpconfig, version, types,
segtypes and other builtin commands that do not process metadata
in any way).
At first, when lvm command is executed, create toolcontext without
initializing connections (lvmetad,lvmpolld) and without initializing
filters (which depend on connections init). Instead, delay this
initialization until we know we need this. That is, until the
lvm_run_command fn is called in which we know what the actual
command to run is and hence we can avoid any connection, filter
or locking initiliazation for commands that would not make use
of it anyway.
For all the other create_toolcontext calls, we keep the original
behaviour - the filters and connections are initialized together
with the toolcontext.
Make it possible to decide whether we want to initialize connections and
filters together with toolcontext creation.
Add "filters" and "connections" fields to struct
cmd_context_initialized_parts and set these in cmd_context.initialized
instance accordingly.
(For now, all create_toolcontext calls do initialize connections and
filters, we'll change that in subsequent patch appropriately.)
Move original lvmetad and lvmpolld initialization code from
_process_config fn to their own functions _init_lvmetad and
_init_lvmpolld (both covered with single _init_connections fn).
Add struct cmd_context_initialized_parts to wrap up information
about which cmd context pieces are initialized and add variable
of this struct type into struct cmd_context.
Also, move existing "config_initialized" variable that was directly
part of cmd_context into the new cmd_context.initialized wrapper.
We'll be adding more items into the struct cmd_context_initialized_parts
with subsequent patches...
This tries harder to avoid creating duplicate global locks in
sanlock VGs by refusing to create a new sanlock VG with a
global lock if other sanlock VGs exist that may have a gl.
vgsummary information contains provisional VG information
that is obtained without holding the VG lock. This info
can be used to lock the VG, and then read it with vg_read().
After the VG is read properly, the vgsummary info should
be verified.
Add the VG lock_type to the vgsummary. It needs to be
known before the VG can be locked and read.
This is a regression introduced by commit
6c0e44d5a2 which changed
the way dev_cache_get fn works - before this patch, when a
device was not found, it fired a full rescan to correct the
cache. However, the change coming with that commit missed
this full_rescan call, causing the lvmcache to still contain
info about PVs which should be filtered now.
Such situation may have happened by coincidence of using
old persistent cache (/etc/lvm/cache/.cache) which does not
reflect the actual state anymore, a device name/symlink which
now points to a device which should be filtered and a fact we
keep info about usable DM devices in .cache no matter what
the filter setting is.
This bug could be hidden though by changes introduced in
commit f1a000a477 as it
calls full_rescan earlier before this problem is hit.
But we need to fix this anyway for the dev_cache_get
to be correct if we happen to use the same code path
again somewhere sometime.
For example, simple reproducer was (before commit
1a000a477558e157532d5f2cd2f9c9139d4f87c):
- /dev/sda contains a PV header with UUID y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M
- lvm.conf: filter = [ "r|.*|" ]
- rm -f .cache (to start with clean state)
- dmsetup create test --table "0 8388608 linear /dev/sda 0" (8388608 is
just the size of the /dev/sda device I use in the reproducer)
- pvs (this will create .cache file which contains
"/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M"
as well as "/dev/mapper/test" and the target node "/dev/dm-1" - all the
usable DM mappings (and their symlinks) get into the .cache file even
though the filter "is set to "ignore all" - we do this - so far it's OK)
- dmsetup remove test (so we end up with /dev/disk/by-id/lvm-pv-uuid-...
pointing to the /dev/sda now since it's the underlying device
containing the actual PV header)
- now calling "pvs" with such .cache file and we get:
$ pvs
PV VG Fmt Attr PSize PFree
/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M vg lvm2 a-- 4.00g 0
Even though we have set filter = [ "r|.*|" ] in the lvm.conf file!
Moved out from lib/display and a little documentation added.
It's tuned to LVM's requirements historically and its behaviour
might not always be what you would expect.
There is no longer an "enable" option for the global lock,
so remove the bit of code that was checking for it. It
was an optional variation anyway, and not one that was likely
to be used.
Also update the corresponding comment describing global lock
creation.
Stop removing hyphens when = is seen. With an option
like --profile=thin-performance, the hyphen removal
will stop at = and will not remove - after thin.
Stop removing hyphens altogether when a stand alone arg
of -- appears.
Simply running concurrent copies of 'pvscan | true' is enough to make
clvmd freeze: pvscan exits on the EPIPE without first releasing the
global lock.
clvmd notices the client disappear but because the cleanup code that
releases the locks is triggered from within some processing after the
next select() returns, and that processing can 'break' after doing just
one action, it sometimes never releases the locks to other clients.
Move the cleanup code before the select.
Check all fds after select().
Improve some debug messages and warn in the unlikely event that
select() capacity could soon be exceeded.
When there are duplicate global locks, check if the gl
is still enabled each time a gl or vg lock is acquired
in the lockspace. Once one of the duplicates is disabled,
then other hosts will recognize that the issue is resolved
without needing to restart the lockspaces.
Move the DEBUG_MEM decision inside libdevmapper.so instead of exposing
it in libdevmapper.h which causes failures if the binary and library
were compiled with opposite debugging settings.
pvscan autoactivation does not work for lockd VGs because
lock start is needed on a lockd VG before locking can be
done for it. Add a check to skip the attempt at autoactivate
rather than calling it, knowing it will fail.
Add a comment explaining why pvscan --cache works fine for
lockd VGs without locks, and why autoactivate is not done.
. the poll check will eventually call finish which will
write the VG, so an ex VG lock is needed from lvmlockd.
. fix missing unlock on poll error path
. remove the lockd locking while monitoring the progress
of the command, as suggested by the earlier FIXME comment,
as it's not needed.
The CFG_SECTION_NO_CHECK flag can be used to mark a section
and its whole subtree as containing settings where checks
won't be made (lvmconfig --validate).
These are setting where we don't know the names and and type
in advance and they're recognized in runtime. As we don't know
the type and name in advance, we can't do any checks here
of course.
Use this flag with great care as it disables config checks
for the whole config subtree found under such section.
This flag is going to be used by subsequent patches from
Zdenek to support some cache settings...
Recent change to move the polling outside of core lvconvert
code was wrongly using 'lv' and 'vg' structs which can't be
used outside of the core code, which caused seg fault.
Properly isolate all use of lv structs within the core of
the lvconvert code, saving any information necessary,
(esp lvid). After the core of lvconvert is done, use
the saved information to do polling.
FIXME: the need for is_merging_origin and is_merging_origin_thin
in this patch is ugly, and a cleaner way should be found to deal
with that than what is done here.
Also it effectively removed all hacks in _lvconvert_merge_single
performing ugly: VG reread, unlock, polling, lock sequence.
Moreover all polling operations are postponed after all conversions
are finished.
lvm2 (while locking via lvmlockd) should now be able to run with
or without lvmpolld while performing poll operations originating
in lvconvert command.
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Comply with the rules we have for log_error and log_warn...
$ pvcreate /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
1 existing signature left on the device.
Aborting pvcreate on /dev/sda1.
$ pvcreate /dev/sda1 --force
WARNING: Failed to get offset of the xfs_external_log signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
libblkid may return the list of signatures found, but it may not
provide offset and size for each signature detected. This may
happen in case signatures are mixed up or there are more, possibly
overlapping, signatures found.
Make lvm commands pass if such situation happens and we're using
--force (or any stronger force method).
For example:
$ pvcreate /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
1 existing signature left on the device.
Aborting pvcreate on /dev/sda1.
$ pvcreate --force /dev/sda1
Failed to get offset of the xfs_external_log signature on /dev/sda1.
Physical volume "/dev/sda1" successfully created
The vgchange/lvchange activation commands read the VG, and
don't write it, so they acquire a shared VG lock from lvmlockd.
When other commands fail to acquire a shared VG lock from
lvmlockd, a warning is printed and they continue without it.
(Without it, the VG metadata they display from lvmetad may
not be up to date.)
vgchange/lvchange -a shouldn't continue without the shared
lock for a couple reasons:
. Usually they will just continue on and fail to acquire the
LV locks for activation, so continuing is pointless.
. More importantly, without the sh VG lock, the VG metadata
used by the command may be stale, and the LV locks shown
in the VG metadata may no longer be current. In the
case of sanlock, this would result in odd, unpredictable
errors when lvmlockd doesn't find the expected lock on
disk. In the case of dlm, the invalid LV lock could be
granted for the non-existing LV.
The solution is to not continue after the shared lock fails,
in the same way that a command fails if an exclusive lock fails.
When lvmlockd is compiled without support for one of the
lock managers (sanlock or dlm), and a command tries to use
one of them, explain that in the error message.
Don't abort test when clvmd processes two VGs concurrently.
CLVMD: ioctl/libdm-iface.c:1940 Internal error: Performing unsafe table load while 3 device(s) are known to be suspended: (253:19)
When lvm is built without lvmlockd support, vgcreate using a
shared lock type would succeed and create a local VG (the
--shared option was effectively ignored). Make it fail.
Fix the same issue when using vgchange to change a VG to a
shared lock type.
Make the error messages consistent.
A segfault was reported when extending an LV with a smaller number of
stripes than originally used. Under unusual circumstances, the cling
detection code could successfully find a match against the excess
stripe positions and think it had finished prematurely leading to an
allocation being pursued with a length of zero.
Rename ix_offset to num_positional_areas and move it to struct
alloc_state so that _is_condition() can obtain access to it.
In _is_condition(), areas_size can no longer be assumed to match the
number of positional slots being filled so check this newly-exposed
num_positional_areas directly instead. If the slot is outside the
range we are trying to fill, just ignore the match for now.
(Also note that the code still only performs cling detection against
the first segment of the LV.)
Replace misleading "not found" in the log message when
devices/preferred_names is set to empty array:
Really not found:
device/dev-cache.c:689 devices/preferred_names not found in config: using built-in preferences
Found, but empty:
config/config.c:1431 Setting devices/preferred_names to preferred_names = [ ]
device/dev-cache.c:689 devices/preferred_names is empty: using built-in preferences
Commit 7e728fe1a1 added a log call
directly in find_config_tree_array when defaults are used.
This patch also adds the log for the value which is found in
existing configuration and for which defaults are not used.
For example:
Defaults used:
config/config.c:1428 devices/scan not found in config: defaulting to scan = [ "/dev" ]
Value defined in configuration used:
config/config.c:1431 Setting devices/scan to scan = [ "/dev", "/mydev", "/abc" ]
This makes the logging consistent with the other find_config_tree_* functions.
Policy name has to be always defined.
Capture it as an internal error before write.
When reading metadata without defined policy name, use default defined policy.
TODO: Unsure, but it might have to be actually always 'mq' in this case.
Keep policy name separate from policy settings and avoid
to mangling and demangling this string from same config tree.
Ensure policy_name is always defined.
Use find_config_tree_array for all config arrays. Also, add
INTERNAL_ERROR in case there should have been at least default
value defined for a setting but it was not returned for some
reason (either config_settings.h misconfiguration or other config
tree error printed by functions called by find_config_tree_array).
Both lock_start filters were being skipped when any lock-opt
values were used. The "auto" lock-opt should cause the
auto_lock_start_list to be used. The lock_start_list should
always be used.
The behavior of lock_start_list/auto_lock_start_list are tested
and verified to behave like volume_list/auto_activation_volume_list.
Since the default was changed to wait for lock-start to finish,
the "wait" and "autowait" lock-opt values are not needed, but a
new "autonowait" is added to the existing "nowait" avoid the
default waiting.
There are two different failure conditions detected in
access_vg_lock_type() that should have different error
messages. This adds another failure flag so the two
cases can be distinguished to avoid printing a misleading
error message.
Require global/{thin,cache}_{check,repair}_options to be always defined.
If not defined directly by user in the configuration and if there's no
concrete default option to use, make "" (empty string) the default one -
it's then clearly visible in the "lvmconfig --type default" (and
generated lvm.conf) and also it makes its handling in the code more
straightforward so we don't need to handle undefined values.
This means, if there are no default values for these settings defined,
we end up with this generated now:
{thin,cache}_{check,repair}_options = [ "" ]
So the value is never undefined and if it is, it's an error.
(The cache_repair_options is actually not used in the code at the moment,
but once the code using this setting is in, it will follow the same logic
as used for thin_repair_options.)
The "exported" state of the VG can be useful with lockd VGs
because the exported state keeps a VG from being used in general.
It's a way to keep a VG protected and out of the way.
Also fix the command flags: ALL_VGS_IS_DEFAULT is not true for
vgimport/vgexport, since they both return errors immediately if
no VG args are specified. LOCKD_VG_SH is not true for vgexport
beause it must use an ex lock to write the VG.
When --nolocking is used (by vgs, lvs, pvs):
. don't use lvmlockd at all (set use_lvmlockd to 0)
. allow lockd VGs to be read
When --readonly is used (by vgs, lvs, pvs, vgdisplay, lvdisplay,
pvdisplay, lvmdiskscan, lvscan, pvscan, vgcfgbackup):
. skip actual lvmlockd locking calls
. allow lockd VGs to be read
. check that only shared gl/vg locks are being requested
(even though the actually locking is being skipped)
. check that no LV locks are requested, because no LVs
should be activated or used in readonly mode
. disable using lvmetad so VGs are read from disk
It is important to note the limited commands that accept
the --nolocking and --readonly options, i.e. no commands
that change/write a VG or change/activate LVs accept these
options, only commands that read VGs.
A new lockd lock needs to be created for the new LV
created by split mirror and split snapshot. Disallow
these options in lockd VGs until that is implemented.
There are at least a couple instances where
the lock_args check does not work correctly,
(listed in the comment), so disable the
NULL check for lock_args until those are
resolved.
log_warn was added recently because no known code used
the given condition, but running pvcreate on an existing
PV uses this case, and should not produce a warning.
Put the change from commit #10d27998b3d2f6100e9e29e83d1d99948c55875f
back so we have working tree again for now. This code needs a bit of
a cleanup to return proper return value to check...
lib/format1/import-export.c:167: var_deref_op: Dereferencing null pointer "vg->lvm1_system_id"
lib/cache/lvmetad.c:1023: var_deref_op: Dereferencing null pointer "this"
daemons/lvmlockd/lvmlockd-core.c:2659: check_after_deref: Null-checking "act" suggests that it may be null, but it has already been dereferenced on all paths leading to the check
/daemons/lvmetad/lvmetad-core.c:1024: check_after_deref: Null-checking "pvmeta" suggests that it may be null, but it has already been dereferenced on all paths leading to the check
If running lvmconf ... --startstopservice --mirrorservice in systemd
environment, handle lvm2-cmirrord accordingly. A typo in the script
caused the lvm2-cmirrord to not start/stop immediately, it was
only enabled/disabled (so the --startstopservice was ignored in this
case).
This prevents 'lvremove vgname' from attempting to remove the
hidden sanlock LV. Only vgremove should remove the hidden
sanlock LV holding the sanlock locks.
tools/polldaemon.c:457: array_null: Comparing an array to null is not useful: "lv->lvid.s"
The lv->lvid.s is never NULL. The check was supposed to be *lv->lvid.s
to check if the string is not empty.
... Using uninitialized value "lockd_state" when calling "lockd_vg"
(even though lockd_vg assigns 0 to the lockd_state, but it looks at
previous state of lockd_state just before that so we need to have
that properly initialized!)
libdm/libdm-report.c:2934: uninit_use_in_call: Using uninitialized value "tm". Field "tm.tm_gmtoff" is uninitialized when calling "_get_final_time".
daemons/lvmlockd/lvmlockctl.c:273: uninit_use_in_call: Using uninitialized element of array "r_name" when calling "format_info_r_action". (just added FIXME as this looks unfinished?)
lib/log/log.c:115: leaked_storage: Variable "st" going out of scope leaks the storage it points to
daemons/lvmpolld/lvmpolld-core.c:573: leaked_storage: Variable "cmdargv" going out of scope leaks the storage it points to
daemons/lvmlockd/lvmlockd-core.c:5341: leaked_handle: Handle variable "fd" going out of scope leaks the handle
daemons/lvmlockd/lvmlockctl.c:575: overwrite_var: Overwriting "able_vg_name" in "able_vg_name = strdup(optarg)" leaks the storage that "able_vg_name" points to
daemons/lvmlockd/lvmlockctl.c:571: overwrite_var: Overwriting "able_vg_name" in "able_vg_name = strdup(optarg)" leaks the storage that "able_vg_name" points to
daemons/lvmlockd/lvmlockctl.c:385: leaked_handle: Handle variable "s" going out of scope leaks the handle
Before, we used general find_config_tree_node function to retrieve
array values. This had a downside where if the node was not found,
we had to insert default values directly in-situ after the
find_config_tree_node call. This way, we had two copies of default
values - one in config_settings.h and the other one directly in the
code where we found out that find_config_tree_node returned NULL and
hence we needed to fall back to defaults.
With separate find_config_tree_array used for array config values,
we keep all the defaults centrally in config_settings.h because
the new find_config_tree_array automatically returns these defaults
if it can't find any value set in the configuration.
This patch just makes the behaviour exactly the same for arrays as
for any other non-array type where we call find_config_tree_<type>
already, hence making the internal interface for handling array
values consistent with the rest of the config types.
if lvm2 is built with debug memory options dm_free() is not
mapped directly to std library's free(). This may cause memory corruption
as a line buffer may get reallocated in getline with realloc.
This is a temporary hotfix. Other debug memory failure needs to
be investigated and explained.
including the allow_override_lock_modes setting.
It was not possible to override default lock modes any longer,
since the command line options had already been removed.
A mechanism will probably be required later that puts part of
this back.
Existing messaging intarface for thin-pool has a few 'weak' points:
* Message were posted with each 'resume' operation, thus not allowing
activation of thin-pool with the existing state.
* Acceleration skipped suspend step has not worked in cluster,
since clvmd resumes only nodes which are suspended (have proper lock
state).
* Resume may fail and code is not really designed to 'fail' in this
phase (generic rule here is resume DOES NOT fail unless something serious
is wrong and lvm2 tool usually doesn't handle recovery path in this case.)
* Full thin-pool suspend happened, when taken a thin-volume snapshot.
With this patch the new method relocates message passing into suspend
state.
This has a few drawbacks with current API, but overal it performs
better and gives are more posibilities to deal with errors.
Patch introduces a new logic for 'origin-only' suspend of thin-pool and
this also relates to thin-volume when taking snapshot.
When suspend_origin_only operation is invoked on a pool with
queued messages then only those messages are posted to thin-pool and
actual suspend of thin pool and data and metadata volume is skipped.
This makes taking a snapshot of thin-volume lighter operation and
avoids blocking of other unrelated active thin volumes.
Also fail now happens in 'suspend' state where the 'Fail' is more expected
and it is better handled through error paths.
Activation of thin-pool is now not sending any message and leaves upto a tool
to decided later how to finish unfinished double-commit transaction.
Problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add target table line
into the tree, but only a device is inserted into a tree.
Current mechanism to attach messages for thin-pool requires the libdm
to know about thin-pool target, so lvm2 currently takes assumption, node
is really a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it's a misuse of internal API)
we would possibly need to be able to attach message to 'any' node.
Other thing to notice - current messaging interface in thin-pool
target requires to suspend thin volume origin first and then send
a create message, but this could not have any 'nice' solution on lvm2
side and IMHO we should introduce something like 'create_after_resume'
message.
Patch also changes the moment, where lvm2 transaction id is increased.
Now it happens only after successful finish of kernel transaction id
change. This change was needed to handle properly activation of pool,
which is in the middle of unfinished transaction, and also this corrects
usage of thin-pool by external apps like Docker.
Add support for sending message in suspend tree for thin-pools.
When this operation is requested whole subtree suspend is then skipped.
This is experimantal support for new lvm2 code for sending message
in suspend phase where 'thin-pool origin-only suspend' will send
messages instead of really suspending thin-pool tree.
When suspening thin volume origin-only - only thin volume is suspended,
then messages are posted and thin-pool suspend is skipped.
Recognize date and time specification within selection criteria
that is formulated in a more free-form way besides to the original
basic YYYY-MM-DD HH:MM format that libdevmapper supports.
Currently, this free-form format is recognized for lv_time field.
Users are able to use expressions from this set:
- weekday names ("Sunday" - "Saturday" or abbreviated as "Sun" - "Sat")
- labels for points in time ("noon", "midnight")
- labels for a day relative to current day ("today", "yesterday")
- points back in time with relative offset from today (N is a number)
( "N" "seconds"/"minutes"/"hours"/"days"/"weeks"/"years" "ago")
( "N" "secs"/"mins"/"hrs" ... "ago")
( "N" "s"/"m"/"h" ... "ago")
- time specification either in hh:mm:ss format or with AM/PM suffixes
- month names ("January" - "December" or abbreviated as "Jan" - "Dec")
For example:
$ date
Fri Jul 3 10:11:13 CEST 2015
$ lvmconfig --type full report/time_format
time_format="%a %Y-%m-%d %T %z %Z [%s]"
$ lvs
LV VG Time
lvol0 vg Fri 2014-08-22 21:25:41 +0200 CEST [1408735541]
lvol2 vg Sun 2015-04-26 14:52:20 +0200 CEST [1430052740]
root fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
swap fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time=yesterday'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "June 30"'
LV VG Time
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "noon June 30"'
LV VG Time
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 9AM"'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 1PM"'
LV VG Time
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
...and so on.
Wire the dm_report_reserved_handler instance call in reporting/selection
infrastructure to handle reserved value actions (currently only
DM_REPORT_RESERVED_PARSE_FUZZY_NAME and DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
actions).
With fuzzy names we mean the names for which it's hard or even impossible
to enumerate all possible variations of the name - the name needs to
be evaluated. An example of fuzzy name is a name which has a base
(substring) which matches and it can contain arbitrary variations
around this base. We can cover human language better with fuzzy
names as people may use several different names (or sentences) to
denote the same thing.
With dynamic values we mean the values which are not constants
and they need to be evaluated in runtime. An example of dynamic
value is a value which depends on current system state (e.g. time,
current configuration or any other state which may change and it
needs runtime evaluation).
There's a handler that can be registered with reporting/selection
using dm_report_reserved_handler instance. This is a central point
in which the computation/evaluation happens when processing reserved
values. Currently, there are two actions declared:
DM_REPORT_RESERVED_PARSE_FUZZY_NAME
(translates fuzzy name into canonical name)
DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
(gets value for canonical name)
The handler is then registered as value in struct
dm_report_reserved_value (see explaining comments besided
the struct dm_report_reserved_value in libdevmapper.h).
Also, this patch provides support for simple caching of values
used during report/selection via dm_report_value_cache_{set,get}.
This is supposed to be used mainly in the dm_report_reserved_handler
instances to save values among calls so all the handler calls work
with the same base value used in computation/evaluation and/or
possibly to save resources if the evaluation is more time-consuming.
The cache is attached to the dm_report handle and so the cache is
dropped one dm_report is dropped.
Generic numbers and time values share some operators so make sure
we have the flags correctly adjusted based on expected type if
we're using reserved values.
dm_snprintf() returns upon success the number of characters printed
(excluding the null byte used to end output to strings).
So add extra byte to preserve \0.
This fixes regression when displaying more then a single lv name.
_node_name() prepares into dm_tree internal buffer device
name and it (major:minor) for easy usage for debug messages.
To avoid any allocation a small buffer in struct dm_tree is preallocated
to store this message.
This patch adds support for time values used in reporting fields.
The raw values are always stored as number of seconds since epoch.
The support that comes with this patch is the basic one which allows
only for recognition of strictly formatted date and time in selection
criteria (the format follows a subset of formats defined by ISO 8601):
date time timezone
date:
YYYY-MM-DD (or shortly YYYYMMDD)
YYYY-MM (shortly YYYYMM), auto DD=1
YYYY, auto MM=01 and DD=01
time:
hh:mm:ss (or shortly hhmmss)
hh:mm (or shortly hhmm), auto ss=0
hh (or shortly hh), auto mm=0, auto ss=0
timezone (always with + or - sign):
+hh:mm or -hh:mm (or shortly +hhmm or -hhmm)
+hh or -hh
Or directly the time (number of seconds) since Epoch (1970-01-01 00:00:00 UTC)
when the number value is prefixed by "@":
@number_of_seconds_since_epoch
This patch also adds aliases for comparison operators
used together with time values which are more intuitive
to use:
since (as alias for >=)
after (as alias for >)
until (as alias for <=)
before (as alias for <)
For example:
$ lvmconfig --type full report/time_format
time_format="%Y-%m-%d %T %z %Z [%s]"
$ lvs -o name,time vg
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol2 2015-04-26 14:52:20 +0200 CEST [1430052740]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30 6:00"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
$ lvs vg -o name,time -S 'time since @1435519541'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
This is basic time recognition support that is directly a part of
libdevmapper. Recognition of more free-form expressions will be a
part of subsequent patches.
Just like we have DEFAULT_USE_LVMETAD (or DEFUALT_USE_LVMPOLLD), use
fallback_to_lvm1=1 lvm.conf setting if we configured lvm2 with
--enable-lvm1-fallback and use fallback_to_lvm1=0 otherwise.
Also, generate proper lvm.conf.in with unconfigured value.
This patch allows for registration and recognition of reserved
values which are ranges, so they're composed of two values actually
to denote the lower and upper bound for the range (stored as an array
with exactly two items to define the boundaries).
Also, this patch allows for flagging reserved values as named-only
which means that such values are not strictly reserved. The strictly
reserved values are reserved values as used before this patch.
Distinction between strictly-reserved and named-only values
is clearly visible with comparisons. Normally, strictly reserved
value is not accounted for if we do "greater than" or "lower than"
comparisons, for example:
1 2 3 ....
|
abc
- we have "abc" as reserved value for field with value "2"
- the value reported for the field is "abc" (or "2", it doesn't matter here)
- the selection we're processing is -S 'field < abc'
- the result of the selection gives nothing as "abc" is strictly
reserved value (bound to "2") and there's no order defined for
it and it would only match if we directly compared the value
(so -S 'field = abc' would match)
With named-only values, the "abc" is named-only value for "2",
so selection -S 'field < abc" is the same as using -S 'field < 2'.
The "abc" is just an alias for some value so the value or its
assigned name can be used equally in selection criteria.
Make it possible to define format for time that is displayed.
The way the format is defined is equal to the way that is used
for strftime function, although not all formatting options as
used in strftime are available for LVM2 - the set is restricted
(e.g. we do not allow newline to be printed). The lvm.conf
comments contain the whole list that LVM2 accepts for time format
together with brief description (copied from strftime man page).
For example:
(defaults used - the format is the same as used before this patch)
$ lvs -o+time vg/lvol0 vg/lvol1
LV VG Attr LSize Time
lvol0 vg -wi-a----- 4.00m 2015-06-25 16:18:34 +0200
lvol1 vg -wi-a----- 4.00m 2015-06-29 09:17:11 +0200
(using 'time_format = "@%s"' in lvm.conf - number of seconds
since the Epoch)
$ lvs -o+time vg/lvol0 vg/lvol1
LV VG Attr LSize Time
lvol0 vg -wi-a----- 4.00m @1435241914
lvol1 vg -wi-a----- 4.00m @1435562231
Commit e587b0677b broke the build on
systems where /bin/sh is Dash, for example.
Origin patch by Ferenc Wágner <wferi@niif.hu> changed later to
avoid using shell call, so makefile add 'server' target when
one of metad or polld daemon is requested.
Do not display settings with undefined default values, but do display
these settings in case the value is defined directly in any part of
the existing config cascade.
For example, the lvmconfig --type current always displays these settings
(as it's somewhere in "current" configuration cascade that makes it defined).
The lvmconfig --type full displays these settings only if it's defined
somewhere in the cascade, but not if default value is used instead
The lvmconfig --type default never displays these settings...
More concrete example - let's have activation/volume_list directly
set in lvm.conf and activation/read_only_volume_list not set.
Both of these settings have *undefined default* values.
$lvmconfig --type full activation/volume_list activation/read_only_volume_list
volume_list="/dev/vg/lv"
(...only volume_list is defined, hence it's printed)
However, the comments will display more info (see also previous commit):
$lvmconfig --type full activation/volume_list activation/read_only_volume_list --withsummary
# Configuration option activation/volume_list.
# Only LVs selected by this list are activated.
# This configuration option does not have a default value defined.
# Value defined in existing configuration has been used for this setting.
volume_list="/dev/vg/lv"
# Configuration option activation/read_only_volume_list.
# LVs in this list are activated in read-only mode.
# This configuration option does not have a default value defined.
Display comment abour value from existing config being used. For example:
$ lvmconfig --type full --withsummary report/compact_output report/buffered
# Configuration option report/compact_output.
# Do not print empty report fields.
# Value defined in existing configuration has been used for this setting.
compact_output=1
# Configuration option report/buffered.
# Buffer report output.
buffered=1
The lvmconfig --type full is actually a combination of --type current
and --type missing together with --mergedconfig options used.
The overall outcome is a configuration tree with settings as LVM sees
it when it looks for the values - that means, if the setting is defined
in some config source (lvm.conf, --config, lvmlocal.conf or any profile
that is used), the setting is used. Otherwise, if the setting is not
defined in any part of the config cascade, the defaults are used.
The --type full displays exactly this final tree with all the values
defined, either coming from configuration tree or from defaults.
Synchronize with udev logic before reusing device as snapshot.
This patch tries to fix the problem with udev, where we manage
to 'active' LV for clearing, then we deactivate such device and
active again as member of 'origin&snapshot' tree all in 1 step.
There needs to be a sync point where udev has time to remove all links,
otherwise we race with scans and we may end-up with mysterious 'free'
links in the system pointing to wrong dm names.
This patch tries to fix failing topology cluster tests..
We shouldn't be adding spaces by default in output as that
may be be used already in scripts and especially for the eval
in shell scripts where spaces are not allowed between key
and value!
Add --withspaces option to lvmconfig for pretty output with
more space in for readability.
It's not an error to define empty values for
{thin,cache}_{check,repair}_options - such empty value means no
options are passed when these external commands are called.
If blkid wiping is possible, than set use_blkid_wiping=1 and
use_blkid_wiping=0 otherwise for its default value. If blkid wiping
is disabled during configure and use_blkid_wiping=1 is set by chance,
it's simply ignored - this patch is just a cleanup that makes it more
obvious for the user (we use similar logic for use_lvmetad and
use_lvmpolld settings).
Default value for lvmetad and lvmpolld has hooks in configure script,
the "lvmconfig --type default --unconfigured" should display:
use_lvmetad = @DEFAULT_USE_LVMETAD@
use_lvmpolld = @DEFAULT_USE_LVMPOLLD@
Note that these settings are not of string type. Recent change (the
DM_CONFIG_VALUE_FMT_STRING_NO_QUOTES formatting flag) makes it
possible to recognize that the setting is not of string type and if
there's unconfigured value defined for it, the enclosing " " is
automatically removed on output.
Do not use "#S" (blank string) as default value as that ends up with
'key = [ "" ]' to be generated which is not what we want in most cases.
Also, fix default values for global/{thin,cache}_{check,repair}_options
and avoid assigning blank values. For example, the thin_check_options
had this set as default value previously:
"#S" DEFAULT_THIN_CHECK_OPTION1 "#S" DEFAULT_THIN_CHECK_OPTION2
If any (or both) of DEFAULT_THIN_CHECK_OPTION* variables was set
to "", we ended up with clumsy default value generated like:
thin_check_options = [ "-q", "" ]
With this patch, we end up with correct:
thin_check_options = [ "-q" ]
or, if all options are undefined:
thin_check_options = [ ]
Which is the correct way to express this.
There are two basic groups of formatting flags (32 bits):
- common ones applicable for all config value types (lower 16 bits)
- type-related formatting flags (higher 16 bits)
With this patch, we initially support four new flags that
modify the the way the config value is displayed:
Common flags:
=============
DM_CONFIG_VALUE_FMT_COMMON_ARRAY - causes array config values
to be enclosed in "[ ]" even if there's only one item
(previously, there was no way to recognize an array with one
item and scalar value, hence array values with one member
were always displayed without "[ ]" which libdm accepted
when reading, but it may have been misleading for users)
DM_CONFIG_VALUE_FMT_COMMON_EXTRA_SPACE - causes extra spaces to
be inserted in "key = value" (or key = [ value, value, ... ] in
case of arrays), compared to "key=value" seen on output before.
This makes the output more readable for users.
Type-related flags:
===================
DM_CONFIG_VALUE_FMT_INT_OCTAL - prints integers in octal form with
"0" as a prefix (libdm's config reading code can handle this via
strtol just fine so it's properly recognized as number in octal
form already if there's "0" used as prefix)
DM_CONFIG_VALUE_FMT_STRING_NO_QUOTES - makes it possible to print
strings without enclosing " "
This patch also adds dm_config_value_set_format_flags and
dm_config_value_get_format_flags functions to set and get
these formatting flags.
This is the client side handling of the global_invalid state
added to lvmetad in commit c595b50cec8a6b95c6ac4988912d1412f3cc0237.
The function added here:
. checks if the global state in lvmetad is invalid
. if so, scans disks to update the state in lvmetad
. clears the global_invalid flag in lvmetad
. updates the local udev db to reflect any changes
and update the lvmetad copy after it is reread from disk.
This is the client side handling of the vg_invalid state
added to lvmetad in commit c595b50cec8a6b95c6ac4988912d1412f3cc0237.
Add the ability to invalidate global or individual VG metadata.
The invalid state is returned to lvm commands along with the metadata.
This allows lvm commands to detect stale metadata from the cache and
reread the latest metadata from disk (in a subsequent patch.)
These changes do not change the protocol or compatibility between
lvm commands and lvmetad.
Global information
------------------
Global information refers to metadata that is not isolated
to a single VG , e.g. the list of vg names, or the list of pvs.
When an external system, e.g. a locking system, detects that global
information has been changed from another host (e.g. a new vg has been
created) it sends lvmetad the message: set_global_info: global_invalid=1.
lvmetad sets the global invalid flag to indicate that its cached data is
stale.
When lvm commands request information from lvmetad, lvmetad returns the
cached information, along with an additional top-level config node called
"global_invalid". This new info tells the lvm command that the cached
information is stale.
When an lvm command sees global_invalid from lvmated, it knows it should
rescan devices and update lvmetad with the latest information. When this
is complete, it sends lvmetad the message: set_global_info:
global_invalid=0, and lvmetad clears the global invalid flag. Further lvm
commands will use the lvmetad cache until it is invalidated again.
The most common commands that cause global invalidation are vgcreate and
vgextend. These are uncommon compared to commands that report global
information, e.g. vgs. So, the percentage of lvmetad replies containing
global_invalid should be very small.
VG information
--------------
VG information refers to metadata that is isolated to a single VG,
e.g. an LV or the size of an LV.
When an external system determines that VG information has been changed
from another host (e.g. an lvcreate or lvresize), it sends lvmetad the
message: set_vg_info: uuid=X version=N. X is the VG uuid, and N is the
latest VG seqno that was written. lvmetad checks the seqno of its cached
VG, and if the version from the message is newer, it sets an invalid flag
for the cached VG. The invalid flag, along with the newer seqno are saved
in a new vg_info struct.
When lvm commands request VG metadata from lvmetad, lvmetad includes the
invalid flag along with the VG metadata. The lvm command checks for this
flag, and rereads the VG from disk if set. The VG read from disk is sent
to lvmetad. lvmetad sees that the seqno in the new version matches the
seqno from the last set_vg_info message, and clears the vg invalid flag.
Further lvm commands will use the VG metadata from lvmetad until it is
next invalidated.
Since our test environment runs also in non-real-udev world,
it's using /etc/.cache file with scanned files.
So in this case it is mandatory the user runs 'vgscan'
after a device reappears in the system.
This 'first' lvm2 command then fixes metadata (just like vgs did).
Use of display_lvname() in plain log_debug() may accumulate memory in
command context mempool. Use instead small ringbuffer which allows to
store cuple (10 ATM) names so upto 10 full names can be used at one.
We are not keeping full VG/LV names as it may eventually consume larger
amount of RAM resouces if vgname is longer and lots of LVs are in use.
Note: if there would be ever needed for displaing more names at once,
the limit should be raised (e.g. log_debug() would need to print more
then 10 LVs on a single line).
With thin-pool kernel target module 1.13 it's now support usage of
external origin with sizes which are not 'alligned' with chunk size
of thin-pool.
Enable lvm2 support for this and also fix reporting of data_percent
usage for case sizes are not alligned.
Note that this is just a quick fix and it needs more robust fix to
encompass any combination, not just the (old) snapshot one!
This started with this report:
https://bugzilla.redhat.com/show_bug.cgi?id=1219222
If we have devices/ignore_suspended_devices=1 set based on which we
filter out suspended devices as unusable (or if we ignore suspended
devices by force, e.g. during lvconvert called from dmeventd) and
when we have snapshot and snapshot origin devices in the play, we
need to look at their components unerneath (*-real and *-cow) to
check if they're not suspended. If they are, the snapshot/snapshot
origin is not usable as well and hence it needs to be filtered out
by filter-usable.c code which does suspended device filtering.
Not going into much details here, more details are in the bugzilla
mentioned above. However, this is a quick fix since snapshot
and this exact situation is not the only one. So this is
something that needs to be revisited and fixed properly
with full dm tree and checking the whole stack to state
whether the device at the very top is usable or not.
This patch fixes segfault which was caused by incorrectly marking some
settings CFG_DEFAULT_COMMENTED instead of CFG_DEFAULT_UNDEFINED - the
ones which have NULL default value, hence they're really undefined.
A regression caused by a98ceceb1d.
For example:
$ lvmconfig log/file
file="/a"
Before this patch:
$ lvmconfig --type diff
Segmentation fault (core dumped)
With this patch applied:
$ lvmconfig --type diff
log {
file="/a"
}
The same applies for these settings:
log/activate_file
global/library_dir
global/system_id_file
<disk_area>/disk_area_id
There were also other settings with NULL default value and marked as
CFG_DEFAULT_COMMENTED instead of CFG_DEFAULT_UNDEFINED, but they were
cfg_array config settings where the NULL value was not causing segfault
(NULL == empty array).
Just as 'e' means activation with an exclusive lock,
add an 's' to mean activation with a shared lock.
This allows the existing but implicit behavior of '-ay'
of clvm LVs to be specified explicitly. For local VGs,
asy simply means ay, just like aey means ay.
For local VGs, ay == aey == asy
For clvm VGs, ay == asy, aey == aey, asy == asy
The hyphens are removed from long option names before
being read. This means that:
- Option name specifications in args.h must not include hyphens.
(The hyphen in 'use-policies' is removed.)
- A user can include hyphens anywhere in the option name.
All the following are equivalent:
--vgmetadatacopies,
--vg-metadata-copies,
--v-g-m-e-t-a-d-a-t-a-c-o-p-i-e-s-
lvmetad_init() should not be called with open connection to the daemon.
Doing so is considered to be an internall error within lvm2 code.
Such coincidence can't occur within current code. Let's assure us it won't
ever happen.
Some of descritpions were misleading at least. Some were completely
off the reality.
lvmetad_init doesn't re-establish or initialise a connection
lvmetad_active and lvmetad_connect_or_warn can do so.
There are reports of unexplained ioctl failures when using dmeventd.
An explanation might be that the wrong value of errno is being used.
Change libdevmapper to store an errno set by from dm ioctl() directly
and provide it to the caller through a new dm_task_get_errno() function.
[Replaced f9510548667754d9209b232348ccd2d806c0f1d8]
Commit b00711e312 improperly
convert _area_missing() replacment and moved check for
AREA_PV seg_type() into same if() section.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
If lvmetad is not used, we generate lvm2-activation{-early,-net}.service
systemd services to activate any VGs found on the system. So far we used
--sysinit during this activation as polling was still forked off of the
lvm activation command.
This has changed with lvmpolld - we have proper lvmpolld systemd
service now (activated via its socket unit). As such, we don't need
to use --sysinit anymore during activation in systemd environment
as polling was the only barrier to remove the need for --sysinit.
There's a race when asking lvmpolld about progress_status and
actually reading the progress info from kernel:
Even with lvmpolld being used we read status info from
LVM2 command issued by a user (client side from lvmpolld perspective).
The whole cycle may look like following:
1) set up an operation that requires polling (i.e. pvmove /dev/sda)
2) notify lvmpolld about such operation (lvmpolld_poll_init())
3) in case 1) was not called with --background it would continue with:
4) Ask lvmpolld about progress status. it may respond with one of:
a) in_progress
b) not_found
c) finished
d) any low level error
5) provided the answer was 4a) try to read progress info from polling LV
(i.e. vg00/pvmove1). Repeat steps 4) and 5) until the answer is != 4a).
And now we got into racy configuration: lvmpolld answered with in_progress
but it may be the that in_between 4) and 5) the operation has already
finished and polling LV is already gone or there's nothing to ask for.
Up to now, 5) would report warning and it could print such warning many
times if --interval was set to 0.
We don't want to scary users by warnings in such situation so let's just
print these messages in verbose mode. Error messages due to error while
reading kernel status info (on existing, active and locked LV) remained
the same.
Avoid using make's $(shell invocation since the eval order is
then somewhat different and use $$( subshell.
This also fixes a problem when more then one symbol is found,
since target shell has been given separate word list
so the 'R' assignment would need "" around it.
currently in wait_for_single_lv() fn trying to poll missing pvmove LV
is considered success. It may have been already finished by another
instance of polldaemon. either by another forked off polldaemon
or by lvmpolld.
Let's try to handle the mirror conversion and snapshot merge the same
way.
These wrappers have been replaced by direct calls
to vg_read() and find_lv() in previous commits.
This commit should have no functional impact since
all bits were already unreachable.
let's call dev_close_all() only before we're about to 'sleep'
for at least one second during the polling.
(it's questionable whether to call dev_close_all() at all in
polldaemon code. Natural extension would be to drop it completely)
Previous patch incorrectly skipped replace of @LOCALEDIR@.
The standard option is --localedir so use --with-localedir
as backward compatible option and set localedir if it's not
yet been set (if the could ever happen).
Use double-eval to translate $datarootdir to $prefix to real dir.
More exact clean of library exported symbols files.
Also use $(firstword) test to check for empty string
so 'make clean' has now cleaner condensed look.
Clean also created include links.
Possibly easier to follow - to have just a single dependency line
and use if() within rule.
Also replace $(words) with $(firstword) which is more commonly used.
Set LVM_TEST_THIN_REPAIR_CMD to /bin/false for test which
doesn't need it.
This way - even if on the system there is no such tool present,
test will not result with warning about missing tool.
Also remove from Makefile settings of TEST vars which are set in
through /lib/paths - this also allows to override them in test.
as of now lvmpolld works as client utility for
querying running instance of lvmpolld server
on metadata, state, etc.
Currently the only request implemented is the '--dump'.
It prints out full lvmpolld state (mimics lvmdump -p command).
we don't want to fail properly set pvmove after metadata
update. failure to copy id components could end with dangling
mirror moving PV segments but no monitoring from lvmpolld or
classical polldaemon.
lvpoll now process passed LV name properly. It respects
LVM_VG_NAME env. variable and is able to process LV name
passed in various formats:
- VG/LV
- LV name only (with LVM_VG_NAME set)
- /dev/mapper/VG-LV
- /dev/VG/LV
Use CFG_DEFAULT_COMMENTED and CFG_DEFAULT_UNDEFINED to
replicate the existing comments in example.conf.
Fix host_list to be cfg_array.
UNDEFINED is only used if the value depends on other
system/kernel values outside of lvm. The most common
case is when dm-thin or dm-cache have built-in default
settings in the kernel, and lvm will use those built-in
default values unless the corresponding lvm config setting
is set.
COMMENTED is used to comment out the default setting in
lvm.conf. The effect is that if the LVM version is
upgraded, and the new version of LVM has new built-in
default values, the new defaults are used by LVM unless
the previous default value was set (uncommented) in lvm.conf.
Introduce new implmentation of dm_task_get_info() function
with support for reading internal_suspend.
.
This time it is done in a 'versioned' way.
We keep the old fashion dm_task_get_info(Base) to implement
the old behavior of 1.02.95 libdm code.
libdm version 1.02.96 introduced 'macro' wrapper
dm_task_get_info_with_deferred_remove with new implementation
of dm_task_get_info() - we cannot do anything else then to
provide compatible version of this symbol.
Now in version 1.02.97 we add new versioned implementation of
dm_task_get_info(DM_1_02_97) symbol.
This has the effect that i.e. rpm build will finaly resolve proper
dependency on a new symbol - so it will be no longer possible,
to build a new binary and use old library
(rpm -q --provides will show libdevmapper.so.1.02(DM_1_02_97)(64bit))
Also the history is now tracked. If a new function is added (or
reimplemented), it needs to be placed in proper file,
so it could be exported with right versioning symbol.
File .exported_symbols.Base should and any existing older DM
should be treated as read-only after a release.
Also - only libdm has been currently enhanced with versioned .Base
file, as soon as other libs (liblvm, libdevmapper-event) needs changes
they should also get their exported symbol files - meanwhile
make.tmpl handles both cases.
Since now we enable those by default when compiled with those daemons,
explicitely disable them in tests when needed.
Alphabetically sort configurables.
Basic support for upstream 'build' of rpm packages.
Make spec file generated.
2 new simple targets:
make dist - create LVM2.MAJOR.MINOR.PATCHLEVEL.tgz from git files.
make rpm - some generic rpmbuilder using spec files.
Create packages in build/ subdir.
DO NOT USE created rpms in any distribution!
Configure provides proper settings for
use_lvmetad and use_lvmpolld conf setttings.
When the build of polld & lvmetad, these settings
are enabled by default unless explicitelly disabled
with --disable-use-lvmetad/--disable-use-lvmpolld.
This is an alternative/equivalent to commit
ca67cf84df
The problem (wrong label->dev after a new preferred
duplicate device is chosen) was isolated to the lvmetad
case (non-lvmetad worked fine), and this fixes the problem
by setting the new label->dev in the lvmetad-specific
code rather than in the general lvmcache code.
In process_each_{vg,lv,pv} when no vgname args are given,
the first step is to get a list of all vgid/vgname on the
system. This is exactly what lvmetad returns from a
vg_list request. The current code is doing a vg_lookup
on each VG after the vg_list and populating lvmcache with
the info for each VG. These preliminary vg_lookup's are
unnecessary, because they will be done again when the
processing functions call vg_read. This patch eliminates
the initial round of vg_lookup's, which can roughly cut in
half the number of lvmetad requests and save a lot of extra work.
Use 64bit arithmentic for PV size calculation (Coverity).
Also remove sector shift for compared PV size, since all
values are already held in sectors.
This fixes validatio of PV size when restoring PV
from vg metadata backup file.
Improve the python unit test case to cover all of the properties of a LV and
the properties of a LV segment.
In addition we also add a 'tag' to the lv so that we can retrieve it
using the 'lv_tags' property to ensure that this works as expected.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Synopsis: STR_LIST needs to be treated as STR for properties.
For any lvm property that was internally 'typed' as a string list we were failing
to return a string in the property API. This was due to the fact that for the
properties to work the value needs to either be evaulated as a string or a
number. This change corrects the macro used to build the memory array of
structures so that the string bitfield is set as needed to ensure that the value
is a string.
https://bugzilla.redhat.com/show_bug.cgi?id=1139920
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When retrieving a property value that is a string, if the character pointer in C
was NULL, we would segfault. This change checks for non-null before creating a
python string representation. In the case where the character pointer is NULL
we will return a python 'None' for the value.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
With the lvm2app C API adding the ability to determine when a property is
signed we can then use this information to construct the correct representation
of the number for python which will maintain value and sign. Previously, we
only represented the numbers in python as positive integers.
Python long type exceeds the range for unsigned and signed integers, we just
need to use the appropriate parsing code to build correctly.
Python part of the fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=838257
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Currently lvm2app properties have the following structure:
typedef struct lvm_property_value {
uint32_t is_settable:1;
uint32_t is_string:1;
uint32_t is_integer:1;
uint32_t is_valid:1;
uint32_t padding:28;
union {
const char *string;
uint64_t integer;
} value;
} lvm_property_value_t;
which assumes that numerical values were in the range of 0 to 2**64-1. However,
some of the properties were 'signed', like LV major/minor numbers and some
reserved values for properties that represent percentages. Thus when the
values were retrieved they were in two's complement notation. So for a -1
major number the API user would get a value of 18446744073709551615. The
API user could cast the returned value to an int64_t to handle this, but that
requires the API developer to look at the source code and determine when it
should be done.
This change modifies the return property structure to:
typedef struct lvm_property_value {
uint32_t is_settable:1;
uint32_t is_string:1;
uint32_t is_integer:1;
uint32_t is_valid:1;
uint32_t is_signed:1;
uint32_t padding:27;
union {
const char *string;
uint64_t integer;
int64_t signed_integer;
} value;
} lvm_property_value_t;
With this addition the API user can interrogate that the value is numerical,
(is_integer = 1) and subsequently check if it's signed (is_signed = 1) too.
If signed, then the API developer should use the union's signed_integer to
avoid casting.
This change maintains backwards compatibility as the structure size remains
unchanged and integer value remains unchanged. Only the additional bit
taken from the pad is utilized.
Bugzilla reference:
https://bugzilla.redhat.com/show_bug.cgi?id=838257
Signed-off-by: Tony Asleson <tasleson@redhat.com>
querying future lvmpolld with zero wait time is highly undesirable
and can cause serious performance drop of the future daemon. The new
wrapper function may avoid immediate return from syscal by
introducing minimal wait time on demand.
Routines responsible for polling of in-progress pvmove, snapshot merge
or mirror conversion each used custom lookup functions to find vg and
lv involved in polling.
Especially pvmove used pvname to lookup pvmove in-progress. The future
lvmpolld will poll each operation by vg/lv name (internally by lvid).
Also there're plans to make pvmove able to move non-overlaping ranges
of extents instead of single PVs as of now. This would also require
to identify the opertion in different manner.
The poll_operation_id structure together with daemon_parms structure they
identify unambiguously the polling task.
Waiting even after _check_lv_status returned success and
'finished' flag was set to true doesn't make much sense.
Note that while we skip the wait() we also skip the
init_full_scan_done(0) inside the routine. This should
have no impact as long as the code after _wait_for_single_lv
doesn't presume anything about the state of the cache.
as a part of bigger effort to unify polling intefaces
poll_get_copy_lv should be able to look up LVs based
on theirs lv->status field.
Effective after pvmove starts using poll_get_copy_lv
fn as well.
When kernel target reports sync status as 0% it might as well mean
it's 100% in sync, just the target is in some race inconsistent
state - so reread once again and take a more optimistic value ;)
Patch tries to work around:
https://bugzilla.redhat.com/show_bug.cgi?id=1210637
Reinstate config settings matching the last release until every
case where the generator produces different output has been reviewed
and fresh decisions made about which defaults to expose as protection
against changes in newer releases. We should be trying to reduce, not
increase, this number.
Introduce LVM_TEST_LVMETAD_DEBUG_OPTS to allow to override
default debug opts for lvmetad.
However could be still overloaded on command line:
make check_lvmetad LVM_TEST_LVMETAD_DEBUG_OPTS="-l all"...
Better name for aux function.
First use normal -TERM, and only after a while use -KILL
(leaving some time to correctly finish)
Print INFO about killed processes.
This patch adds supporting code for handling deprecated settings.
Deprecated settings are not displayed by default in lvmconfig output
(except for --type current and --type diff). There's a new
"--showdeprecated" lvmconfig option to display them if needed.
Also, when using lvmconfig --withcomments, the comments with info
about deprecation are displayed for deprecated settings and with
lvmconfig --withversions, the version in which the setting was
deprecated is displayed in addition to the version of introduction.
If using --atversion with a version that is lower than the one
in which the setting was deprecated, the setting is then considered
as not deprecated (simply because at that version it was not
deprecated).
For example:
$ lvmconfig --type default activation
activation {
...
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated
activation {
...
mirror_region_size=512
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withversions
activation {
...
# Available since version 1.0.0.
# Deprecated since version 2.2.99.
mirror_region_size=512
# Available since version 2.2.99.
raid_region_size=512
...
}
$ lvmconfig --type default activation --showdeprecated --withcomments
activation {
...
# Configuration option activation/mirror_region_size.
# This has been replaced by the activation/raid_region_size
# setting.
# Size (in KB) of each copy operation when mirroring.
# This configuration option is deprecated.
mirror_region_size=512
# Configuration option activation/raid_region_size.
# Size in KiB of each raid or mirror synchronization region.
# For raid or mirror segment types, this is the amount of
# data that is copied at once when initializing, or moved
# at once by pvmove.
raid_region_size=512
...
}
$ lvmconfig --type default activation --withcomments --atversion 2.2.98
activation {
...
# Configuration option activation/mirror_region_size.
# Size (in KB) of each copy operation when mirroring.
mirror_region_size=512
...
}
A preparatory code for marking configuration nodes as deprecated:
- struct cfg_def_item gains 2 new fields ("deprecated_since_version" and "deprecation_comment"
- cfg* macros to handle new fields
- related config_settings.h edits to add new fields for each item (null for all at the moment)
Patch with implementation will follow...
Before this patch:
$ lvmconfig --type list --withversions --withsummary global/use_lvmetad
global/use_lvmetad - Use lvmetad to cache metadata and reduce disk scanning. [2.2.93]
$ lvmconfig --type list --withversions global/use_lvmetad
global/use_lvmetad
With this patch applied:
$ lvmconfig --type list --withversions --withsummary global/use_lvmetad
global/use_lvmetad - Use lvmetad to cache metadata and reduce disk scanning. [2.2.93]
$ lvmconfig --type list --withversions global/use_lvmetad
global/use_lvmetad - [2.2.93]
We're commenting out settings with undefined default values.
The comment character '#' was printed at the very beginning of
the line, it should be placed just at the beginning of the setting,
after the space/tab prefix is printed.
Before this patch:
$ lvmconfig --type default activation
activation {
...
# volume_list=[]
...
}
With this patch applied:
$ lvmconfig --type default activation
activation {
...
# volume_list=[]
...
}
New lvmconf function is using bash associative arrays - however
older systems like RHEL5 doesn't provide this feature. In this case
stay with older variant.
Restore support for use case like this:
aux lvmconf 'tags/@foo {}'
works with systemd activated daemons only as of now
each daemon implementation may decide to signalize its
internal idle state (i.e. all background tasks unrelated to
client threads are finished)
These settings are in the "unsupported" group:
devices/loopfiles
log/activate_file
metadata/disk_areas (section)
metadata/disk_areas/<disk_area> (section)
metadata/disk_areas/<disk_area>/size
metadata/disk_areas/<disk_area>/id
These settings are in the "advanced" group:
devices/dir
devices/scan
devices/types
global/proc
activation/missing_stripe_filler
activation/mlock_filter
metadata/pvmetadatacopies
metadata/pvmetadataignore
metadata/stripesize
metadata/dirs
Also, this patch causes the --ignoreunsupported and --ignoreadvanced
switches to be honoured for all config types (lvmconfig --type).
By default, the --type current and --type diff display unsupported
settings, the other types ignore them - this patch also introduces
--showunsupported switch for all these other types to display even
unsupported settings in their output if needed.
lvmconfig --type list displays plain list of configuration settings.
Some of the existing decorations can be used (--withsummary and
--withversions) as well as existing options/switches (--ignoreadvanced,
--ignoreunsupported, --ignorelocal, --atversion).
For example (displaying only "config" section so the list is not long):
$lvmconfig --type list config
config/checks
config/abort_on_errors
config/profile_dir
$ lvmconfig --type list --withsummary config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig -l config
config/checks - If enabled, any LVM configuration mismatch is reported.
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found.
config/profile_dir - Directory where LVM looks for configuration profiles.
$ lvmconfig --type list --withsummary --withversions config
config/checks - If enabled, any LVM configuration mismatch is reported. [2.2.99]
config/abort_on_errors - Abort the LVM process if a configuration mismatch is found. [2.2.99]
config/profile_dir - Directory where LVM looks for configuration profiles. [2.2.99]
Example with --atversion (displaying global section):
$ lvmconfig --type list global
global/umask
global/test
global/units
global/si_unit_consistency
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/etc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/prioritise_write_locks
global/library_dir
global/locking_library
global/abort_on_internal_errors
global/detect_internal_vg_cache_corruption
global/metadata_read_only
global/mirror_segtype_default
global/raid10_segtype_default
global/sparse_segtype_default
global/lvdisplay_shows_full_device_path
global/use_lvmetad
global/thin_check_executable
global/thin_dump_executable
global/thin_repair_executable
global/thin_check_options
global/thin_repair_options
global/thin_disabled_features
global/cache_check_executable
global/cache_dump_executable
global/cache_repair_executable
global/cache_check_options
global/cache_repair_options
global/system_id_source
global/system_id_file
$ lvmconfig --type list global --atversion 2.2.50
global/umask
global/test
global/units
global/suffix
global/activation
global/fallback_to_lvm1
global/format
global/format_libraries
global/segment_libraries
global/proc
global/locking_type
global/wait_for_locks
global/fallback_to_clustered_locking
global/fallback_to_local_locking
global/locking_dir
global/library_dir
global/locking_library
some tests left dangling bg processes originating in
lvm2 commands being able to spawn any bg polling process
(lvchange, vgchange, pvmove, lvconvert...)
Initial fn 'add_to_kill_list' should collect processes with
specific parameters (proc's command line and parent processes ID).
After testing finishes the fn kill_listed_processes should remove these
listed by 'add_to_kill_list'.
Unfortunately it proved to be prone to an error especially in scenarios
where cmd line of initiating command contained characters required to
be espaced before passing to shell script to make it work correctly.
(Or if cmd spawned more than one bg process with same cmd line. i.e.:
vgchange or lvchange).
The new implementation is much simpler. It uses env. variable (LVM_TEST_TAG)
for marking a process desired to be killed later or during test env. teardown.
(i.e.: LVM_TEST_TAG=kill_me_$PREFIX to kill only processes related to
current test environment)
'lvm dumpconfig' now does a lot more than just dumping configuration
information and is no longer only a support tool. Users now need
to run it to find out about configuration information that has been
removed from the lvm.conf man page so we need to promote this to full
command line status as 'lvmconfig'. Also accept 'lvm config' and mention
it in the usage information of lvmconf (which should also get merged in
eventually).
Example:
/dev/loop0 and /dev/loop1 are duplicates,
created by copying one backing file to the
other.
'identity /dev/loopX' creates an identity
mapping for loopX named idmloopX, which
adds a duplicate for the named device.
The duplicate selection code for lvmetad is
incomplete, and lvmetad is disabled for this
example.
[~]# losetup -f loopfile0
[~]# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 foo lvm2 a-- 308.00m 296.00m
[~]# losetup -f loopfile1
[~]# pvs
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 which is more recent, replacing /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop1 foo lvm2 a-- 308.00m 308.00m
[~]# ./identity /dev/loop0
[~]# pvs
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 without holders, replacing /dev/loop0
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop0 not /dev/loop1
Using duplicate PV /dev/mapper/idmloop0 from subsystem DM, replacing /dev/loop1
PV VG Fmt Attr PSize PFree
/dev/mapper/idmloop0 foo lvm2 a-- 308.00m 296.00m
[~]# ./identity /dev/loop1
[~]# pvs
WARNING: duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV is being used from both devices /dev/loop0 and /dev/loop1
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/loop1 not /dev/loop0
Using duplicate PV /dev/loop1 which is more recent, replacing /dev/loop0
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop0 not /dev/loop1
Using duplicate PV /dev/mapper/idmloop0 from subsystem DM, replacing /dev/loop1
Found duplicate PV LnSOEqzEYED3RvIOa5PZP2s7uyuBLmAV: using /dev/mapper/idmloop1 not /dev/mapper/idmloop0
Using duplicate PV /dev/mapper/idmloop1 which is more recent, replacing /dev/mapper/idmloop0
PV VG Fmt Attr PSize PFree
/dev/mapper/idmloop1 foo lvm2 a-- 308.00m 308.00m
Describe
thin_check_options, thin_repair_options,
cache_check_option, scache_repair_options
as a "list of options", rather than a "string of options"
because a single string, e.g. "-q --clear-needs-check-flag"
does not work, and needs to be entered as a list,
e.g. ["-q", "--clear-needs-check-flag"]
The settings which have their default value evaluated in runtime should
have their 'unconfigured' counterparts also evaluated in runtime since
those values can be constructed by using other settings.
For example, before this patch:
$ lvm dumpconfig --type default --unconfigured devices/cache_dir devices/cache
cache_dir="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@"
cache="/etc/lvm/cache/.cache
With this patch applied:
$ lvm dumpconfig --type default --unconfigured devices/cache_dir devices/cache
cache_dir="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@"
cache="@DEFAULT_SYS_DIR@/@DEFAULT_CACHE_SUBDIR@/.cache"
The @something@ used for unconfigured default value is not bound to
CFG_TYPE_STRING settings defined in config_settings.h, it can be
used for any other config type too.
When test is executed on real device - lets try a more complete
cleanup - discard whole device first and try to wipe any
headers it might be left from previous test.
Show full chain of ancestors and descendants for snapshots
(both thick and thin - in case of thick, the "ancestor" field
is actually equal to "origin" field as snapshots can't be
chained for thick snapshots).
These fields display current state as it is, they do not
display any history! If the snapshot chain is broken in
the middle, we don't report the historical origin (this
is going to be a part of another patch and a different
set of fields or just a switch for existing fields to
show ancestors and descendants with history included).
For example:
(origin --> snapshot)
lvol1 --> lvol2 --> lvol3 --> lvol4
\
--> lvol5 --> lvol6 --> lvol7 --> lvol8
$ lvs -o name,pool_lv,origin,ancestors,descendants vg
LV Pool Origin Ancestors Descendants
lvol1 pool lvol2,lvol3,lvol4,lvol5,lvol6,lvol7,lvol8
lvol2 pool lvol1 lvol1 lvol3,lvol4,lvol5,lvol6,lvol7,lvol8
lvol3 pool lvol2 lvol2,lvol1 lvol4
lvol4 pool lvol3 lvol3,lvol2,lvol1
lvol5 pool lvol2 lvol2,lvol1 lvol6,lvol7,lvol8
lvol6 pool lvol5 lvol5,lvol2,lvol1 lvol7,lvol8
lvol7 pool lvol6 lvol6,lvol5,lvol2,lvol1 lvol8
lvol8 pool lvol7 lvol7,lvol6,lvol5,lvol2,lvol1
Scenario:
$ vgs -o+vg_mda_copies
VG #PV #LV #SN Attr VSize VFree #VMdaCps
fedora 1 2 0 wz--n- 9.51g 0 unmanaged
vg 16 9 0 wz--n- 1.94g 1.83g 2
$ lvs -o+read_ahead vg/lvol6 vg/lvol7
LV VG Attr LSize Pool Origin Data% Rahead
lvol6 vg Vwi-a-tz-- 1.00g pool lvol5 0.00 auto
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
Before this patch:
$vgs -o vg_name,vg_mda_copies -S 'vg_mda_copies < unmanaged'
VG #VMdaCps
vg 2
Problem:
Reserved values can be only used with exact match = or !=, not <,<=,>,>=.
In the example above, the "unamanaged" is internally represented as
18446744073709551615, but this should be ignored while not comparing
field directly with "unmanaged" reserved name with = or !=. Users
should not be aware of this internal mapping of the reserved value
name to its internal value and hence it doesn't make sense for such
reserved value to take place in results of <,<=,> and >=.
There's no order defined for reserved values!!! It's a special
*reserved* value that is taken out of the usual value range
of that type.
This is very similar to what we have already fixed with
2f7f6932dc, but it's the other way round
now - we're using reserved value name in selection criteria now
(in the patch 2f7f693, we had concrete value and we compared it
with the reserved value). So this patch completes patch 2f7f693.
This patch also fixes this problem:
$ lvs -o+read_ahead vg/lvol6 vg/lvol7 -S 'read_ahead > 32k'
LV VG Attr LSize Pool Origin Data% Rahead
lvol6 vg Vwi-a-tz-- 1.00g pool lvol5 0.00 auto
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
Problem:
In the example above, the internal reserved value "auto" is in the
range of selection "> 32k" - it shouldn't match as well. Here the
"auto" is internally represented as MAX_DBL and of course, numerically,
MAX_DBL > 256k. But for users, the reserved value should be uncomparable
to any number so the mapping of the reserved value name to its interna
value is transparent to users. Again, there's no order defined for
reserved values and hence it should never match if using <,<=,>,>=
operators.
This is actually exactly the same problem as already described in
2f7f6932dc, but that patch failed for
size field types because of incorrect internal representation used.
With this patch applied, both problematic scenarios mentioned
above are fixed now:
$ vgs -o vg_name,vg_mda_copies -S 'vg_mda_copies < unmanaged'
(blank)
$ lvs -o+read_ahead vg/lvol6 vg/lvol7 -S 'read_ahead > 32k'
LV VG Attr LSize Pool Origin Rahead
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
When test suite is used from installed rpm package
we need to handle things better.
This patch is rather first approach - expecting few more
tweaks needed.
At this moment LVM_LOG_FILE_EPOCH with
LVM_EXPECTED_EXIT_STATUS properly deletes debug logs
only for real commands - support for lvm2 API does not yet
exists
By default these are empty strings, so the config settings
should be flagged as undefined, so they will be commented
out of the generated config. Otherwise, the lines:
thin_repair_options=""
cache_repair_options=""
in the dump output cause a warning when processed since
lvm doesn't want an empty string.
Also regenerate lvm.conf.in.
The specific config settings have been removed
from the lvm.conf(5) man page, and replaced with
a description of how to use lvm dumpconfig to
view the settings.
The sample lvm.conf and lvmlocal.conf files are now generated:
example.conf.base - initial ungenerated part of the file
example.conf.gen - generated portion from dumpconfig
example.conf.in - combination of .base and .gen files
example.conf - result of configure processing .in file
lvmlocal.conf.base - initial ungenerated part of the file
lvmlocal.conf.gen - generated portion from dumpconfig
lvmlocal.conf.in - combination of .base and .gen files
lvmlocal.conf - result of configure processing .in file
Do not edit the .in files, but edit config_settings.h
or the .base files, and then use 'make generate' to create
the new .in files.
- configure
with options
- make
creates tools/lvm
- make generate
uses tools/lvm to create example.conf.in and lvmlocal.conf.in
by combining .base files with dumpconfig output.
- configure
with same options as above
creates example.conf and lvmlocal.conf from .in files
With use_lvmetad=0, duplicate PVs /dev/loop0 and /dev/loop1,
where in this example, /dev/loop1 is the cached device
referenced by pv->dev, the command 'pvs /dev/loop0' reports:
Failed to find physical volume "/dev/loop0".
This is because the duplicate PV detection by pvid is
not working because _get_all_devices() is not setting
any dev->pvid for any entries. This is because the
pvid information has not yet been saved in lvmcache.
This is fixed by calling _get_vgnameids_on_system()
before _get_all_devices(), which has the effect of
caching the necessary pvid information.
With this fix, running pvs /dev/loop0, or pvs /dev/loop1,
produces no error and one line of output for the PV (the
device printed is the one cached in pv->dev, in this
example /dev/loop1.)
Running 'pvs /dev/loop0 /dev/loop1' produces no error
and two lines of output, with each device displayed
on one of the lines.
Running 'pvs -a' shows two PVs, one with loop0 and one
with loop1, and both shown as a member of the same VG.
Running 'pvs' shows only one of the duplicate PVs,
and that shows the device cached in pv->dev (loop1).
The above output is what the duplicate handling code
was previously designed to output in commits:
b64da4d8b5 toollib: search for duplicate PVs only when needed
3a7c47af0e toollib: pvs -a should display VG name for each duplicate PV
57d74a45a0 toollib: override the PV device with duplicates
c1f246fedf toollib: handle duplicate pvs in process_in_pv
As a further step after this, we may choose to change
some of those.
For all of these commands, a warning is printed about
the existence of the duplicate PVs:
Found duplicate PV ...: using /dev/loop1 not /dev/loop0
Rename envvar LVM_LOG_FILE_UNLINK_STATUS to LVM_EXPECTED_EXIT_STATUS
and change compare sign from '!' to '>'.
Validate LVM_LOG_FILE_EPOCH and support strictly only
up-to 32 alpha chars. If the content doesn't pass
epoch is simply ignored.
Enhance 'not' to manage autodeletion of log files in right cases.
Use separately marked epoch log files for clvmd and dmeventd.
Properly manage stack tracing for new debug.log names.
Add support for 2 new envvars for internal lvm2 test suite
(though it could be possible usable for other cases)
LVM_LOG_FILE_EPOCH
Whether to add 'epoch' extension that consist from
the envvar 'string' + pid + starttime in kernel units
obtained from /proc/self/stat.
LVM_LOG_FILE_UNLINK_STATUS
Whether to unlink the log depending on return status value,
so if the command is successful the log is automatically
deleted.
API is still for now experimental to catch various issue.
add info for various commits, most significant were:
- toollib: close connection to lvmetad after fork
(fe30658a4d)
- toollib: do not spawn polling in lv_change_activate
(c26d81d6e6)
- pvmove: split pvmove_update_metadata function
(65623b63a2)
- lvconvert: move poll code in before refactoring
(5190f56605)
- pvmove: move poll code in before refactoring
(a098aa419f)
--withfullcomments prints all comment lines for each config option.
--withcomments prints only the first comment line, which should be
a short one-line summary of the option.
Previous commit has made have_cache & have_thin producing
false return value.
Fix it and at the some time provide much better reconfiguring
warning message.
If the test machine is missing needed and configured binaries
it will produce TEST WARNING result.
When a var like LVM_TEST_THIN_CHECK_CMD is set to ""
(which is valid) we need to correctly use '-'.
Otherwise ':-' replaces such value with built-in default.
There are two reasons for this: first, this allows the client side to notice
that some PV has multiple devices associated with it and print appropriate
warnings. Second, if a duplicate device pops up and disappears, after this
change the original connection between the PV and device is not lost.
This removes dependency on lvm binary - if it's not present, all LVM
processing is skipped (shouldn't normally happen because if lvm binary
is missing then there's obviously nothing that would activate it, but
let's make sure).
Without this tight dependency on lvm, the blkdeactivate script can
be packaged with libdevmapper/dmsetup (in contrast to lvm as it was
before) and as such the script can still be used to handle other DM
devices.
Put in pvmove background process into list quickly.
Update API for aux add_to_kill_list()/kill_listed_processes().
Run on 'background' (&) only non-background pvmoves.
If the system is correctly configure (cache & thin tools are present)
avoid 'extra' rebuild of configuration.
On the other hand - if some tool is missing - duplicate ##LVMCONF should
make it more straighforward to see.
sharing connection between parent command and background
processes spawned from parent could lead to occasional failures
due to unexpected corruption in daemon responses sent to either child
or a parent.
lvmetad issued warning about duplicate config values in request.
LVM commands occasionaly failed w/ internal error after receving
corrupted response.
lvmetad connection is renewed when needed after explicit disconnect
in child
spawning a background polling from within the lv_change_activate
fn went to two problems:
1) vgchange should not spawn any background polling until after
the whole activation process for a VG is finished. Otherwise
it could lead to a duplicite request for spawning background
polling. This statement was alredy true with one exception of
mirror up-conversion polling (fixed by this commit).
2) due to current conditions in lv_change_activate lvchange cmd
couldn't start background polling for pvmove LVs if such LV was
about to get activated by the command in the same time.
This commit however doesn't alter the lvchange cmd so that it works same as
vgchange with regard to not to spawn duplicate background pollings per
unique LV.
Reenable TESTDIR & PREFIX replacement.
Since we need to replace string in proper order (1st. @TESTDIR@,
2nd. @PREFIX@), drop map and use plain string.
Drop timestamp logging when 'stacktracing'
This patch adds new options to lvmconf:
--enable-halvm (just like --enable-cluster, but configure LVM
for use in HA LVM - meaning disabling lvmetad and
making sure we have locking_type=1)
--disable-halvm (just like --disable-cluster, but configure LVM
back from HA LVM - meaning enabling lvmetad if
it's enabled by default and making sure we have
default locking type set)
--services (causes clvmd and lvmetad services to be enabled or
disabled appropriately and conforming to the changes
in lvm configuration we've just made with lvmconf)
--mirrorservice (in addition to clvmd and lvmetad services, also
enable or disable cmirrord service appropriately;
this is a separate option because cmirrord is
optional and it doesn't need to be always enabled
when clvmd is enabled)
--startstopservices (in addition to enabling or disabling services,
start and stop these services immediately)
These options are supposed to help users to make their system ready
for cluster with clvmd (active-active) or HA LVM (active-passive) use
while lvmconf script can handle services as well so users don't need
to bother about setting them manually.
Also, before this patch, we hardcoded global/use_lvmetad=0 as default
value in lvmconf script. Howeverm this default may change by just
flipping the value in config_settings.h and we may forget to edit
the lvmconf. It's better to use lvm dumpconfig --type default global/use_lvmetad
to get the actual default value and use this one instead of hardcoded one.
When performing initial allocation (so there is nothing yet to
cling to), use the list of tags in allocation/cling_tag_list to
partition the PVs. We implement this by maintaining a list of
tags that have been "used up" as we proceed and ignoring further
devices that have a tag on the list.
https://bugzilla.redhat.com/983600
Add A_PARTITION_BY_TAGS set when allocated areas should not share tags
with each other and allow _match_pv_tags to accept an alternative list
of tags. (Not used yet.)
Comments from the sample config files are copied into
the comment field of the config settings structure.
This includes only minimal changes to the text.
With this in place, the sample config files can
be generated from 'lvm dumpconfig', and content
for an lvm.conf man page can also be generated.
pv_write is called both to write orphans and to rewrite PV headers
of PVs in VGs. It needs to select the correct VG id so that the
internal cache state gets updated correctly.
It only affected commands that involved further steps after
the pv_write and was often masked because the metadata would
be re-read off disk and correct itself.
"Incorrect metadata area header checksum" warnings appeared.
Example:
Create vg1 containing dev1, dev2 and dev3.
Hide dev1 and dev2 from the system.
Fix up vg1 with vgreduce --removemissing.
Bring back dev1 and dev2.
In a single operation reinstate dev1 and dev2 into vg1 (vgextend).
Done as separate operations (automatically fix-up dev1 and dev2 as orphans,
then vgextend) it worked, but done all in one go the internal cache got
corrupted and warnings about checksum errors appeared.
When clvmd starts, it starts it's own command logging into debug.log.
This is interferring with our other command debug.log.
As as sideeffect we may experience log from command,
followed but lots of zeros and continued with clvmd log.
Fix it by renaming debug.log and now we could also print this trace
to get full list of clvmd activity nicely.
Also improve some post-mortem prints from udevadm and dmsetup to
make the output more usable.
There is no benefit in waking-up all the waiters
when there is no actual change in lock state.
This avoid some unnecessarily ping-pong effects like:
Resource V_LVMTEST15724vg retrying lock in mode:WRITE...
Resource V_LVMTEST15724vg already locked lockid=40, mode:WRITE
Resource V_LVMTEST15724vg retrying lock in mode:WRITE...
Resource V_LVMTEST15724vg already locked lockid=40, mode:WRITE
If the user provides '-m #' (# > 0) with mappings
raid4/5/6, the command silently creates
'#mirrors * #stripes + #parity' image component pairs.
Patch rejects '-m #' altogether for those mappings
in order to avoid LV creation with unexpected layout.
- resolves bz#1209445
Since now we have metadata parts running with normal speed,
we could avoid reinitilising delayed dev for every test.
(Saving seconds on cookie waits...)
If the device name is not found in our metadata,
we cannot call strdup few lines later with NULL name.
More intersting story goes behind how it happens -
pvmove removal is unfortunatelly 'multi-state' process
and at some point (for now) we have in lvm2 metadata
LV pvmove0 as stripe and mirror image as error.
If such metadata are left - we fail with any further removal.
Use common code for error_dev & delay_dev.
Both functions now take list of sectors.
From now on we could delay just 'extent' section, while
keeping running lvm commands fast (having native metadata area).
We cannot print TEST WARNING within test shell script
(since it's running in debug mode and thus always prints it)
Use 'should false' trick let the string printed in this case.
So 'non cluster cases' should now end properly.
If the kernel has 'new lcm()' (3.19) it provides wrong
optimal_io_size value for dm device so lvm2 command cannot
create properly aligned devices.
Use 'should' for this case - so test ends with 'TEST WARNING'.
Commit 80f4b4b803
introduced undesirable side-effects for lvm2app user
which happens to be our own python binding.
It appear obtaing pvs list keeps global lock.
So restricting this to VG_GLOBAL READ locks and skip
the drop skip if WRITE lock is held.
we do not allow 0 interval for pvmove command issued
without parameters with classical polldaemon. It would
query the kernel too often with possibly many pvmoves
in-progress.
So far pvmove_update_metadata (originaly _update_metadata) was
used for both initial and subsequent metadata updates during polling.
With a new polldaemon (lvmpolld) all operations that require polling
have to be split in two parts: The initiating one and the polling one.
The later step will be used from lvm command spawned by lvmpolld to
monitor and advance the mirror on next segment if required.
1) The initiation part is _update_metadata in pvmove.c which performs
only the first update, setting up the pvmove itself in metadata.
2) pvmove_update_metadata in pvmove_poll.c now handles all other
subsequent metadata updates except the last one.
Due to the split we could remove some code. Also some functions were
moved back to pvmove.c as they were suited for initialisation of pvmove
only.
This commit has no impact on functionality. Code required to
be visible outside lvconvert.c is just moved into new file
lvconvert_poll.c and some calls are made non-static and
declared in new header file lvconvert.h
This commit has no impact on functionality. Code required to
be visible outside pvmove.c is just moved into new file
pvmove_poll.c and some calls are made non-static and declared in
new header file pvmove.h
When lvm2-pvscan@.service and lvm2-lvmetad.service are scheduled to be
stopped lvm2-pvscan@.service should be stopped first since pvscan uses
lvmetad.
This is especially important if lvm2-lvmetad.socket is also scheduled to
be stopped as in this case connection requests are suppressed causing
pvscan to fail.
_check_lv_status was called from within dm_list_iterate_items cycle.
This was utterly wrong! _check_lv_status may remove more than one LV from
vg->lvs list we iterated in the same time.
In some scenarios this could lead to deadlock iterationg over same LV
indefinitely or segfault depending on the circumstances.
Fixed by moving the _check_lv_status outside iterating the vg->lvs
list.
Note that commit 6e7b24d34f was not enough
as _check_lv_status may result in removal of more than one LV from the list.
Improve testing for condition that pvmove0 is already running in the
table (so we do not kill pvmove while it has loaded target, but
it's not yet Live).
Also delay_dev for 200ms.
When we use /dev/loopX device - shift first PV1 sector by 1M
so /dev/loop0 and dm device do not appear as same device.
Also notify lvmetad once 'devs' are created - so in case this
command is called in the middle of test - lvmetad properly
drops its metadata for these devices.
Drop used test.img file between reuse so the 'prepare_vg'
always starts with zeroed disks.
When LVM_TEST_AUX_TRACE is set, allow shell tracing of aux commands.
Do not keep dangling LVs if they're removed from the vg->lvs list and
move them to vg->removed_lvs instead (this is actually similar to already
existing vg->removed_pvs list, just it's for LVs now).
Once we have this vg->removed_lvs list indexed so it's possible to
do lookups for LVs quickly, we can remove the LV_REMOVED flag as
that one won't be needed anymore - instead of checking the flag,
we can directly check the vg->removed_lvs list if the LV is present
there or not and to say if the LV is removed or not then. For now,
we don't have this index, but it may be implemented in the future.
This avoids a problem in which we're using selection on LV list - we
need to do the selection on initial state and not on any intermediary
state as we process LVs one by one - some of the relations among LVs
can be gone during this processing.
For example, processing one LV can cause the other LVs to lose the
relation to this LV and hence they're not selectable anymore with
the original selection criteria as it would be if we did selection
on inital state. A perfect example is with thin snapshots:
$ lvs -o lv_name,origin,layout,role vg
LV Origin Layout Role
lvol1 thin,sparse public,origin,thinorigin,multithinorigin
lvol2 lvol1 thin,sparse public,snapshot,thinsnapshot
lvol3 lvol1 thin,sparse public,snapshot,thinsnapshot
pool thin,pool private
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
The lvremove command above was supposed to remove lvol1 as well as
all its snapshots which have origin=lvol1. It failed to do so, because
once we removed the origin lvol1, the lvol2 and lvol3 which were
snapshots before are not snapshots anymore - the relations change
as we're processing these LVs one by one.
If we do the selection first and then execute any concrete actions on
these LVs (which is what this patch does), the behaviour is correct
then - the selection is done on the *initial state*:
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
Logical volume "lvol2" successfully removed
Logical volume "lvol3" successfully removed
Similarly for all the other situations in which relations among
LVs are being changed by processing the LVs one by one.
This patch also introduces LV_REMOVED internal LV status flag
to mark removed LVs so they're not processed further when we
iterate over collected list of LVs to be processed.
Previously, when we iterated directly over vg->lvs list to
process the LVs, we relied on the fact that once the LV is removed,
it is also removed from the vg->lvs list we're iterating over.
But that was incorrect as we shouldn't remove LVs from the list
during one iteration while we're iterating over that exact list
(dm_list_iterate_items safe can handle only one removal at
one iteration anyway, so it can't be used here).
When we're iterating over LVs in _poll_vg fn, we need to use the safe
version of iteration - the LV can be removed from the list which we're
just iterating over if we're finishing or aborting pvmove operation.
The code never mixes reads of committed and precommitted metadata,
so there's no need to attempt to set PRECOMMITTED when
*use_previous_vg is being set.
Refactor the recent metadata-reading optimisation patches.
Remove the recently-added cache fields from struct labeller
and struct format_instance.
Instead, introduce struct lvmcache_vgsummary to wrap the VG information
that lvmcache holds and add the metadata size and checksum to it.
Allow this VG summary information to be looked up by metadata size +
checksum. Adjust the debug log messages to make it clear when this
shortcut has been successful.
(This changes the optimisation slightly, and might be extendable
further.)
Add struct cached_vg_fmtdata to format-specific vg_read calls to
preserve state alongside the VG across separate calls and indicate
if the details supplied match, avoiding the need to read and
process the VG metadata again.
Fixes segfault when 'pvs' encounters two different PVs sharing the same
uuid but one an orphan, the other in a VG.
If VG_GLOBAL is held, there seems no point in doing a full scan more
than once.
If undesirable side-effects show up, we can try restricting this to
VG_GLOBAL READ locks. The original code dates back to 2.02.40.
When pvscan --cache --major --minor command is issued from
udev REMOVE event, it basically resulted into a whole device
scan since the device was missing. So avoid such scan
and first check via /sysfs (when available) if such device actually
exists.
When available use nanosecond stat info.
If commands are running closely enough after config update,
the .cache file from persistent filter could have been ignored.
This happens sometimes during i.e. synthetic test suite run.
There is no reason to support persistent major/minor numbers
for pool volumes - it's only meant to be supported for filesystems
(since i.e. nfs may need to keep volume on a persistent device node.)
Support for pools is now explicitely disabled and documented.
Metadata areas which are marked as ignored should not be scanned
and read during pvscan --cache. Otherwise, this can cause lvmetad
to cache out-of-date metadata in case other PVs with fresh metadata
are missing by chance.
Make this to work like in non-lvmetad case where the behaviour would
be the same as if the PV was orphan (in case we have no other PVs
with valid non-ignored metadata areas).
Simplify the function usage and clean up parameter parsing.
There were 2 significant changes made in the test itself
(they passed before because of incorrect shell string handling)
-pvs_sel 'tags="pv_tag1"' "$dev1 $dev2"
+sel pv 'tags="pv_tag1"' "$dev1" "$dev6"
-lvs_sel '(lv_name=vol1 || lv_name=vol2) || vg_tags=vg_tag1' "vol1 vol2
abc orig snap"
+sel lv '(lv_name=vol1 || lv_name=vol2) || vg_tags=vg_tag1' vol1 vol2
orig snap xyz
The iscsi-shutdown.service is the one responsible for logging out
iscsi sessions so blk-availability.service (running the blkdeactivate
script) should be run before that on shutdown (so we need to use
After=iscsi-shutdown.service because "After" relates to starting
the service and the opposite order is automatically applied on
stopping the service at shutdown).
Avoid busy-looping on CPU while reading socket pipe
and always call read only when select tells there is
something for read.
Change the batch output to old nicer output.
When lvm1 PVs are visible, and lvmetad is used, and the foreign
option was included in the reporting command, the reporting
command would fail after the 'pvscan all devs' function saw
the lvm1 PVs. There is no reason the command should fail
because of the lvm1 PVs; they should just be ignored.
Return 1 on success in pvdisplay_short() and lvdisplay_full()
so commands like vgdisplay are not printinig stracktraces
on successful passes.
As the results of fail/success have been internally ignored for those
calls, it had no other visible side effect - command's return value was
still 0 (success).
Detect an lvm1 system id by looking at the WRITE_LOCKED flag.
Don't copy this lvm1 system id into vg->system_id so that the
restrictions associated with the new system id are not applied
to the old VG with the inherited lvm1 system id.
The string reported by uname -n may include characters
that lvm omits from the system id (like parens, as seen
on a test machine.) Check against the final system id
string that lvm uses.
Since we take a lock inside vg_lock_newname() and we do a full
detection of presence of vgname inside all scanned labels,
there is no point to do this for second time to be sure
there is no such vg.
The only side-effect of such call would be a full validation of
some already exising VG metadata - but that's not the task for
vgcreate when create a new VG.
This call noticable reduces number of scans during 'vgcreate'.
Use similar logic as with text_vg_import_fd() and avoid repeated
parsing of same mda and its config tree for vgname_from_mda().
Remember last parsed vgname, vgid and creation_host in labeller
structure and if the metadata have the same size and checksum,
return this stored info.
TODO: The reuse of labeller struct is not ideal, some lvmcache API for
this functionality would be nicer.
When reading VG mda from multiple PVs - do all the validation only
when mda is seen for the first time and when mda checksum and length
is same just return already existing VG pointer.
(i.e. using 300PVs for a VG would lead to create and destroy 300 config trees....)
Previous versions of lvm will not obey the restrictions
imposed by the new system_id, and would allow such a VG
to be written. So, a VG with a new system_id is further
changed to force previous lvm versions to treat it as
read-only. This is done by removing the WRITE flag from
the metadata status line of these VGs, and putting a new
WRITE_LOCKED flag in the flags line of the metadata.
Versions of lvm that recognize WRITE_LOCKED, also obey the
new system_id. For these lvm versions, WRITE_LOCKED is
identical to WRITE, and the rules associated with matching
system_id's are imposed.
A new VG lock_type field is also added that causes the same
WRITE/WRITE_LOCKED transformation when set. A previous
version of lvm will also see a VG with lock_type as read-only.
Versions of lvm that recognize WRITE_LOCKED, must also obey
the lock_type setting. Until the lock_type feature is added,
lvm will fail to read any VG with lock_type set and report an
error about an unsupported lock_type. Once the lock_type
feature is added, lvm will allow VGs with lock_type to be
used according to the rules imposed by the lock_type.
When both system_id and lock_type settings are removed, a VG
is written with the old WRITE status flag, and without the
new WRITE_LOCKED flag. This allows old versions of lvm to
use the VG as before.
The seg_monitor did not display monitored status for thick snapshots
and mirrors (with mirror log *not* mirrored). The seg monitor did work
correctly even before for other segtypes - thins and raids.
Before (mirrors and snapshots, only mirrors with mirrored log properly displayed monitoring status):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot
With this patch applied (monitoring status displayed for all mirrors and snapshots):
[0] f21/~ # lvs -a -o lv_name,lv_layout,lv_role,seg_monitor vg
LV Layout Role Monitor
mirror mirror public monitored
[mirror_mimage_0] linear private,mirror,image
[mirror_mimage_1] linear private,mirror,image
[mirror_mlog] linear private,mirror,log
mirror_with_mirror_log mirror public monitored
[mirror_with_mirror_log_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mimage_1] linear private,mirror,image
[mirror_with_mirror_log_mlog] mirror private,mirror,log monitored
[mirror_with_mirror_log_mlog_mimage_0] linear private,mirror,image
[mirror_with_mirror_log_mlog_mimage_1] linear private,mirror,image
thick_origin linear public,origin,thickorigin
thick_snapshot linear public,snapshot,thicksnapshot monitored
If configuration setting is marked in config_setting.h with CFG_DISABLED
flag, default value is always used for such setting, no matter if it's defined
by user (in --config/lvm.conf/lvmlocal.conf).
A warning message is displayed if this happens:
For example:
[1] f21/~ # lvm dumpconfig --validate
WARNING: Configuration setting global/system_id_source is disabled. Using default value.
LVM configuration valid.
[1] f21/~ # pvs
WARNING: Configuration setting global/system_id_source is disabled. Using default value.
PV VG Fmt Attr PSize PFree
/dev/sdb lvm2 --- 128.00m 128.00m
...
Though vgremove operates per VG by definition, internally, it
actually means iterating over each LV it contains to do the
remove.
So we need to direct selection a bit in this case so that the
selection is done per-VG, not per-LV.
That means, use processing handle with void_handle.internal_report_for_select=0
for the process_each_lv_in_vg that is called later in vgremove_single fn.
We need to disable internal selection for process_each_lv_in_vg
here as selection is already done by process_each_vg which calls
vgremove_single. Otherwise selection would be done per-LV and not
per-VG as we intend!
An intra-release fix for commit 00744b053f.
Set ACCESS_NEEDS_SYSTEM_ID VG status flag whenever there is
a non-lvm1 system_id set. Prevents concurrent access from
older LVM2 versions.
Not set on VGs that bear a system_id only due to conversion
from lvm1 metadata.
Export _lvm1_system_id as generate_lvm1_system_id and call it in
vg_setup() so it is set before writing the metadata to disk
and not missing from the initial metadata backup file.
format_text processes both lvm2 on-disk metadata and metadata read
from other sources such as backup files. Add original_fmt field
to retain the format type of the original metadata.
Before this patch, /etc/lvm/archives would contain backups of
lvm1 metadata with format = "lvm2" unless the source was lvm1 on-disk
metadata.
The vg->lvm1_systemd_id needs to be initialized as all the code around
counts with that. Just like we initialize lvm1_system_id in vg_create
(no matter if it's actually LVM1 or LVM2 format), this patch adds this
init in alloc_vg as well so the rest of the code does not segfaul
when trying to access vg->lvm1_system_id.
In log messages refer to it as system ID (not System ID).
Do not put quotes around the system_id string when printing.
On the command line use systemid.
In code, metadata, and config files use system_id.
In lvmsystemid refer to the concept/entity as system_id.
Two new functions added in the init script: rh_status and rh_status_q.
First one to be used in status() and second one to be used in start(),
stop(), force_stop(). Check for 'dmeventd' added and print list of
lvs being monitored in status().
"!dev_cache_get(argv[i], cmd->full_filter) && !rescan_done" --> "!rescan_done && !dev_cache_get(argv[i], cmd->full_filter)
Check the simple condition first (variable), then the function return value
(which in this case certainly takes more time to evaluate) - save some time.
Two problems fixed by this patch:
- PV tags were not recognized at all when using them with pvs
report that has only label fields (regression since 2.02.105)
- incorrect persistent .cache file to be generated after pvs
report that has only label fields (regression since 2.02.106)
These bugs come from the transition from process_each_pv to
process_each_label introduced by commit
67a7b7a87d and commit
490226fc47 and related.
Commands that can never use foreign VGs begin with
cmd->error_foreign_vgs = 1. This tells the vg_read
lib layer to print an error as soon as a foreign VG
is read.
The toollib process_each layer also prints an error if a
foreign VG is read, but is more selective about it. It
won't print an error if the command did not explicitly
name the foreign VG. We want to silently ignore foreign VGs
unless a command attempts to use one explicitly.
So, foreign VG errors are printed from two different layers:
vg_read (lower layer) and process_each (upper layer).
Commands that use toollib process_each, only want errors from
the process_each layer, not from both layers. So, process_each
disables the lower layer vg_read error message by setting
error_foreign_vgs = 0.
Commands that do not use toollib process_each, want errors
from the vg_read layer, otherwise they would get no error
message. The original cmd->error_foreign_vgs setting
enables this error.
(Commands that are allowed to operate on foreign VGs always
begin with cmd->error_foreign_vgs = 0, and all the commands
in this group use toollib process_each with the selective
error reporting.)
If an LV is already rw but still ro in the kernel, allow -prw to issue a
refresh to try to change the kernel state to rw.
Intended for use after clearing activation/read_only_volume_list in
lvm.conf.
The only realistic way for a host to have active LVs in a
foreign VG is if the host's system_id (or system_id_source)
is changed while LVs are active.
In this case, the active LVs produce an warning, and access
to the VG is implicitly allowed (without requiring --foreign.)
This allows the active LVs to be deactivated.
In this case, rescanning PVs for the VG offers no benefit.
It is not possible that rescanning would reveal an LV that
is active but wasn't previously in the VG metadata.
cmirror uses the CPG library to pass messages around the cluster and maintain
its bitmaps. When a cluster mirror starts-up, it must send the current state
to any joining members - a checkpoint. When mirrors are large (or the region
size is small), the bitmap size can exceed the message limit of the CPG
library. When this happens, the CPG library returns CPG_ERR_TRY_AGAIN.
(This is also a bug in CPG, since the message will never be successfully sent.)
There is an outstanding bug (bug 682771) that is meant to lift this message
length restriction in CPG, but for now we work around the issue by increasing
the mirror region size. This limits the size of the bitmap and avoids any
issues we would otherwise have around checkpointing.
Since this issue only affects cluster mirrors, the region size adjustments
are only made on cluster mirrors. This patch handles cluster mirror issues
involving pvmove, lvconvert (from linear to mirror), and lvcreate. It also
ensures that when users convert a VG from single-machine to clustered, any
mirrors with too many regions (i.e. a bitmap that would be too large to
properly checkpoint) are trapped.
A foreign VG should be silently ignored by a reporting/display
command like 'vgs'. If the reporting/display command specifies
a foreign VG by name on the command line, it should produce an
error message.
Scanning commands pvscan/vgscan/lvscan are always allowed to
read and update caches from all PVs, including those that belong
to foreign VGs.
Other non-report/display/scan commands always ignore a foreign
VG, or report an error if they attempt to use a foreign VG.
vgimport should always invalidate the lvmetad cache because
lvmetad likely holds a pre-vgexported copy of the VG.
(This is unrelated to using foreign VGs; the pre-vgexported
VG may have had no system_id at all.)
Add --foreign to the remaining reporting and display commands plus
vgcfgbackup.
Add a NEEDS_FOREIGN_VGS flag for vgimport to always set --foreign.
If lvmetad is being used with --foreign, scan foreign VGs (currently
implemented as a full PV scan).
Handle these things centrally in lvmcmdline.c.
Also allow lvchange and vgchange -an/-aln to deactivate any foreign
LVs that happen to be active if something went wrong.
Remember to set the system ID when creating a new VG in vgsplit.
When checking whether the system ID permits access to a VG, check for
each permitted situation first, and only then issue the appropriate
error message. Always issue a message for now. (We'll try to
suppress some of those later when the VG concerned wasn't explicitly
requested.)
Add more messages to try to ensure every return code is checked and
every error path (and only an error path) contains a log_error().
Add self-correction to vgchange -c to deal with situations where
the cluster state and system ID state are out-of-sync (e.g. if
old tools were used).
Move the lvm1 sys ID into vg->lvm1_system_id and reenable the #if 0
LVM1 code. Still display the new-style system ID in the same
reporting field, though, as only one can be set.
Add a format feature flag FMT_SYSTEM_ON_PVS for LVM1 and disallow
access to LVM1 VGs if a new-style system ID has been set.
Treat the new vg->system_id as const.
Allow cmd->unknown_system_id to be cleared during toolcontext
refresh.
Set a default value of "none" for global/system_id_source.
Allow local/system_id to be empty so it's not impossible for
a later config file to remove it.
In a file containing a system ID:
Any whitespace at the start of a line is ignored;
Blank lines are ignored;
Any characters after a # are ignored along with the #.
The system ID is obtained by processing the first line with
non-ignored characters.
If further lines with non-ignored characters follow, a warning is
issued.
Add WARNING messages if there are problems setting the requested
system ID.
Ban "localhost" as a prefix regardless of the system_id_source.
Use cmd->hostname instead of calling uname again.
Make system_id_source values case-insensitive (as with new settings like
log_debug_classes) and also accept machine-id to match the filename.
Require system ID to begin with an alphanumeric character.
Rename fn to make clear it's only validation for systemid
and always terminate result rather than imposing this on the caller.
In 2.02.99, _init_tags() inadvertently began to ignore the
dm_config_tree struct passed to it. "tags" sections are not
merged together, so the "tags" section in the main config file was
being processed repeatedly and other "tags" sections were ignored.
Dop unused value assignments.
Unknown is detected via other combination
(!linear && !striped).
Also change the log_error() message into a warning,
since the function is not really returning error,
but still keep the INTERNAL_ERROR.
Ret value is always set later.
The dev ext source must be reset for the dev_cache_get call
(which evaluates filters), not lvmcache_label_scan - so fix
original commit 727c7ff85d.
Also, add comments in _pvcreate_check fn explaining why
refresh filter and rescan is needed and exactly in which
situations.
We exclude some signatures from being wiped when using blkid wiping.
These are signatures which we simply overwrite. For example, the
LVM2_member signature which denotes a PV - if we call pvcreate on
existing PV, we just overwrite the PV header, no need to wipe it.
Previously, we counted such signatures as if they were wiped
and they were counted in the final number of wiped signatures
that _wipe_known_signatures_with_blkid fn returned in the "wiped"
output arg. Then the code checking this output arg could be
mislead that wiping happened while no wiping took place in real
and we could fire some code uselessly based on this information
(e.g. refreshing filters/rescanning - see also
commit 6b4066585f).
Before, we refreshed filters and we did full rescan of devices if
we passed through wiping (wipe_known_signatures fn call). However,
this fn returns success even if no signatures were found and so
nothing was wiped. In this case, it's not necessary to do the
filter refresh/rescan of devices as nothing changed clearly.
This patch exports number of wiped signatures from all the
wiping functions below. The caller (_pvcreate_check) then checks
whether any wiping was done at all and if not, no refresh/rescan
is done, saving some time and resources.
pvcreate code path executes signature wiping if there are any signatures
found on device to prepare the device for PV. When the signature is wiped,
the WATCH udev rule triggers the event which then updates udev database
with fresh info, clearing the old record about previous signature.
However, when we're using udev db as dev-ext source, we'd need to wait
for this WATCH-triggered event. But we can't synchronize against such
events (at least not at this moment). Without this sync, if the code
continues, the device could still be marked as containing the old
signature if reading udev db. This may end up even with the device
to be still filtered, though the signature is already wiped.
This problem is then exposed as (an example with md components):
$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb --run
$ mdadm -S /dev/md0
$ pvcreate -y /dev/sda
Wiping linux_raid_member signature on /dev/sda.
/dev/sda: Couldn't find device. Check your filters?
$ echo $?
5
So we need to temporarily switch off "udev" dev-ext source here
in this part of pvcreate code until we find a way how to sync
with WATCH events.
(This problem does not occur with signature wiping which we do
on newly created LVs since we already handle this properly with
our udev flags - the LV_NOSCAN/LV_TEMPORARY flag. But we can't use
this technique for non-dm devices to keep WATCH rule under control.)
(This reverts patch #d95c6154)
Filter complete device list through full_filter unconditionally when
we're getting the list of *all* devices even in case we're interested
only in fraction of those devices - the PVs, not the other devices
which are not PVs yet (e.g. pvs vs. pvs -a).
We need to do this full filtering whenever we're handling *complete*
list of devices, we need to be safe here, mainly if there are any
future changes and we'd forgot to change to use proper filtering then.
Also properly preventing duplicates if there are any block subsystem
components used (mpath, MD ...).
Thing here is that (under use_lvmetad=1), cmd->filter can be used
only if we're sure that the list of devices we're filtering contains
only PVs. We have to use cmd->full_filter otherwise (like it is in
case of _get_all_devices fn which acquires complete list of devices,
no matter if it is a PV or not).
Of course, cmd->full_filter is more extensive than cmd->filter
which is only a subset of full_filter.
We could optimize this in a way that if we're interested in PVs only
during process_each_pv processing (e.g. using pvs in contrast to pvs -a),
we'd get the list of PV devices directly from lvmetad from the
lvmcache_seed_infos_from_lvmetad fn call which currently updates
lvmcache only. We'd add an additional output arg for this fn to get
the list of PV devices directly in addition, without a need to iterate
over all devices which include non-PVs which we're not interested in
anyway, hence we could use only cmd->filter, not the cmd->full_filter.
So the code would look something like this:
static int _get_all_devices(....)
{
struct device_id_list *dil;
if (interested_in_pvs_only)
lvmcache_seed_infos_from_lvmetad(cmd, &dil); /* new "dil" arg */
/* the "dil" list would be filtered through cmd->filter inside lvmcache_seed_infos_from_lvmetad */
else {
lvmcache_seed_infos_from_lvmetad(cmd, NULL);
dev_iter_create(cmd->full_filter)
while (dev = dev_iter_get ...) {
dm_list_add(all_devices, &dil->list);
}
}
}
pvchange now uses process_each_pv so uncomment parts of the test
which check proper functionality of intersection between selection
result and PVs or PV tags directly provided on command line. This
didn't work properly before when pvchange was not using process_each_pv.
For example:
pvchange -u -S 'pv_name=/dev/sda' /dev/sdb
..changes nothing since clearly the intersection of /dev/sda and
/dev/sdb is empty set. The same applies for tags:
pvchange -u -S 'pv_name=/dev/sda' @some_tag
..changes nothing if /dev/sda is not tagged with some_tag.
It's cleaner this way - do not mix static and dynamic
(init_processing_handle) initializers. Use the dynamic one everywhere.
This makes it easier to manage the code - there are no "exceptions"
then and we don't need to take care about two ways of initializing the
same thing - just use one common initializer throughout and it's clear.
Also, add more comments, mainly in the report_for_selection fn explaining
what is being done and why with respect to the processing_handle and
selection_handle.
Invalid devices no longer included in the counters printed at the end.
May now need to use --ignoreskippedcluster if relying upon exit status.
If more than one change is requested per-PV, attempt to perform them
all. Note that different arguments still handle exit status
differently.
When lvm2 is build with valgrind pool detection - always disable
memcheck, since pool memory allocation are unconditionaly passed
into valgrind library.
We still need to get the list as the calls underneath process_each_pv
rely on this list. But still keep the change related to the filters -
if we're processing all devices, we need to use cmd->full_filter.
If we're processing only PVs, we can use cmd->filter only to save
some time which would be spent in filtering code.
When lvmetad is used and at the same time we're getting list of all
PV-capable devices, we can't use cmd->filter (which is used to filter
out lvmetad responses - so we're sure that the devices are PVs already).
To get the list of PV-capable devices, we're bypassing lvmetad (since
lvmetad only caches PVs, not all the other devices which are not PVs).
For this reason, we have to use the "full_filter" filter chain (just
like we do when we're running without lvmetad).
Example scenario:
- sdo and sdp components of MD device md0
- sdq, sdr and sds components of mpatha multipath device
- mpatha multipath device partitioned
- vda device partitioned
=> sdo,sdp,sdr,sds, mpatha and vda should be filtered!
$ lsblk -o NAME,TYPE
NAME TYPE
sdn disk
sdo disk
`-md0 raid0
sdp disk
`-md0 raid0
sdq disk
`-mpatha mpath
`-mpatha1 part
sdr disk
`-mpatha mpath
`-mpatha1 part
sds disk
`-mpatha mpath
`-mpatha1 part
vda disk
|-vda1 part
`-vda2 part
|-fedora-swap lvm
`-fedora-root lvm
Before this patch:
==================
use_lvmetad=0 (correct behaviour!)
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
use_lvmetad=1 (incorrect behaviour - sdo,sdp,sdq,sdr,sds and mpatha not filtered!)
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/sdo --- 0 0
/dev/sdp --- 0 0
/dev/sdq --- 0 0
/dev/sdr --- 0 0
/dev/sds --- 0 0
/dev/vda --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
With this patch applied:
========================
use_lvmetad=1
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
List of all devices is only needed if we want to process devices
which are not PVs (e.g. pvs -a). But if this is not the case, it's
useless to get the list of all devices and then discard it without
any use, which is exactly what happened in process_each_pv where
the code was never reached and the list was unused if we were
processing just PVs, not all PV-capable devices:
int process_each_pv(...)
{
...
process_all_devices = process_all_pvs &&
(cmd->command->flags & ENABLE_ALL_DEVS) &&
arg_count(cmd, all_ARG);
...
/*
* If the caller wants to process all devices (not just PVs), then all PVs
* from all VGs are processed first, removing them from all_devices. Then
* any devs remaining in all_devices are processed.
*/
_get_all_devices(cmd, &all_devices);
...
ret = _process_pvs_in_vgs(...);
...
if (!process_all_devices)
goto out;
ret = _process_device_list(cmd, &all_devices, handle, process_single_pv);
...
}
This patch adds missing check for "process_all_devices" and it gets the
list of all (including non-PV) devices only if needed:
This makes a difference when using selection criteria based on
these fields - if those fields are defined as DM_REPORT_FIELD_TYPE_SIZE
(in contrast to DM_REPORT_FIELD_TYPE_NUMBER), units are also
recognize in selection clause.
For example:
$ lvs -o+seg_start vg1/lv2
LV VG Attr LSize Start
lv2 vg1 -wi-a----- 12.00m 0
lv2 vg1 -wi-a----- 12.00m 8.00m
Before this patch:
$ lvs -o+seg_start --select 'seg_start=8m'
Found size unit specifier but numeric value expected for selection field seg_start.
Selection syntax error at 'seg_start=8m'.
Use 'help' for selection to get more help.
With this patch applied:
$lvs -o+seg_start --select 'seg_start=8m'
LV VG Attr LSize Start
lv2 vg1 -wi-a----- 12.00m 8.00m
(the same applies for ba_start and vg_free fields)
The LVM_COMMAND_PROFILE env var is new - mention it in dumpconfig's
man page.
Also, dumpconfig always displays the top of the config cascade.
To display all the config found in the cascade merged (just like
it's used during LVM command processing), --mergedconfig option
must be used - this one's already described in that man page,
just make sure it's clear and add reference for this option also
in --profile/--commandprofile/--metadataprofile description.
This is a followup patch for previous patchset that enables selection in
process_each_* fns to fix an issue where field prefixes are not
automatically used for fields in selection criteria.
Use initial report type that matches the intention of each process_each_* functions:
- _process_pvs_in_vg - PVS
- process_each_vg - VGS
- process_each_lv and process_each_lv_in_vg - LVS
This is not normally needed for the selection handle init, BUT we would
miss the field prefix matching, e.g.
lvchange -ay -S 'name=lvol0'
The "name" above would not work if we didn't initialize reporting with
the LVS type at its start. If we pass proper init type, reporting code
can deduce the prefix automatically ("lv_name" in this case).
This report type is then changed further based on what selection criteria we
have. When doing pure selection, not report output, the final report type
is purely based on combination of this initial report type and report types
of the fields used in selection criteria.
We already allowed -S|--select with {vg,lv,pv}display -C (which
was then equal to {vg,lv,pv}s command. Since we support selection
in toolib now, we can support -S also without using -C in *display
commands now.
pvchange is an exception that does not use toollib yet for iterating
over the list of PVs (process_each_pv) so intialize the
processing_handle and use just like it's used in toollib.
We have 3 input report types:
- LVS (representing "_select_match_lv")
- VGS (representing "_select_match_vg")
- PVS (representing "_select_match_pv")
The input report type is saved in struct selection_handle's "orig_report_type"
variable.
However, users can use any combination of fields of different report types in
selection criteria - the resulting report type can thus differ. The struct
selection_handle's "report_type" variable stores this resulting report type.
The resulting report_type can end up as one of:
- LVS
- VGS
- PVS
- SEGS
- PVSEGS
This patch adds logic to report_for_selection based on (sensible) combination
of orig_report_type and report_type and calls appropriate reporting functions
or iterates over multiple items that need reporting to determine the selection
result.
The report_for_selection does the actual "reporting for selection only".
The selection status will be saved in struct selection_handle's "selected"
variable.
The code to determine final report type based on combination of input
report type (determined from fields used for reporting to output and selection)
can be reused for pure reporting for selection - factor out this code into
_get_final_report_type function.
This applies to:
- process_each_lv_in_vg - the VG is selected only if at least one of its LVs is selected
- process_each_segment_in_lv - the LV is selected only if at least one of its LV segments is selected
- process_each_pv_in_vg - the VG is selected only if at least one of its PVs is selected
- process_each_segment_in_pv - the PV is selected only if at least one of its PV segments is selected
So this patch causes the selection result to be properly propagated up to callers.
Call _init_processing_handle, _init_selection_handle and
_destroy_processing_handle in process_each_* and related functions to
set up and destroy handles used while processing items.
The init_processing_handle, init_selection_handle and
destroy_processing_handle are helper functions that allocate and
initialize the handles used when processing items in process_each_*
and related functions.
The "struct processing_handle" contains handles to drive the selection/matching
so pass it to the _select_match_* functions which are entry points to the
selection mechanism used in process_each_* and related functions.
This is revised and edited version of former Dave Teigland's patch which
provided starting point for all the select support in process_each_* fns.
The new "report_init_for_selection" is just a wrapper over
dm_report_init_with_selection that initializes reporting for selection
only. This means we're not going to do the actual reporting to output
for display and as such we intialize reporting as if no fields are reported
or sorted. The only fields "reported" are taken from the selection criteria
string and all such fields are marked as hidden automatically (FLD_HIDDEN flag).
These fields are used solely for selection criteria matching.
Also, modify existing report_object function that was used for reporting to
output for display. Now, it can either cause reporting to output or reporting
for selection only. The selection result is stored in struct selection_handle's
"selected" variable which can be handled further by any report_object caller.
This patch replaces "void *handle" with "struct processing_handle *handle"
in process_each_*, process_single_* and related functions.
The struct processing_handle consists of two handles inside now:
- the "struct selection_handle *selection_handle" used for
applying selection criteria while processing process_each_*,
process_single_* and related functions (patches using this
logic will follow)
- the "void* custom_handle" (this is actually the original handle
used before this patch - a pointer to custom data passed into
process_each_*, process_single_* and related functions).
The new dm_report_object_is_selected fn makes it possible to opt whether the
object reported should be displayed on output or not. Also, in addition to
that, it makes it possible to save the result of selection (either 0 or 1).
So dm_report_object_is_selected is simply more general form of object
reporting fn - combinations now allow for:
dm_report_object_is_selected(rh, object, 1, NULL):
This is exactly the original dm_report_object fn and it's fully equal
to it.
dm_report_object_is_selected(rh, object, 0, selected):
Do not display the result on output, but save info whether the object
is selected or not in 'selected' variable.
dm_report_object_is_selected(rh, object, 1, selected):
Display the result on output (if it passes selection criteria) and save
whether the object is selected or not in 'selected' variable.
dm_report_object(rh, object, 0, NULL):
This combination is not allowed - it will end up with internal error.
We're either interested in selection status or we want to display the
result on output or both, but never nothing of the two.
Once LVM_COMMAND_PROFILE environment variable is specified, the profile
referenced is used just like it was specified using "<lvm command> --commandprofile".
If both --commandprofile cmd line option and LVM_COMMAND_PROFILE env
var is used, the --commandprofile cmd line option gets preference.
- The RPM build and the tests are now executed in separate VMs.
- Run the testsuite by using the new lvm2-testsuite RPM.
- The VM running the tests is restarted from the outside if it hangs, and the
runner keeps a journal to avoid running a bad test ad infinitum.
- TODO: lcov reports and more intelligent VM rebooting (track the journal)
all sockets opened by a daemon or handed over by systemd
have to have CLOEXEC flag set. Otherwise we get nasty
warnings about leaking descriptors in processes spawned by
daemon.
Older lvm2 tools where always providing linear mapping for thin pool.
Recent lvm2 version however support external usage of thin pool and
empty/unused pools are loaded without such external linear mapping.
So this patch covers 'upgrade' problem, where older tool has activated
thin-pool with 'linear' layer mapping, and newer tools didn't expected
such mapping to exist and were not able to deactivate such table.
So before checking for new layout in dm-table, check if there is not
an old one already there.
After commit 158e998876 where we may
start to readlv_attr with a 'shared' ioctl call for a single lvs line
we where obtaing single status for thin pools.
However this is not properly reflecting lvm2 reality.
Correcting this by reading lv status from layered thin pool, but lv info
from non-layered (linear) mapped device which is maintained for proper
cluster locking.
Just like MD filtering that detects components of software RAID (md),
add detection for firmware RAID.
We're not adding any native code to detect this - there are lots of
firmware RAIDs out there which is just out of LVM scope. However,
with current changes with which we're able to get device info from
external sources (e.g. external_device_info_source="udev"), we can
do this easily if the external device status source has this kind
of information - which is the case of "udev" source where the results
of blkid scans are stored.
This detection should cover all firmware RAIDs that blkid can detect and
which are identified as:
ID_FS_TYPE = {adaptec,ddf,hpt45x,hpt37x,isw,jmicron,lsi_mega,nvidia,promise_fasttrack,silicon_medley,via}_raid_member
Partitioned devices are marked in udev db as:
ID_PART_TABLE="<partition table type name>"
and at the same time they are *not* marked with:
ID_PART_ENTRY_DISK="<parent disk major:minor>"
Where partition table type name is dos/gpt/... But checking the presence
of this variable is enough for LVM here - it just needs to know whether
there's a partition table or not, not interested in the actual type.
The same applies for parent disk major:minor.
The filter-partitioned code should contain only checks in "partition" domain.
The check for pv_min_size should actually be a part of filter-usable.
If the device size is less than pv_min_size, such device is not usable
as a PV so this check clearly belongs here logically.
With udev external info source, we can get device size via libudev's
sysfs reading interface and we can avoid opening the device this way
effectively.
mpath components are marked in udev db as:
ID_FS_TYPE="mpath_member"
or
DM_MULTIPATH_DEVICE_PATH="1"
(it depends on udev rule/blkid version used for handling mpath)
Composite filter is a filter that can put several filters in one set.
This patch adds a switch when creating the composite filter which will
enable or disable external device info handles for all the filters
the composite filter encompasses.
We want to use this external device info for majority of the filters
which are in the "lvmetad filter chain" (or the respective part if
we're not using lvmetad).
Following patches will use the enabled external device handle in
concrete filters from the composite filter...
for_each_sub_lv() now scans in depth also pools, however for
rename we actually do want to skip pools.
So add a new for_each_sub_lv_except_pools() to be used by rename,
every other user of for_each_sub_lv() scans every sub LV with pools
included.
This is i.e. necessary for properly working preload of pools
that are using raid arrays.
LVSINFO, LVSSTATUS and LVSINFOSTATUS is the same as LVS, just with some
extra info/status decoration attached to it. Recognize this when looking
for properties for lvm2app. This fixes lvm_lv_get_property lvm2app call
for fields which already use LVS{INFO,STATUS,INFOSTATUS} - currently,
this is lv_attr field which was converted to LVSINFOSTATUS from
pure LVS type.
This is a regression from v115 where some of the fields/properties
were converted to using the common "struct lvinfo" and
"struct lv_seg_status" so we don't need to issue info and status
ioctl several times per one reported line. Not all fields are
converted yet, but one that *is* converted is the lv_attr field
with the lv_attr_dup counterpart used in lvm_lv_get_attr lvm2app fn.
These changes were introduced with e34b004422
and later - this patch introduced the "info_ok" field in the
lv_with_info_and_seg_status structure which encapsulates the lvinfo
and lv_seg_status struct.
For the lv_attr_dup, the lv_attr_dup code missed the
assignment for the "info_ok" flag which saves the result of the
lv_info_with_seg_status call. Hence such info was marked
as unusable - unknown and it was returned as such via lvm_lv_get_attr
lvm2app fn.
When cache_mode is undefined, the read of metadata will miss to
set a bit with mode and fails to process metadata on internal
error:
Internal error: LV vg/lvol1 has uknown feature flags 0.
Fix it by setting it to writethrough mode.
Check splitted leg is active before preload.
(Since splitmirrors currently only does work active raid volumes
it's not a change for current code flow).
Minor optimization included - when already positively checked
for raid image don't check again for raid metadata.
for_each_sub_lv() normally does not put pool_lv into deps.
So for now go around it in 'lv_preload()' and add explicit
call with pool.
TODO: think about a better way, we want pool_lv deps only in certain
moments, so maybe for_each_sub_lv() needs new arg for this.
When repairing thin pool or swapping thin pool metadata,
preserve chunk_size property and avoid to be automatically changed
later in the code to better match thin pool metadata size.
When raid leg is extracted, now the preload code handles this state
correctly and put proper new table entry into dm tree,
so the activation of extracted leg and removed metadata works
after commit.
When raid is being splitted, extracted leg & metadata
is still floating in the table - and thus we need to
detect this case and properly preload their matching
table so consequent activation of extracted LVs properly
renames (and FREES) existing raid images, so ongoing
image name shifting will work.
For example, with dmeventd/executable set to "" which is not allowed for
this setting, the config validation now ends up with:
$ lvm dumpconfig --validate
Configuration setting "dmeventd/executable" invalid. It cannot be set to an empty value.
LVM configuration invalid.
This check for empty values for string config settings was not
done before (we only checked empty arrays, but not scalar strings).
Rename original lv_error_when_full field to lv_when_full and also
convert it from binary field to string field displaying three
possible values: "error", "queueu" or "" (blank for undefined).
$ lvs vg/pool vg/pool1 vg/linear_lv -o+lv_when_full
LV VG Attr LSize Data% Meta% WhenFull
linear_lv vg -wi-a----- 4.00m
pool vg twi-aotz-- 4.00m 0.00 0.98 queue
pool1 vg twi-a-tz-- 4.00m 0.00 0.88 error
For -S|--select these synonyms are recognized:
"error" -> "error when full", "error if no space"
"queue" -> "queue when full", "queue if no space"
"" -> "undefined"
The arg check using pvs is unnecessary. If the arg is not a PV,
the command will just fail later. Using the pvs command at this
point in the command is a problem when lvmetad is running, because
the pvs command does not report duplicate PVs when using lvmetad.
(Alternatively, use_lvmetad could be disabled by adding a --config
override to this pvs command.)
Add separate LVSINFOSTATUS field type for fields which display both
dm info-like and dm status-like information.
The internal interface is there with the introduction of LVSSTATUS
field type which can cope with the combination of LVSSTATUS
and LVSINFO field types (several fields).
However, till now, we considered that *single* field can display
either LVSINFO or LVSSTATUS, but not both at the same time.
Till now, we haven't had single field which needs both - hence
add LVSINFOSTATUS field type for such fields as we currently
need this for the lv_attr field which requires combination of
info and status.
This patch just adds interface for an ability to register such fields
(the code that copes with this is already in).
Recently the single 'status' code has been used for number of cache
features.
Extend the API a little bit to allow usage also for lv_attr_dup.
As the function itself is used in lvm2api - add a new function:
lv_attr_dup_with_info_and_seg_status() that is able to use
grabbed info & status information.
report_init() is now using directly passed lvdm struct pointer
which holds the infomation whether lv_info() was correctly obtained or
there was some error when trying to read it.
Move 'healt' attribute to status.
TODO convert raid function to use the already known status.
The previous patch felt short WRT disabling allocation on PVs holding other
legs of the RAID LV persistently; this patch introduces an internal,
transient PV flag PV_ALLOCATION_PROHIBITED to address this very problem.
General problem description for completeness:
An 'lvconvert --repair $RAID_LV" to replace a failed leg of a multi-segment
RAID10/4/5/6 logical volume can lead to allocation of (parts of) the replacement
image component pair on the physical volume of another image component
(e.g. image 0 allocated on the same PV as image 1 silently impeding resilience).
Patch fixes this severe resilince issue by prohibiting allocation on PVs
already holding other legs of the RAID set. It allows to allocate free space
on any operational PV already holding parts of the image component pair.
A full search for duplicate PVs in the case of pvs -a
is only necessary when duplicates have previously been
detected in lvmcache. Use a global variable from lvmcache
to indicate that duplicate PVs exist, so we can skip the
search for duplicates when none exist.
Previously, 'pvs -a' displayed the VG name for only the device
associated with the cached PV (pv->dev), and other duplicate
devices would have a blank VG name. This commit displays the
VG name for each of the duplicate devices. The cost of doing
this is not small: for each PV processed, the list of all
devices must be searched for duplicates.
When multiple duplicate devices are specified on the
command line, the PV is processed once for each of them,
but pv->dev is the device used each time.
This overrides the PV device to reflect the duplicate
device that was specified on the command line. This is
done by hacking the lvmcache to replace pv->dev with the
device of the duplicate being processed. (It would be
preferable to override pv->dev without munging the content
of the cache, and without sprinkling special cases throughout
the code.)
This override only applies when multiple duplicate devices are
specified on the command line. When only a single duplicate
device of pv->dev is specified, the priority is to display the
cached pv->dev, so pv->dev is not overridden by the named
duplicate device.
In the examples below, loop3 is the cached device referenced
by pv->dev, and is given priority for processing. Only after
loop3 is processed/displayed, will other duplicate devices
loop0/loop1 appear (when requested on the command line.)
With two duplicate devices, loop0 and loop3:
# pvs
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m
# pvs /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m
# pvs /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m
# pvs -o+dev_size /dev/loop0 /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree DevSize
/dev/loop0 loopa lvm2 a-- 12.00m 12.00m 16.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
With three duplicate devices, loop0, loop1, loop3:
# pvs -o+dev_size
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop1
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop3 /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop0 loopa lvm2 a-- 12.00m 12.00m 16.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop3 /dev/loop1
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop1 loopa lvm2 a-- 12.00m 12.00m 32.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop0 /dev/loop1
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop1 loopa lvm2 a-- 12.00m 12.00m 32.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop0 /dev/loop1 /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop0 loopa lvm2 a-- 12.00m 12.00m 16.00m
/dev/loop1 loopa lvm2 a-- 12.00m 12.00m 32.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
Processes a PV once for each time a device with its PV ID
exists on the command line.
This fixes a regression in the case where:
. devices /dev/sdA and /dev/sdB where clones (same PV ID)
. the cached VG references /dev/sdA
. before the regression, the command: pvs /dev/sdB
would display the cached device clone /dev/sdA
. after the regression, pvs /dev/sdB would display nothing,
causing vgimportclone /dev/sdB to fail.
. with this fix, pvs /dev/sdB displays /dev/sdA
Also, pvs /dev/sdA /dev/sdB will report two lines, one for each
device on the command line, but /dev/sdA is displayed for each.
This only works without lvmetad.
Support error_if_no_space feature for thin pools.
Report more info about thinpool status:
(out_of_data (D), metadata_read_only (M), failed (F) also as health
attribute.)
API for seg reporting is breaking internal lvm coding - it cannot
use vgmem mem pool for allocation of reported value.
So use separate pool instead of 'vgmem' for non vg related allocations
Add consts for many function params - but still many other are left
for now as non-const - needs deeper level of change even on libdm side.
An 'lvconvert --repair $RAID_LV" to replace a failed leg of a multi-segment
RAID10/4/5/6 logical volume can lead to allocation of (parts of) the replacement
image component pair on the physical volume of another image component
(e.g. image 0 allocated on the same PV as image 1 silently impeding resilience).
Patch fixes this severe resilince issue by prohibiting allocation on PVs
already holding other legs of the RAID set. It allows to allocate free space
on any operational PV already holding parts of the image component pair.
It's not an error if the device is filtered out and hence cleared from
lvmetad cache - "pvscan --cache DevPath" has now the same behaviour in
this case as "pvscan --cache major:minor" (which is more consistent).
Before, the tests expected failure return code for "pvscan --cache DevicePath"
if the device was filtered (which is a different situation if the device
is missing in the system completely!).
Normally, if there are partitions defined on top of device-mapper
device, there should be a device-mapper device created for each
partiton on top of the old one and once the underlying DM device
is used by another devices (partition mappings in this case),
it can't be used as a PV anymore.
However, sometimes, it may happen the partition mappings are
missing - either the partitioning tool is not creating them if
it does not contain full support for device-mapper devices or
the mappings were removed.
Better safe than sorry - check for partition header on DM devs
and filter them out as unsuitable for PVs in case the check is
positive. Whatever the user is doing, let's do our best to prevent
unwanted corruption (...by running pvcreate on top of such device
that would corrupt the partition header).
If pvscan is run with device path instead of major:minor pair and this
device still exists in the system and the device is not visible anymore
(due to a filter that is applied), notify lvmetad properly about this.
This makes it more consistent with respect to existing pvscan with
major:minor which already notifies lvmetad about device that is gone
due to filters.
However, if the device is not in the system anymore, we're not able
to translate the original device path into major:minor pair which
lvmetad needs for its action (lvmetad_pv_gone fn). So in this case,
we still need to use major:minor pair only, not device path. But at
least make "pvscan --cache DevicePath" as near as possible to "pvscan
--cahce <major>:<minor>" functionality.
Also add a note to pvscan man page about this difference when using
pvscan --cache with DevicePath and major:minor pair.
When processing PVs specified on the command line, the arg
name was being matched against pv_dev_name, which will not
always work:
- The PV specified on the command line could be an alias,
e.g. /dev/disk/by-id/...
- The PV specified on the command line could be any random
path to the device, e.g. /dev/../dev/sdb
To fix this, first resolve the named PV args to struct device's,
then iterate through the devices for processing.
No need to use awk now to get appropriate VGs/LVs, use LVM's
own --select - it's quicker, it removes a need for external
dependency on awk and it's also more readable.
Better than previous patch which changed log_warn to log_error -
we can have multiple MDAs and if one of them fails to be written,
we can still continue with other MDAs if we're in a mode where
we can handle missing PVs - so keep the log_warn for single
failed MDA write as it was before.
However, add log_error with "Failed to write VG <vg_name>." in
case we're not handling missing PVs or no MDA was written at all
during VG write process. This also prevents an internal error in
which the vg_write fails and we're not issuing any other log_error
in vg_write caller or above, so we end up with:
"Internal error: Failed command did not use log_error".
At first, all snapshot-origins where marked as unusable unconditionally
here, but we can't cut off whole snapshot-origin use in a stack just
because of this possible mirror state. This whole "device_is_usable"
check was even incorrectly part of persistent filter before commit
a843d0d97c66aae1872c05b0f6cf4bda176aae2 (where filter cleanup was
done).
The persistent filter is used only if obtain_device_list_from_udev=0,
which means that the former check for snapshot-origin here had not even
been hit with default configuration for a few years before commit
a843d0d97c66aae1872c05b0f6cf4bda176aae2 (the check for snapshot-origin and
skipping of this LV was introduced with commit a71d6051ed
back in 2010).
The obtain_device_list_from_udev=1 (and hence not using persistent
filter and hence not hitting this check for snapshot-origins and skipping) has been
in action since commit edcda01a1e (that is 2011).
So for 3 years this condition was not even checked with default configuration,
making it superfluous.
This all changed in 2014 with commit 8a843d0d97
where "filter-usable" is introduced and since then all snapshot-origins
have been marked as unusable more often than before and making snapshot-origins
practically unusable in a stack.
This patch removes this incorrect check from commit a71d6051ed
which caused snapshot-origins to be unusable more often recently.
If we want to fix this eventually in a correct way, we need to look
down the stack and if snapshot-origin is hit and there's a blocked
mirror underneath, only then mark the device as unusable. But mirrors
in stack are not supported anymore so it's questionable whether it's
worth spending more time on this at all...
$ lvcreate -l1 -m1 --type mirror vg
Logical volume "lvol0" created.
$ lvconvert --type raid1 vg/lvol0
Before:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_mimage_0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_mimage_1_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Incorrect name: lvol0_mimage_0_rimage_0
With this patch applied:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Proper name: lvol0_rimage_0
When mirror has missing PVs and there are mirror images on those missing
PVs, we delete the images and during this delete operation, we also
reactivate the LV. But if we're trying to reactivate the LV in cluster
which is not active and at the same time cmirrord is not running (which
is OK since we may have created the mirror LV as inactive), we end up
with:
"Error locking on node <node_name>: Shared cluster mirrors are not available."
That is because we're trying to activate the mirror LV without cmirrord.
However, there's no need to do this reactivation if the mirror LV (and
hence it's sub LVs) were not activated before.
This issue caused failure in mirror-vgreduce-removemissing.sh test
recently with this sequence (excerpt from the test script):
prepare_lvs_
lvcreate -an -Zn -l2 --type mirror -m1 --nosync -n $lv1 $vg "$dev1" $dev2" "$dev3":$BLOCKS
mimages_are_on_ $lv1 "$dev1" "$dev2"
mirrorlog_is_on_ $lv1 "$dev3"
aux disable_dev "$dev2"
vgreduce --removemissing --force $vg
The important thing about that test is that we're not running cmirrord,
we're activating the mirror with "-an" so it's inactive and then
vgreduce --removemissing tries to reactivate the mirror images
as part of the _delete_lv function call inside and since cmirrord
is not running, we end up with the "Shared cluster mirrors are not
available." error.
When creating cluster mirrors while they're not supposed to be activated
immediately after creation, we don't need to check for cmirrord availability.
We can just create these mirrors and let the check to be done on activation
later on. This is addendum for commit cba6186325.
When creating/activating clustered mirrors, we should have cmirrord
available and running. If it's not, we ended up with rather cryptic
errors like:
$ lvcreate -l1 -m1 --type mirror vg
Error locking on node 1: device-mapper: reload ioctl on failed: Invalid argument
Failed to activate new LV.
$ vgchange -ay vg
Error locking on node node 1: device-mapper: reload ioctl on failed: Invalid argument
This patch adds check for cmirror availability and it errors out
properly, also giving a more precise error messge so users are able
to identify the source of the problem easily:
$ lvcreate -l1 -m1 --type mirror vg
Shared cluster mirrors are not available.
$ vgchange -ay vg
Error locking on node 1: Shared cluster mirrors are not available.
Exclusively activated cluster mirror LVs are OK even without cmirrord:
$ vgchange -aey vg
1 logical volume(s) in volume group "vg" now active
Since GET_FIELD_RESERVED_VALUE always returns a pointer, don't reference
it with "&" when used - we already have that pointer value (this is an
addendum to recent commit 028ff30947).
Only GET_TYPE_RESERVED_VALUE needs to be referenced with "&" as it
returns directly the value of that type.
We have to use empty list, not NULL if we want to denote that the list
has no items. Otherwise, the code further can segfault as it expects
there's always a sane value (= some list), including empty list,
but never NULL.
Use helper macros to handle reserved values and also define "undefined"
reserved value as:
FIELD_RESERVED_VALUE(cache_policy, cache_policy_undef, "", "", "undefined")
Which means:
- print "" if the cache_policy value is undefined (the first name for this reserved value is "")
- recognize "undefined" reserved name as synonym to ""
(so statements like "lvs -S cache_policy=undefined" are still recognized)
Avoid making a copy of the keyword which is already registered in
values.h for "unmanaged" (vg_mda_copies field) and "auto" reserved
value (lv_read_ahead field). Also use helper macros to handle these
reserved - this is the correct approach - just do not copy the same
thing again and do not mix it! The GET_FIELD_RESERVED_VALUE and
GET_FIRST_RESERVED_NAME macros guarantees this - use it!
In addition to that, rename reserved values:
vg_mda_copies --> vg_mda_copies_unmanaged
lv_read_ahead --> lv_read_ahead_auto
So the field reserved values follows this scheme:
"<field_name>_<reserved_value_name>".
The same applies for type reserved values with this scheme:
"<report type name in lowercase>_<reserved_value_name>"
Add a comment about this scheme for others to follow as well
when adding new fields and their reserved values. This makes
it a bit easier to read the code then.
RESERVED(id) --> GET_TYPE_RESERVED_VALUE(id)
FIRST_NAME(id) --> GET_FIRST_RESERVED_NAME(id)
Also add GET_FIELD_RESERVED_VALUE(id) macro to get per-field reserved value.
This makes it much more readable and hopefully it'll make it
easier to use these helper macros when adding new reporting
fields with reserved values if needed.
The cache policy name taken as LV segment property must be duped
for report as the VG/LV/seg structure is destroyed after processing,
reporting happens later:
$ valgrind lvs -o+cache_policy
...
==16589== Invalid read of size 1
==16589== at 0x54ABCC3: dm_report_compact_fields
(libdm-report.c:1739)
==16589== by 0x153FC7: _report (reporter.c:619)
==16589== by 0x1540A6: lvs (reporter.c:641)
==16589== by 0x148021: lvm_run_command (lvmcmdline.c:1452)
==16589== by 0x1495CB: lvm2_main (lvmcmdline.c:1907)
==16589== by 0x164712: main (lvm.c:21)
==16589== Address 0x7d465f2 is 8,338 bytes inside a block of size
16,384 free'd
==16589== at 0x4C2ACE9: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==16589== by 0x54B8C85: _free_chunk (pool-fast.c:318)
==16589== by 0x54B84FB: dm_pool_destroy (pool-fast.c:78)
==16589== by 0x1E59C7: _free_vg (vg.c:78)
==16589== by 0x1E5A6D: release_vg (vg.c:95)
==16589== by 0x159B6E: _process_lv_vgnameid_list (toollib.c:1967)
==16589== by 0x159DD7: process_each_lv (toollib.c:2030)
==16589== by 0x153ED8: _report (reporter.c:598)
==16589== by 0x1540A6: lvs (reporter.c:641)
==16589== by 0x148021: lvm_run_command (lvmcmdline.c:1452)
==16589== by 0x1495CB: lvm2_main (lvmcmdline.c:1907)
==16589== by 0x164712: main (lvm.c:21)
We only checked global per-report-type reserved values for compatibility
with selection code. This patch also adds a check for per-report-field
reserved values. This avoids problems where unsupported report type is
used as reserved value which could cause hard to debug problems
otherwise. So this additional check stops from registering unsupported
and unhandled per-field reserved values.
Registerting such unsupported reserved value is a programmatic error,
so report internal error in this case to stop us from making a mistake
here in the future or even today where STR_LIST fields can't have
reserved values yet.
The {pv,vg,lv}display *do* use reporting in case "-C|--columns" is used.
The man page was correct, the recognition for the --binary was missing
in the code though!
The {pv,vg,lv}display commands don't use reporting capabilites and
as such they can't use --binary. This got into the man pages by
mistake - the display commands do not recognize --binary option.
All the LVM commands are run in mode without lvmetad use (since lvmetad
can't handle duplicates). When we're finished with vgimportclone, we
need to notify lvmetad about changes.
Before this patch (/dev/sda and /dev/sdb contains a copy VG called "vg"):
$ vgimportclone --basevgname vg_snap /dev/sdb
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: Activation disabled. No device-mapper interaction will be attempted.
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Physical volume "/tmp/snap.zcJ8LCmj/vgimport0" changed 1 physical volume changed / 0 physical volumes not changed
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: Activation disabled. No device-mapper interaction will be attempted.
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully changed
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully renamed to "vg_snap"
Reading all physical volumes. This may take a while...
Found volume group "vg" using metadata type lvm2
Found volume group "fedora" using metadata type lvm2
$ vgs
VG #PV #LV #SN Attr VSize VFree
fedora 1 2 0 wz--n- 9.50g 0
vg 1 1 0 wz--n- 124.00m 120.00m
(...lvmetad doesn't see the new "vg_snap"!)
With this patch applied:
$ vgimportclone --basevgname vg_snap /dev/sdb
...
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully renamed to "vg_snap"
Notifying lvmetad about changes since it was disabled temporarily.
Reading all physical volumes. This may take a while...
Found volume group "vg_snap" using metadata type lvm2
Found volume group "fedora" using metadata type lvm2
Found volume group "vg" using metadata type lvm2
$ vgs
VG #PV #LV #SN Attr VSize VFree
fedora 1 2 0 wz--n- 9.50g 0
vg 1 1 0 wz--n- 124.00m 120.00m
vg_snap 1 1 0 wz--n- 124.00m 120.00m
The "restart lvmetad before enabling it" message is a bit misleading
here - we should probably suppress this one, but we can't suppress
warning messages selectively at the moment and we don't want to lose
other warning/error messages printed...
With current dumpconfig, we can generate lvm.conf easily - we can merge
current lvm.conf with the config given on cmd line:
lvm dumpconfig --mergedconfig --config "..."
This is a bit simpler than using awk and it also avoids problems when some of
the configuration is missing in existing lvm.conf file and hardcoded defaults
are used instead. The dumpconfig handles this transparently.
Under certain circumstances, the selection code can segfault:
$ vgs --select 'pv_name=~/dev/sda' --unbuffered vg0
VG #PV #LV #SN Attr VSize VFree
vg0 6 3 0 wz--n- 744.00m 588.00m
Segmentation fault (core dumped)
The problem here is the use of --ubuffered together with regex used in
selection criteria. If the report output is not buffered, each row is
discarded as soon as it is reported. The bug is in the use of report
handle's memory - in the example above, what happens is:
1) report handle is initialized together with its memory pool
2) selection tree is initialized from selection criteria string
(using the report handle's memory pool!)
2a) this also means the regex is initialized from report handle's mem pool
3) the object (row) is reported
3a) any memory needed for output is intialized out of report handle's mem pool
3b) selection criteria matching is executed - if the regex is checked the
very first time (for the very first row reported), some more memory
allocation happens as regex allocates internal structures "on-demand",
it's allocating from report handle's mem pool (see also step 2a)
4) the report output is executed
5) the object (row) is discarded, meaning discarding all the mem pool
memory used since step 3.
Now, with step 5) we have discarded the regex internal structures from step 3b.
When we execute reporting for another object (row), we're using the same
selection criteria (step 3b), but tihs is second time we're using the regex
and as such, it's already initialized completely. But the regex is missing the
internal structures now as they got discarded in step 5) from previous
object (row) reporting (because we're using "unbuffered" reporting).
To resolve this issue and to prevent any similar future issues where each
object/row memory is discarded after output (the unbuffered reporting) while
selection tree is global for all the object/rows, use separate memory pool
for report's selection.
This patch replaces "struct selection_node *selection_root" in struct
dm_report with new struct selection which contains both "selection_root"
and "mem" for separate mem pool used for selection.
We can change struct dm_report this way as it is not exposed via libdevmapper.
(This patch will have even more meaning for upcoming patches where selection
is used even for non-reporting commands where "internal" reporting and
selection criteria matching happens and where the internal reporting is
not buffered.)
Fix incorrect test in configure which sets --enable-udev-systemd-background-jobs
automatically if proper systemd version is available.
The UDEV_SYSTEMD_BACKGROUND_JOBS variable was not properly set to "yes" in
case systemd is available and we had "maybe" for this variable before.
When we split leg from raid - we take a proper new lock for a new LV.
However for now activation checks only 'existince' of device UUID,
but it's not validating device has a proper name.
As a quick fix call suspend()/resume() to rename after split mirror.
Add new dm_report_compact_fields function to cause report outout
(dm_report_output) to ignore fields which don't have any value set
in any of the rows reported. This provides support for compact report
output where only fields which have something to report are displayed.
The dm_report_set_output_selection was not implemented in the end -
we have dm_report_init_with_selection instead. This is just a remnant
from development code that got into libdevmapper.h by mistake.
Free (and clear) h.protocol string on daemon_open() error paths
so it's OK for caller to skip calling daemon_close() if returned
h.socket_fd is -1.
Close h.socket_fd in daemon_close() to avoid possible leak.
https://bugzilla.redhat.com/1164234
The call to dm_config_destroy can derefence result->mem
while result is still NULL:
struct dm_config_tree *get_cachepolicy_params(struct cmd_context *cmd)
{
...
int ok = 0;
...
if (!(result = dm_config_flatten(current)))
goto_out;
...
ok = 1;
out:
if (!ok) {
dm_config_destroy(result)
...
}
...
}
Just call return 0 directly on error path, without using
"goto" - the code is short, no need to use it this way
(the dead code appeared as part of further changes in this
function).
When chunk size needs to be estimated, the code missed to round
to proper 64kb boundaries (or power of 2 for older thin pool driver).
So for some data and metadata size (i.e. 10GB and 4MB) it resulted
in incorrect chunk size (not being a multiple of 64KB)
Fix it by adding proper rounding and also use 1 routine for 2 places
where the same calculation is made.
Fix also incorrect printed warning that has used 'ffs()'
(which returns first 'least significant' bit in word)
and it was not really giving any useful size info and replace it
with properly estimated chunk size.
Use lvm2 standard TARGETS.
Make liblvm_python.c as intermediate target (gets deleted after use)
Properly delete build dir on make distclean.
Mark install_python_bindings as .PHONY.
Fix regression introduced with a2c1024f6a
_setup_task(mknodes ? name : NULL...
has been replaced with:
_setup_task(type != MKNODES ? name : NULL....
Use '=='
Commit d2c116058e introduced regression
with CLVMD_PATH.
+ CLVMD_PATH="$clvmd_prefix/sbin/clvmd"
test "$prefix" != NONE && clvmd_prefix=$prefix
It has set CLVMD_PATH before clvmd_prefix got its final value.
Move it one line below.
The order of the resulting tree is based on the first appearance of
sections. With no section repeats, the sections stay as listed in the
config file. Sections using the brace syntax 'section { key = value }' are
treated the same way: 'section { x = 1 } section { y = 2 }' is the same as
'section/x = 1 section/y = 2' is the same as 'section { x = 1 y = 2 }'
Make sure there is 'control' node before clvmd is started.
Somehow 'clvmd' is not allowed by selinux to create one.
TODO: Check is selinux policy is right here...
In case someone wants to use @some_path@ that contains
prefix or exec_prefix and it's used in other files than Makefiles
(Makefiles used make.tmpl to expand these).
systemd-run is available in systemd>=205. Also, this fix prevents
systemd-specific udev rules in 69-dm-lvm-metad.rules to appear in
case systemd environment is not available - make configure to check
this automatically and use these systemd specific rules only if it
is applicable.
- closer to the recommendation of man-pages (7) if possible
- Add crossrefs
- Sort options and crossrefs
- Fix default timeout (60 secs) of -t
- Documents -I[auto]
Signed-off-by: Stéphane Aulery <saulery@free.fr>
- Closer to the recommendation of man-pages and groff_man (7) if
possible
- Sort options and crossrefs
- Relocate sub-options on the right places
Signed-off-by: Stéphane Aulery <saulery@free.fr>
ignore_vg now returns 0 for the FAILED_CLUSTERED case,
so all the ignore_vg 1 cases will return vg's with an
empty vg->pvs, so we do not need to iterate through
vg->pvs to remove the entries from the devices list.
Clean up whitespace problems in that area from the
previous commit.
- Fix problems with recent changes related to skipping in:
. _process_vgnameid_list
. _process_pvs_in_vgs
- Undo unnecessary changes to the code structure and readability.
- Preserve valid but minor changes:
. testing FAILED bit values in ignore_vg
. using "skip" value from ignore_vg instead of "ret" value
. applying the sigint check to the start of all loops
. setting stack backtrace when ECMD_PROCESSED is not returned,
i.e. apply the following pattern:
ret = process_foo();
if (ret != ECMD_PROCESSED)
stack;
if (ret > ret_max)
ret_max = ret;
Extend/fix d8923457b8 commit.
'skip'-ed VG is not holding any lock - so don't unlock such VG.
At the same time simplify the code around and relase VG at a single
place and unlock only not skiped and not ignored VGs.
Use log_warn when we are effectively not creating an error -
we 'allowed' inconsistent read for a reason - so it's just warning
level we process inconsistent VG - it's upto caller later to decide
error level of command return value and in case of error it needs
to use log_error then.
Rework ignore_vg() API so it properly handles
multiple kind of vg_read_error() states.
Skip processing only otherwise valid VG.
Always return ECMD_FAILED when break is detected.
Check sigint_caught() in front of dm iterator loop.
Add stack for _process failing ret codes.
Failed recovery provides different (NULL) VG then FAILED_INCONSISTENT.
Mark it with different failure bit - since FAILED_INCONSISTENT is
supposed to contain something 'usable' (thought inconsistent).
Move common code into shared internal fn so the logic for getting the
LV info as well LV segment status is not scattered around - call common
_do_info_and_status to gather required parts in reporting handlers.
- Add separate lv_status fn (if we're interested only in seg status,
but not lv info at the same time as it is with existing
lv_info_with_seg_status fn). So we 3 fns:
- lv_info (existing one, runs only info ioctl, fills in struct lvinfo only)
- lv_status (new one, runs status ioctl, fills in struct lv_seg_status only)
- lv_info_with_seg_status (existing one, runs status ioctl, fills
in struct lvinfo as well as lv_seg_status)
- Add more comments in the code explaining the difference between lv_info,
lv_status and lv_info_with_seg_status and their return values.
- Move decision whether lv_info_with_seg_status needs to call only
status ioctl (in case the segment for which we require status is from
the LV for which we require info) or separate status and info ioctl
(in case the segment for which we require status is from different
LV that the one for which we require info) into
lv_info_with_seg_status fn so caller doesn't need to bother about
this at all.
- Cleanup internal interface for this seg status so it's more readable.
Since we support device stack of pools over pool
(thin-pool with cache data volume) the existing code
is no longer able to detect orphan _pmspare.
So instead do a _pmspare check after volume removal,
and remove spare afterwards.
We need to stop guessing deleted names - so rather collect
deleted UUID into a string list - and then remove them properly
in _clean_tree. Restore origin _clean_tree behaviour them for
currently unconverted removal of snapshots.
Pending delete feature now properly tracks whole subtree of cache
(so i.e. data or metadata as raid volumes).
It properly replaces all related volumes with 'errors' in suspend
preload, then resume them as error and remove collected UUIDs
from root - since they are not longer part of any volume deps.
This would be in case the pool segment was not found.
LVM2.2.02.112/lib/metadata/pool_manip.c:238:36: warning: Access to field 'segtype' results in a dereference of a null pointer (loaded from variable 'pool_seg')
LVM2.2.02.112/lib/metadata/cache_manip.c:73: overflow_before_widen: Potentially overflowing expression "*pool_metadata_extents *vg->extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:217: overflow_before_widen: Potentially overflowing expression "seg_status->seg->len * extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:217: overflow_before_widen: Potentially overflowing expression "seg_status->seg->le * extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:196:5: warning: 'dmtask' may be used uninitialized in this function [-Wmaybe-uninitialized]
In _info_run fn:
switch (type) {
case INFO:
...
case STATUS:
...
case MKNODES:
...
}
The "type" is enum and currently only those three types are supported,
but if we added a new type in the future, this would end up with a bug
(if we forgot to add the new "case" in that "switch"). So let's make
sure proper internal error is printed:
default:
log_error(INTERNAL_ERROR "_info_run: unhandled info type");
return 0;
LVM2.2.02.112/daemons/clvmd/clvmd.c:1131: warning[arrayIndexOutOfBoundsCond]: Array 'row[8]' accessed at index 8, which is out of bounds. Otherwise condition 'j==8' is redundant.
This code:
int i,j = 0;
...
for (i = 0; i < len; ++i) {
...
if ((j == 8) || (i + 1 == len)) {
for (;j < 8; ++j) {
...
}
...
j = 0;
}
}
Indeed - j is 0 at the beginning, then iterating till j < 8,
then always zeroed at the end of the outer loop - so "j" never
reaching value of 8 - the j == 8 condition is redundant.
LVM2.2.02.112/tools/toollib.c:1991: leaked_storage: Variable "iter" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/filters/filter-usable.c:89: leaked_storage: Variable "f" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/activate/dev_manager.c:1874: leaked_handle: Handle variable "fd" going out of scope leaks the handle.
When getting status for LV segment types, we need to be sure
that proper segment is selected for the status ioctl.
When reporting fields that require status ioctl,
the "_choose_lv_segment_for_status_report" fn in tools/reporter.c
must be completed properly to choose the proper segment for all
the LV types (at the moment, it just takes the first LV segment
by default).
This works fine with cache LVs surely. The other segment types
need more auditing. We use this status ioctl only for cache status
fields at the moment only, so restrict it to the cache only.
Once the _choose_lv_segment_for_status_report is completed
properly, release the restriction in _get_segment_status_from_target_params.
Similar to LVSINFO type which gathers LV + its DM_DEVICE_INFO, the
new LVSSTATUS/SEGSSTATUS report type will gather LV/segment + its
DM_DEVICE_STATUS.
Since we can report status only for certain segment, in case
of LVSSTATUS we need to choose which segment related to the LV
should be processed that represents the "LV status". In case of
SEGSSTATUS type it's clear - the status is reported for the
segment just processed.
The former struct lv_with_info is renamed to lv_with_info_and_seg_status as it can
hold more than just "info", there's lv's segment status now in addition:
struct lv_with_info_and_seg_status {
struct logical_volume *lv;
struct lvinfo *info;
struct lv_seg_status *seg_status;
}
Where struct lv_seg_status is:
struct lv_seg_status {
struct dm_pool *mem;
struct lv_segment lv_seg;
lv_seg_status_type_t type;
void *status; /* struct dm_status_* */
}
Where lv_seg points to lv's segment that is being reported or
processed in general.
New struct lv_seg_status keeps the information about segment status -
the status retrieved via DM_DEVICE_STATUS ioctl. This information will
be used for reporting dm device target status for the LV segment
specified.
So this patch introduces third level of LV information that is
kept for reuse while reporting fields within one reporting line,
causing only one DM_DEVICE_STATUS ioctl call per LV segment line
reported (otherwise we'd need to call the DM_DEVICE_STATUS for each
segment status field in one LV segment/reporting line which is not
efficient).
This is following exactly the same principle as already introduced
by commit ecb2be5d16.
So currently we have three levels of information that can be used
to report an LV/LV segment:
- LV metadata itself (struct logical_volume *lv)
- LV's DM_DEVICE_INFO ioctl result (struct lvinfo *info)
- LV's segment DM_DEVICE_STATUS ioctl result (this status must be
bound to a segment, not the whole LV as the whole LV may be
composed of several segments of course)
(this is the new struct lv_seg_status *seg_status)
Do not use 'any' policy name as a value in config tree - so we stick
with 'policy_settings' and extra 'policy_name' for libdm params.
Update lvm2 API as well.
Example of supported metadata:
policy = "mq"
policy_settings {
migration_threshold = 2048
sequential_threshold = 512
random_threshold = 4
read_promote_adjustment = 10
}
Support new PASSTHROUGH 'feature' flag.
Add dm_config_node to pass in policy args.
Really use origin_uuid instead of using extra call
to pass seg_areas.
Switch to 64bit feature flag bit set so there is
enough space in future for new bits...
More efficient spare volume creation. Save 1 extra commit
and properly activate this volume according to our cluster
activation rules (using lv_active_change() for this).
Since we 'layer' for cache origin which and we support dropping
cache layer - we need to restore origin name in case
the origin LV is more complex target - i.e. raid.
Drop _corig from name
Cleanup and rename parent -> parent_lv.
Revert part of commit 51a29e6056,
it's probably bad idea to continue with any recovery, when
vg_write() or vg_commit() fail - so it's better to leave it as it is.
When deactivating origin, we may have possibly left table in broken state,
where origin is not active, but snapshot volume is still present.
Let's ensure deactivation of origin detects also all associated
snapshots are inactive - otherwise do not skip deactivation.
(so i.e. 'vgchange -an' would detect errors)
Let's use this function for more activations in the code.
'needs_exlusive' will enforce exlusive type for any given LV.
We may want to activate LV in exlusive mode, even when we know
the LV (as is) supports non-exlusive activation as well.
lvcreate -ay -> exclusive & local
lvcreate -aay -> exclusive & local
lvcreate -aly -> exclusive & local
lvcreate -aey -> exclusive (might be on any node).
LVSINFO is just a subtype of LVS report type with extra "info" ioctl
called for each LV reported (per output line) so include its processing
within "case LVS" switch, not as completely different kind of reporting
which may be misleading when reading the code.
There's already the "lv_info_needed" flag set in the _report fn, so
call the approriate reporting function based on this flag within the
"case LVS" switch line.
Actually the same is already done for LV is reported per segments
within the "case SEGS" switch line. So this patch makes the code more
consistent so it's processed the same way for all cases.
Also, this is a preparation for another and new subtype that will
be introduced later - the "LVSSTATUS" and "SEGSSTATUS" report type.
When responding to DM_EVENT_CMD_GET_REGISTERED_DEVICE no longer
ignore threads that have already been unregistered but which
are still present.
This means the caller can unregister a device and poll dmeventd
to ensure the monitoring thread has gone away before removing
the device. If a device was registered and unregistered in quick
succession and then removed, WAITEVENT could run in parallel with
the REMOVE.
Threads are moved to the _thread_registry_unused list when they
are unregistered.
Activate of new/unused/empty thin pool volume skips
the 'overlay' part and directly provides 'visible' thin-pool LV to the user.
Such thin pool still gets 'private' -tpool UUID suffix for easier
udev detection of protected lvm2 devices, and also gets udev flags to
avoid any scan.
Such pool device is 'public' LV with regular /dev/vgname/poolname link,
but it's still 'udev' hidden device for any other use.
To display proper active state we need to do few explicit tests
for this condition.
Before it's used for any lvm2 thin volume, deactivation is
now needed to avoid any 'race' with external usage.
Call check_new_thin_pool() to detect in-use thin-pool.
Save extra reactivation of thin-pool when thin pool is not active.
(it's now a bit more expensive to invoke thin_check for new pools.)
For new pools:
We now active locally exclusively thin-pool as 'public' LV.
Validate transaction_id is till 0.
Deactive.
Prepare create message for thin-pool and exclusively active pool.
Active new thin LV.
And deactivate thin pool if it used to be inactive.
Allowing 'external' use of thin-pools requires to validate even
so far 'unused' new thin pools.
Later we may have 'smarter' way to resolve which thin-pools are
owned by lvm2 and which are external.
When transaction_id is set 0 for thin-pool, libdm avoids validation
of thin-pool, unless there are real messages to be send to thin-pool.
This relaxes strict policy which always required to know
in front transaction_id for the kernel target.
It now allows to activate thin-pool with any transaction_id
(when transaction_id is passed in)
It is now upto application to validate transaction_id from life
thin-pool volume with transaction_id within it's own metadata.
Show some stats with 'lvs'
Display same info for active cache volume and cache-pool.
data% - #used cache blocks/#total cache blocks
meta% - #used metadata blocks/#total metadata blocks
copy% - #dirty/#used cache blocks
TODO: maybe there is a better mapping
- should be seen as first-try-and-see.
When the cache pool is unused, lvm2 code will internally
allow to activate such cache-pool.
Cache-pool is activate as metadata LV, so lvm2 could easily
wipe such volume before cache-pool is reused.
Replace lv_cache_block_info() and lv_cache_policy_info()
with lv_cache_status() which directly returns
dm_status_cache structure together with some calculated
values.
After use mem pool stored inside lv_status_cache structure
needs to be destroyed.
Add init of no_open_count into _setup_task().
Report problem as warning (cannot happen anyway).
Also drop some duplicated debug messages - we have already
printed the info about operation so make log a bit shorter.
Tool will use internal activation of unused cache pool to
clear metadata area before next use of cache-pool.
So allow to deactivation unused pool in case some error
case happend and we were not able to deactivation pool
right after metadata wipe.
Add API call to calculate extents from percentage value.
Size is based in DM_PERCENT_1 units.
(Supporting decimal point number).
This commit is preparing functionality for more global
usage of % with i.e. --size option.
New size_mb_arg_with_percent is able to read size_mb_arg
but also it's able to read % values.
Percent parsing is share with int_arg_with_sign_and_percent.
If root has locales with different decimal point then '.'
(i.e. Czech with ',') lets be tolerant and retry with
"C" locales in the case '.' is found during parse of number.
Locales are then restored back.
Support compile type configurable defaults for creation
of sparse volumes.
By default now create 'thin-pools' for sparse volumes.
Use the global/sparse_segtype_default to switch back to old
snapshots if needed.
Apply the same compile logic for newly introduces mirror/raid1 options.
Some values are reserved for special purpose like 'undefined', 'unmanaged' etc.
When using >, <, >= and < comparison operators where the range is considered,
do not include reserved values as proper values in this range which
would otherwise result in not so obvious criteria match (as the reserved value is
actually transparent for the user). It's incorrect.
Example scenario:
$ vgs -o vg_name,vg_mda_copies vg1 vg2
VG #VMdaCps
vg1 1
vg2 unmanaged
The "unmanaged" is actually mapped onto reserved value
18446744073709551615 (2^64 - 1) internally.
Such reseved value is already caught on selection criteria input
properly:
$ vgs -o name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies=18446744073709551615'
Numeric value 18446744073709551615 found in selection is reserved.
However, we still need to fix situaton where the reserved value may be
included in resulting range:
Before this patch:
$ vgs -o vg_name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies >= 1'
VG #VMdaCps
vg1 1
vg2 unmanaged
With this patch applied:
$ vgs -o vg_name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies >= 1'
VG #VMdaCps
vg1 1
From the examples above, we can see that without this patch applied,
the vg_mda_copies >= 1 also matched the reserved value 18446744073709551615
(which is represented by the "unamanged" string on report). When
applying the operators, such values must be skipped! They're meant to
be matched only against their string representation only, e.g.:
$ vgs -o name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies=unmanaged'
VG #VMdaCps
vg2 unmanaged
...or any synonyms:
$ vgs -o name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies=undefined'
VG #VMdaCps
vg2 unmanaged
Unlike with thin-pool - with cache we support all args also
directly when create cache volume.
So the result of 'separate' cache-pool creation and setting its
options should give same result as specifying those args
during cache creation.
Cache-pool values are used as defaults if the params are
not specified with cache creation.
Move code for creation of thin volume into a single place
out of lv_extend(). This allows to drop extra pool arg
for alloc_lv_segment() && lv_extend() and makes code
more easier to read and follow.
When we create volumes with chunk size bigger then extent size
we try to round up to some nearest chunk boundary.
Until now we did this for thins - use same logic for
cache volumes.
Fixed syntax parsing means that some commands that used to work are now
failing. Particullary this case:
$ invalid lvcreate -l1 --type thin vg/pool
> Needs to fail becase thin type LV needs --virtualsize
$ invalid lvcreate --type snapshot vg/lv1
> Needs to fail because old-snapshot segment type needs --size
Some reported error messages have been also updated.
Use segment flags to avoid zeroing of cache, cache pool
snapshot and thin pool segments.
We never want to zero these segment types.
Note:
Snapshot COW and Cache origin are created as stripes
thus are then properly zeroed.
Let the finaly state of zero & wipe_signature to be
resolved later together with all the types.
Don't play with zero assigment and segtype flag
(i.e. thin-pool -Z has different meaning).
Check if the passed options do allow requested zeroing/wiping.
lvcreate without -Z or -W will fallback to warning if the device
cannot be zeroed, however if user requested them explicitely
it will give user error.
Refactor lvcreate code.
Prefer to use arg_outside_list_is_set() so we get automatic 'white-list'
validation of supported options with different segment types.
Drop used lp->cache, lp->cache and use seg_is_cache(), seg_is_thin()
Draw clear border where is the last moment we could change create
segment type.
When segment type is given with --type - do not allow it to be changed
later.
Put together tests related to individual segment types.
Finish cache conversion at proper part of lv_manip code after
the vg_metadata are written - so we could correcly clean-up created
stripe LV for cache volume.
Move test for size of new LV names in front before
any creation of LV.
Properly check striped segtype kernel presence,
since passed 'segtype' is already tested.
Keep deactivation error path local to wiping part of the function.
Create metadata with temporary flag (it's activated, zeroed
and deactivated).
Introduce new option to specify pool data size.
This will be user to create i.e. cache & cachepool at once.
And possible for thin external origin snapshot.
This is only very basic patch to enable options, the
real working code will come later.
We want to print smarter warning message only when
the zeroing was not provided on the first zeroable segment
of newly created LV.
Put warning within _should_wipe_lv function to avoid reevaluation
of same conditions twice.
Hide creation of temporary LVs and print them only in verbose mode.
e.g. hides confusing message about creation of _pmspare
device during creation of pool.
Use new libdm macro DM_LIST_HEAD_INIT().
Embeded 'free' segment type (so it's not needed in the list)
Drop assignments of 0,NULL since they are defaults.
Ask for lock the proper LV.
Use the top-most LV to query for locally exclusive lock.
The rest of operations are then using 'lv_info()'
TODO:
Check all devices are reloaded from proper level.
In general any query on lv_is_active is supposed to be running
ona lv_lock_holder() volume.
Now we reference segment name via lvseg_name() and
we can drop default implementation and leave its
function pointer to be NULL.
Default give us 'return seg->segtype->name'.
Instead of segtype->ops->name() introduce lvseg_name().
This also allows us to leave name() function 'empty' for default
return of segtype->name.
TODO: add functions for rest of ops->
There was a bug in value and their synonym definition for these two fields
causing selections on these fields to not work correctly - nothing matched
against vg/lv_permissions fields even if selection criteria should have
matched.
Scenario:
$ lvs -o name,lv_permissions vg
LV LPerms
lvol0 read-only
lvol1 writeable
Before this patch:
$ lvs -o name,lv_permissions vg -S 'permissions=read-only'
(blank)
$ lvs -o name,lv_permissions vg -S 'permissions=writeable
(blank)
With this patch applied:
$ lvs -o name,lv_permissions vg -S 'permissions=read-only'
LV LPerms
lvol0 read-only
$ lvs -o name,lv_permissions vg -S 'permissions=writeable'
LV LPerms
lvol1 writeable
Also synonyms match correctly now:
$ lvs -o name,lv_permissions vg -S 'permissions=rw'
LV LPerms
lvol1 writeable
Fix lvm_split that is called when cmd line string is separated into
argv fields to recognize quote chars ('\'" and '"') properly and
when these quotes are used, consider the text within quotes as one
argument, do not separate it based on space characters inside.
The lvm_split is used during processing lvm shell command line or
when calling lvm commands through cmdlib (e.g. dmeventd plugins).
For example, the lvm shell scenario:
Before this patch:
$lvm
lvm> lvs --config 'global{ suffix=0 }'
Parse error at byte 9 (line 1): unexpected token
Failed to set overridden configuration entries.
With this patch applied:
$lvm
lvm> lvs --config 'global{ suffix=0 }'
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root fedora -wi-ao---- 9.00g
swap fedora -wi-ao---- 512.00m
(Exactly the same problem is hit when calling LVM commands with
quoted arguments via lvm2cmd lib in dmeventd plugins.)
Because of the recent change to process_each_pv(), the vg is always
provided when the pv is in a vg. is_pv(pv) means the pv is in a vg,
which means that the vg arg will not be NULL, which means the removed
block of code is not needed.
Bug https://bugzilla.redhat.com/show_bug.cgi?id=843587 is handled better
now - the hang does not occur anymore. There are still error messages
issued though during shutdown if someone stops lvm2-lvmetad.service
manually that lvm2-monitor.service depends on (even during shutdown).
These errors are correct though and will point to incorrect
configuration (still having use_lvmetad=1 in lvm.conf and stopping
lvm2-lvmetad.service manually).
The workaround to prevent the hang is not needed now. So the
'--config "global{use_lvmetad=0}"' is now removed from the
lvm2-monitor.service's ExecStop line.
We can't hang on blocked or suspended devices when the scan is done
for lvmetad update - when the device gets unblocked or resumed, there's
always CHANGE event generated which will fire the udev rule to run
extra pvscan --cache for that device which makes sure that lvmetad
is up-to-date.
When we are given an existing LV name - it needs to be allowed
to pass in even restricted name as the LV could have existed
long before we introduced some new restriction on prefix/suffix.i
Fix the regression on name limits and drop restriction to be applied
on any existing LVs - only the new created LV names have to be
complient with current name restrictions.
FIXME: we are currently using restricted names incorrectly in few
other places - device_is_usable() skips restricted names,
and udev flags are also incorrectly set for restricted names
so these LVs are not getting links properly.
find_pv_in_vg fn iterates over the list of PVs covered by the VG and
each PV's pvl->pv->dev is compared with device acquired from device
cache. However, in case pvl->pv->dev is NULL as well as device cache
returns NULL (e.g. when device is filtered), we'll get incorrect match
and the code calling find_pv_in_vg uses incorrect PV (as it thinks
it's the exact PV with the pv_name). The INTERNAL_ERROR covers this
situation and errors out immediately.
The warnings arg was used to enable logging of warnings
when reading a PV. This arg is turned into a set of flags
with the WARN_PV_READ flag matching the existing behavior.
A new flag WARN_INCONSISTENT is added that will cause
vg_read_internal() to log the "VG is not consistent"
warning so the various callers do not need to log
this warning themselves.
A new vg_read flag READ_WARN_INCONSISTENT is used from
reporting to enable the WARN_INCONSISTENT flag in
vg_read_internal.
[Committed by agk with cosmetic changes and tweaks.]
Process PVs by iterating through VGs, then iterating through
devices if the command needs to process non-PV devices.
The process_single function can always use the VG and PV args.
[Committed by agk with cosmetic changes and tweaks.]
lvcreate of thin pools can now use '-n lv vg' like other lv types,
or it can name the new thin pool in the free arg as 'vg/lv', which
is not allowed with other lv types.
Introduce pool function for validation of chunk size.
It's good idea to be able to reject invalid chunk size
when entered on command line before we open VG.
Move code to better locations.
Improve test and remove invalid ones
(i.e. no reason to require cache size to be >= then origin).
Correctly comment where the code is doing actual conversion
of other existing volume - we do already a similar thing with
external origins.
Lots of new command line options and combinations is now supported.
Hopefully older syntax still works as well.
lvcreate --cache --cachepool vg/pool -l1
lvcreate --type cache --cachepool vg/pool -l1
lvcreate --type cache-pool vg/pool -l1
lvcreate --type cache-pool --name pool vg -l1
... and many many more ...
Since _pmspare is internal volume move it to
lv_remove_single - so it's automatically removed with
last remove thin-pool.
lv_remove_with_dependencies() is not always used for pool removal.
--splitcache
Splits only cached LV (also pool could be specified).
Detaches cachepool from cached LV.
--split
Should be univerzal command to split various complex targets.
At this moment it knows cache.
--uncache
Opposite command to --cache. Detaches and DELETES cachepool for
cached LV.
Note: we support thin pool cached metadata device for uncaching.
Also use may specify wither cached LV or association cachepool device
to request split of cache.
Over the time lvcreate code has accumulated various hacks.
So try to move that code in right places.
Detect all types early in _lvcreate_params() so functions like
_read_size_params() do not need to change volume types.
Also ultimately respect give volume --type, that its shortcut
(-T, H, -m, -s) and after that options which do type estimation.
(i.e. --cachepool, --thinpool)
Avoid repeative tests - if we know all types are decode at once
place we can 'optimize' number of validations.
Copy the same form as the new process_each_vg.
Replace unused struct cmd_vg and cmd_vg_read() replicator
code with struct vg and vg_read() directly.
The failed_lvnames arg is no longer used since the
cmd_vg replicator wrapper was removed.
[Committed by agk with cosmetic changes and tweaks.]
Split VG argument collection from processing.
This allows the two different loops through VGs to
be replaced by a single loop.
Replace unused struct cmd_vg and cmd_vg_read() replicator
code with struct vg and vg_read() directly.
[Committed by agk with cosmetic changes and tweaks.]
The cache mode of a new cache pool is always explicitly
included in the vg metadata. If a cache mode is not
specified on the command line, the cache mode is taken
from lvm.conf allocation/cache_pool_cachemode, which
defaults to "writethrough".
The cache mode can be displayed with lvs -o+cachemode.
filters/filter-usable.c:22: warning: "ucp.check_..." may be used uninitialized in this function
This can't actually be hit in real, but let's clean this up for the compiler
to be happy again.
There are actually three filter chains if lvmetad is used:
- cmd->lvmetad_filter used when when scanning devices for lvmetad
- cmd->filter used when processing lvmetad responses
- cmd->full_fiilter (which is just cmd->lvmetad_filter + cmd->filter chained together) used
for remaining situations
This patch adds the third one - "cmd->full_filter" - currently this is
used if device processing does not fall into any of the groups before,
for example, devices which does not have the PV label yet and we're just
creating a new one or we're processing the devices where the list of the
devices (PVs) is not returned by lvmetad initially.
Currently, the cmd->full_filter is used exactly in these functions:
- lvmcache_label_scan
- _pvcreate_check
- pvcreate_vol
- lvmdiskscan
- pvscan
- _process_each_label
If lvmetad is used, then simply cmd->full_filter == cmd->filter because
cmd->lvmetad_filter is NULL in this case.
The ENABLE_ALL_DEVS flag is added to the command structure
for commands that should process all devs (pvs and non-pvs)
when they call process_each_pv and the command includes the
--all arg. This will be used in a later process_each_pv patch.
The ALL_VGS_IS_DEFAULT flag is added to the command structure
for commands that should process all vgs when they call
process_each_vg or process_each_lv with no args.
This will be used in later patches to process_each functions.
We need to use proper filter chain when we disable lvmetad use
explicitly in the code by calling lvmetad_set_active(0) while
overriding existing configuration. We need to reinitialize filters
in this case so proper filter chain is used. The same applies
for the other way round - when we enable lvmetad use explicitly in
the code (though this is not yet used).
With this change, the filter chains used look like this now:
A) When *lvmetad is not used*:
- persistent filter -> regex filter -> sysfs filter ->
global regex filter -> type filter ->
usable device filter(FILTER_MODE_NO_LVMETAD) ->
mpath component filter -> partitioned filter ->
md component filter
B) When *lvmetad is used* (two separate filter chains):
- the lvmetad filter chain used when scanning devs for lvmetad update:
sysfs filter -> global regex filter -> type filter ->
usable device filter(FILTER_MODE_PRE_LVMETAD) ->
mpath component filter -> partitioned filter ->
md component filter
- the filter chain used for lvmetad responses:
persistent filter -> usable device filter(FILTER_MODE_POST_LVMETAD) ->
regex filter
Usable device filter is responsible for filtering out unusable DM devices.
The filter has 3 modes of operation:
- FILTER_MODE_NO_LVMETAD:
When this mode is used, we check DM device usability by looking:
- whether device is empty
- whether device is blocked
- whether device is suspended (only on devices/ignore_suspended_devices=1)
- whether device uses an error target
- whether device name/uuid is reserved
- FILTER_MODE_PRE_LVMETAD:
When this mode is used, we check DM device usability by looking:
- whether device is empty
- whether device is suspended (only on devices/ignore_suspended_devices=1)
- whether device uses an error target
- whether device name/uuid is reserved
- FILTER_MODE_POST_LVMETAD:
When this mode is used, we check DM device usability by looking:
- whether device is blocked
- whether device is suspended (only on devices/ignore_suspended_devices=1)
These modes will be used by subsequent patch to create different
instances of this filter, depending on lvmetad use.
Currently, there are 5 things that device_is_usable function checks
(for DM devices only, of course):
- is device empty?
- is device blocked? (mirror)
- is device suspended?
- is device composed of an error target?
- is device name/uuid reserved?
If answer to any of these questions is "yes", then the device is not usable.
This patch just adds possibility to choose what to check for exactly - the
device_is_usable function now accepts struct dev_usable_check_params make
this selection possible. This is going to be used by subsequent patches.
When compiled with valgrind pool support - don't waste time
with preallocation of memory - it just waste of CPU cycles to
trace access to this memory.
We also may get slightly better estimation about real memory usage
during command processing.
Split internals of extract_vgname into _extract_vgname.
This common code will be used for other similar function.
Reuse skip_dev_dir() instead of less mature coded to skip
device dir.
Instead of duplicating full vg/lv name - allocate string
only vg portion of lv name.
Previously, this was the recommended form for creating a thin pool:
lvconvert --thinpool VG/ThinDataLV --poolmetadata VG/ThinMetaLV
but this is confusing, because --thinpool does not actually take
an arg, and is more naturally used to specify an existing thin pool.
The new recommended form is:
lvconvert --type thin-pool --poolmetadata VG/ThinMetaLV VG/ThinDataLV
Previously, this was the recommended form for creating a cache pool:
lvconvert --cachepool VG/CacheDataLV --poolmetadata VG/CacheMetaLV
but this is confusing, because --cachepool does not actually take
an arg, and is more natually used to specify an existing cache pool.
The new recommended form is:
lvconvert --type cache-pool --poolmetadata VG/CacheMetaLV VG/CacheDataLV
We are not using already defined segement type names where we could.
There is a lot of other places in device-mapper and LVM2 we have those
hardcoded so we should better finally have a common interface in
libdevmapper to avoid this.
Use of lv_info() internally in lv_check_not_in_use(),
so it always could use with_open_count properly.
Skip sysfs() testing in open_count == 0 case.
Accept just 'lv' pointer like other functions.
The function has 'built-in' lv_is_active_locally check,
which however is not what we need to check in many place.
For now at least remotely active snapshot merge is
detected and for this case merge on next activation is scheduled.
Move check for snapshot-merge support before archiving.
Split code on 2 paths - with merge_on_activate
using vg_write & vg_commit
and lv_update_reload call for instant merging.
Move printing after backup.
Before leaving _activate_lvs_in_vg() wait till devices
are active - so we do not print message about active
devices earlier then it really happens for a user.
More validations before any thin or cache related conversion begins.
We allow to use and stack:
pool data: cache or raid
pool metadata: raid
pool: linear, striped
cache: linear, striped, raid
thin(extorig): linear, origin, cow, virtual, thin
We use adjusted_mirror_region_size() in two different contexts.
Either on command line -
here we do want to inform user about reduction of size.
Or in pvmove activation context -
here we should only use 'verbose' info.
When requesting to reload an LV imrove this API to
automatically reload its lock holding LV as in cluster
only top-level LVs are addressable with lock.
When vg_ondisk is NULL we do not need to search
through the whole VG to find out the same LV.
NOTE: as of now - VG locking is not enabled as some code parts
are breaking memory locking rules (lvm2app).
Once we enforce VG locking for read-only commands the effect
will be much better for larger VGs.
Do not let fly metadata with just 'minor' set
(since they would not be readable on older version)
Be permissive with invalid major/minor number and
just report them as problem, but allow to use
such metadata with default major:minor.
Use nice instruction_HLT macro
Use log_debug_mem()
Don't actually log things after we prohibit 'mmap'.
Move initialization of strerror & udev before blocking mmap.
We do not need to restore LV content on error path - since
for reactivation we always use ondisk/commited metadata,
so passed data are never used.
Drop some unneded extra message, since the called function
repeated logs same info.
Move common code for reading and processing
of --persistent arguments for lvcreate and lvchange
into lvmcmdline.
Reuse validate_major_minor() routine for validation.
Don't blindly activate LVs after change in cluster
and instead only local reactivation is supported.
(we have now many limited targets now).
Dropping 'sigint_caught()' handling, since
prompt() is resolving this case itself.
Add code to trap both mmap implementation on 32bit arch.
Use dlsym()
Use hlt instraction instead of int3 - generates usable stack trace
when problem is catched.
If we want to support conversion of VG to clustered type,
we currently need to relock active LV to get proper DLM lock.
So add extra loop after change of VG clustered attribute
to exlusively activate all active top level LVs.
When doing change -cy -> -cn we should validate LVs are not
active on other cluster nodes - we could be sure about this only
when with local exclusive activation - for other types
we require user to deactivate volumes first.
As a workaround for this limitation there is always
locking_type = 0 which amongs other skip the detection
of active LVs.
FIXME:
clvmd should handle looks for cluster locking type all the time.
Failure to copy the 'feature_flags' lvconvert_param to the matching
lv_segment field meant that when a user specified the cachemode argument,
the request was not honored.
While we could probably reacquire some type of lock when
going from non-clustered to clustered vg, we don't have any
single road back to drop the lock and keep LV active.
For now keep it safe and prohibit conversion when LV
is active in the VG.
Try to enforce consistent macro usage along these lines:
lv_is_mirror - mirror that uses the original dm-raid1 implementation
(segment type "mirror")
lv_is_mirror_type - also includes internal mirror image and log LVs
lv_is_raid - raid volume that uses the new dm-raid implementation
(segment type "raid")
lv_is_raid_type - also includes internal raid image / log / metadata LVs
lv_is_mirrored - LV is mirrored using either kernel implementation
(excludes non-mirror modes like raid5 etc.)
lv_is_pvmove - internal pvmove volume
Cmirrord has endian bugs, which cause failure to lvcreate a mirrored lv
on s390.
- data_size is uint32, should not use xlate64 to convert, which will
cause data_size 0 after xlate.
- request_type and data_size still used by local(v5_data_switch),
should convert later. If request_type xlate too early, it will
cause request_type judge error; if data_size xlate too early, it
will cause coredump in case DM_ULOG_CLEAR_REGION.
- when receiving package in clog_request_from_network. vp[0] will always
be little endian. We could use xlate64(vp[0]) == vp[0] to decide if
the local node is little endian or not.
Signed-off-by: Lidong Zhong<lzhong@suse.com> & Liuhua Wang <lwang@suse.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Use lv_is_* macros throughout the code base, introducing
lv_is_pvmove, lv_is_locked, lv_is_converting and lv_is_merging.
lv_is_mirror_type no longer includes pvmove.
This is probably better approach than 3880ca5eca.
If dm module is not loaded during dm_is_dm_major call, there are no
lines for dm in /proc/devices, of course. Normally, dm_is_dm_major
is called to check existing devices, hence if module is not loaded,
we can expect there's no DM device present at the same time so we
can directly return 0 here (meaning the major number being inspected
is not dm device's one).
See also https://bugzilla.redhat.com/show_bug.cgi?id=1059711.
For dm_is_dm_major to determine whether the major number given as
an argument belongs to a DM device, libdm code needs to know what
the actual DM major is to do the comparison.
It may happen that the dm-mod module is not loaded during this
call and so for the completness let's try our best before we start
giving various errors - we can still make use of dm-mod autoloading,
though only since kernels 2.6.36 where this feature was introduced.
Caused by recent changes - a7be3b12df.
If global filter was not defined, then part of the code
creating composite filter (the cmd->lvmetad_filter) incorrectly
increased index value even if this global filter was not created
as part of the composite filter. This caused a gap with "NULL"
value in the composite filter array which ended up with the rest
of the filters after the gap to be ignored and also it caused a mem
leak when destroying the composite filter.
We used to print an error message whenever we tried to deal with devices that
lvmetad knew about but were rejected by a client-side filter. Instead, we now
check whether the device is actually absent or only filtered out and only print
a warning in the latter case.
If a PV label is exposed both through a composite device (MD for example) and
through its component devices, we always want the PV that lvmetad sees to be the
composite, since this is what all LVM commands (including activation) will then
use. If pvscan --cache is triggered for multiple clones of the same PV, the last
to finish wins. This patch basically re-arranges the filters so that
component-device filters are part of the global_filter chain, not of the
client-side filter chain. This has a subtle effect on filter evaluation order,
but should not alter visible semantics in the non-lvmetad case.
The code in dev_iter_create assumes that if a filter can be wiped, doing so will
always trigger a call to _full_scan. This is not true for composite filters
though, since they can always be wiped in principle, but there is no way to know
that a component filter inside will exist that actually triggers the scan.
This message should be printed only for activation commands,
however since the handling of this flag is not correct
(rhbz 1140029) and will require further changes,
do now just a minor change and switch message into log_debug
(so it's not printed i.e. with every 'lvs -v')
Use lv_update_and_reload() and lv_update_and_reload_origin()
to handle write/suspend/commit/resume sequence.
In few places this properly handle vg_revert() after suspend failure,
and also ensures there is metadata backup after successful vg_commit().
Fix rename operation for snapshot (cow) LV.
Only the snapshot's origin has the lock and by mistake suspend
and resume has been called for the snapshot LV.
This further made volumes unusable in cluster.
So instead of suspend and resuming list of LVs,
we need to just suspend and resume origin.
As the sequence write/suspend/commit/resume
is widely used in lvm2 code base - move it to
new lv_update_and_reload function.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.