IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Reinstantiate reporting of metadata percent usage for cache volumes.
Also show the same percentage with hidden cache-pool LV.
This regression was caused by optimization for a single-ioctl in
2.02.155.
Older udev versions (udev < v165), don't have the official
udev_device_get_is_initialized function available to query for
device initialization state in udev database. Also, devices don't
have USEC_INITIALIZED udev db variable set - this is bound to the
udev_device_get_is_initialized fn functionality.
In this case, check for "DEVLINKS" variable instead - all block devices
have at least one symlink set for the node (the "/dev/block/<major:minor>".
This symlink is set by default basic udev rules provided by udev directly.
We'll use this as an alternative for the check that initial udev
processing for a device has already finished.
It's possible (mainly during boot) that udev has not finished
processing the device and hence the udev database record for that
device is still marked as uninitialized when we're trying to look
at it as part of multipath component check in pvscan --cache code.
So check several times with a short delay to wait for the udev db
record to be initialized before giving up completely.
When scanning devs to populate lvmetad during system startup,
filter-mpath with native sysfs multipath component detection
may not detect that a dev is multipath component. This is
because the multipath devices may not be set up yet.
Because of this, pvscan will scan multipath components during
startup, will see them as duplicate PVs, and will disable
lvmetad. This will leave lvmetad disabled on systems using
multipath, unless something or someone runs pvscan --cache
to rescan.
To avoid this problem, the code that is scanning devices to
populate lvmetad will now check the udev db to see if a
dev is a multipath component that should be skipped.
(This may not be perfect due to inherent udev races, but will
cover most cases and will be at least as good as it's ever
been.)
Avoid monitoring of activated cache-pool - where the only purpose ATM
is to clear metadata volume which is actually activate in place
of cache-pool name (using public LV name).
Since VG lock is held across whole clear operation, dmeventd cannot
be used anyway - however in case of appliction crash we may
leave unmonitored device.
In future we may provide better mechanism as the current name
replacemnet is creating 'uncommon' table setups in case the metadata
LV is more complex type like raid (needs some futher thinking about
error path results).
Another point to think about is the fact we should not clear device
while holding lock (i.e. dmeventd mirror repair cannot work in cases
like this).
Introduce 'hard limit' for max number of cache chunks.
When cache target operates with too many chunks (>10e6).
When user is aware of related possible troubles he
may increase the limit in lvm.conf.
Also verbosely inform user about possible solution.
Code works for both lvcreate and lvconvert.
Lvconvert fully supports change of chunk_size when caching LV
(and validates for compatible settings).
Commit e947c362dd introduced
config_settings.h file for central place to store all definitions for
config options. By mistake, it used report/colums_as_rows instead
of report/columns_as_rows (missing "n" in "columns").
'pvmove -n name pv1 pv2' allows to collocate multiple RAID SubLVs
on pv2 (e.g. results in collocated raidlv_rimage_0 and raidlv_rimage_1),
thus causing loss of resilence and/or performance of the RaidLV.
Fix this pvmove flaw leading to potential data loss in case of PV failure
by preventing any SubLVs from collocation on any PVs of the RaidLV.
Still allow to collocate any DataLVs of a RaidLV with their sibling MetaLVs
and vice-versa though (e.g. raidlv_rmeta_0 on pv1 may still be moved to pv2
already holding raidlv_rimage_0).
Because access to the top-level RaidLV name is needed,
promote local _top_level_lv_name() from raid_manip.c
to global top_level_lv_name().
- resolves rhbz1202497
Adding MetaLVs to given DataLVs (e.g. raid0 -> raid0_meta takeover),
_avoid_pvs_with_other_images_of_lv() was missing code to prohibit
allocation when called with a just allocated MetaLV to prohibit
collaocation of the next allocated MetaLV on the same PV.
- resolves rhbz1366738
When lvm is compiled with --enable-notify-dbus and a user uses lvm
shell, after they issue 200+ commands the lvm shell will hang for
~30 seconds trying to notify the lvm dbus service that a change
has occurred. This appears to be caused by resource exhaustion,
because the sockets used for dbus communication are not be closed.
Enforce mirror/raid0/1/10/4/5/6 type specific maximum images when
creating LVs or converting them from mirror <-> raid1.
Document those maxima in the lvcreate/lvconvert man pages.
- resolves rhbz1366060
Some settings are not suitable for override in interactive/shell
mode because such settings may confuse the code and it may end
up with unexpected behaviour. This is because of the fact that
once we're in the interactive/shell mode, we have already applied
some settings for the shell itself and we can't override them
further because we're already using those settings to drive the
interactive/shell mode. Such settings would get ignored silently
or, in worse case, they would mess up the existing configuration.
When lvm commands are executed in lvm shell, we cover the whole lvm
command execution within this shell now. That means, all messages logged
and status caught during each command execution is now recorded in the
log report, including overall command's return code.
With patches that will follow, this will make it possible to widen log
report coverage when commands are executed from lvm shell so the amount
of messages that may end up in stderr/stdout instead of log report are
minimized.
Add new log_context=shell and with log_object_type=cmd and
log_object_name=<command_name> for command log report to collect
overall return code from last command (this is reported under
log_type=status).
Currently, the output is separated in 3 parts and each part can go into
a separate and user-defined file descriptor:
- common output (stdout by default, customizable by LVM_OUT_FD environment variable)
- error output (stderr by default, customizable by LVM_ERR_FD environment variable)
- report output (stdout by default, customizable by LVM_REPORT_FD environment variable)
For example, each type of output goes to different output file:
[0] fedora/~ # export LVM_REPORT_FD=3
[0] fedora/~ # lvs fedora vg/abc 1>out 2>err 3>report
[0] fedora/~ # cat out
[0] fedora/~ # cat err
Volume group "vg" not found
Cannot process volume group vg
[0] fedora/~ # cat report
LV VG Attr LSize Layout Role CTime
root fedora -wi-ao---- 19.00g linear public Wed May 27 2015 08:09:21
swap fedora -wi-ao---- 500.00m linear public Wed May 27 2015 08:09:21
Another example in LVM shell where the report goes to "report" file:
[0] fedora/~ # export LVM_REPORT_FD=3
[0] fedora/~ # lvm 3>report
(in lvm shell)
lvm> vgs
(content of "report" file)
[1] fedora/~ # cat report
VG #PV #LV #SN Attr VSize VFree
fedora 1 2 0 wz--n- 19.49g 0
(in lvm shell)
lvm> lvs
(content of "report" file)
[1] fedora/~ # cat report
VG #PV #LV #SN Attr VSize VFree
fedora 1 2 0 wz--n- 19.49g 0
LV VG Attr LSize Layout Role CTime
root fedora -wi-ao---- 19.00g linear public Wed May 27 2015 08:09:21
swap fedora -wi-ao---- 500.00m linear public Wed May 27 2015 08:09:21
'lvchange --resync LV' or 'lvchange --syncaction repair LV' request the
RAID layout specific parity blocks in raid4/5/6 to be recreated or the
mirrored blocks to be copied again from the master leg/copy for raid1/10,
thus not allowing a rebuild of a particular PV.
Introduce repeatable option '--[raid]rebuild PV' to allow to request
rebuilds of specific PVs in a RaidLV which are known to contain corrupt
data (e.g. rebuild a raid1 master leg).
Add test lvchange-rebuild-raid.sh to test/shell doing rebuild
variations on raid1/10 and 5; add aux function check_status_chars
to support the new test.
- Resolves rhbz1064592
introduced with commit 8f62b7bfe5 rely on complete
defintions of the relations between the LVs of a VG.
Hence only run these checks when the complete_vg flag
is set on calls to check_lv_segments().
lvconvert failed in test lvconvert-thin-raid.sh when
calling check_lv_segments() from _read_segments() without
providing a complete definition.
General RAID and RAID segment type specific checks are added
to merge.c. New static _check_raid_seg() is called on each segment
of a RaidLV (which have just one) from check_lv_segments().
New checks caught some unititialized segment members
which are addressed here as well:
- initialize seg->region_size to 0 in lvcreate.c for raid0/raid0_meta
- initialize list seg->origin_list in lv_manip.c
When volume was lvconvert-ed to a thin-volume with external origin,
then in case thin-pool was in non-zeroing mode
it's been printing WARNING about not zeroing thin volume - but
this is wanted and expected - so nothing to warn about.
So in this particular use case WARNING needs to be suppressed.
Adding parameter support for lvcreate_params.
So now lvconvert creates 'normal thin LV' in read-only mode
(so any read will 'return 0' for a moment)
then deactivate regular thin LV and reacreate in 'final R/RW' mode
thin LV with external origin and activate again.
Before, the automatic update from older to newer version of PV extension
header happened within vg_write call. This may have caused problems under
some circumnstances where there's a code in between vg_write and vg_commit
which may have failed. In such situation, we reverted precommitted metadata
and put back the state to working version of VG metadata.
However, we don't have revert for PV write operation at the moment. So
if we updated PV headers already and we reverted vg_write due to failure
in subsequent code (before vg_commit), we ended up with lost VG metadata
(because old metadata pointers got reset by the PV write operation).
To minimize problematic situations here, we should put vg_write and
vg_commit that is done after PV header rewrites as close to each
other as possible.
This patch moves the automatic PV header rewrite for new extension
header part from vg_write to _vg_read where it's done the same way
as we do any other VG repairs if detected during VG read operation
(under VG write lock).
If the VG holding the global lock is removed, we can indicate
that as the reason for not being able to acquire the global
lock in subsequent error messages, and can suggest enabling
the global lock in another VG. (This helpful error message
will go away if the global lock is enabled in another VG,
or if lvmlockd is restarted.)
In some cases, the command will update VG metadata
in lvmetad without writing it. In these cases there
is no vg->vg_committed and it should use 'vg' directly.
This happens when the command finds that the lvmetad
VG has been invalidated, rereads the metadata from disk,
then updates lvmetad with that metadata. This happens
often with lvmlockd or foreign VGs, and can happen without
lvmlockd if a previous command fails after invalidating
the VG in lvmetad.
Commit 3928c96a37 introduced
new defaults for raid number of stripes, which may cause
backwards compatibility issues with customer scripts.
Adding configurable option 'raid_stripe_all_devices' defaulting
to '0' (i.e. off = new behaviour) to select the old behaviour
of using all PVs in the VG or those provided on the command line.
In case any scripts rely on the old behaviour, just set
'raid_strip_all_devices = 1'.
- resolves rhbz1354650
This fixes a regression from commit a7c45ddc5, which moved
the lvmetad VG update from vg_commit() to unlock_vg().
The lvmetad VG update needs to send the version of metadata
that was committed rather than sending the state of struct 'vg'.
The 'vg' may have been partially modified since vg_commit(),
and contain non-committed metadata that shouldn't be sent
to lvmetad.
Any failing stripes in raid0/raid0_meta type LVs cause data loss,
thus replacement via 'lvconvert --replace...' does not make sense.
Patch prohibits replacement on raid0/raid0_meta LVs.
- resolves rhbz1356734
Commit ca878a3426 changed behavior
or resize operation. Later the code has been futher changed
to skip fs resize completely when size of LV is already matching
and finaly at the most recent resize changeset for resize the
check for matching size has been eliminated as well so we ended
with a request call to resize fs to 0 size in some cases.
This commit reoders some test so the prompt happens just once before
resize of possibly 2 related volumes.
Also extra test for having LV already given size is added, and
whole metadata update is skipped for this case as the only
result would be an increment of seqno.
However the filesystem is still resized when requested,
so if the LV has some size and the resize is resolved to
the same size, the filesystem resize is called so in case FS
would not match, the resize will happen.
A livelock occurs on extension in lv_manip when adjusting the region size,
which doesn't apply to any raid0/raid0_meta LVs (these don't have a bitmap).
Fix by prohibiting the region size adjustment on any such LVs.
- resolves rhbz1354604
An unconditional access to the non-existing MetaLV of a raid0 LV in
lv_raid_remove_missing() was causing the segfault.
Only call log_debug() on replacements of existing MetaLVs.
- resolves rhbz1354646
When logging to epoch files we would like to prevent creating too large
log files otherwise a spining command could fulfill available space
very easily and quickly.
Limit for to 100000 per command.
This patch fixes link validation for used thin-pool.
Udev rules correctly creates symlinks only for unused new thin-pool.
Such thin-pool can be used by foreing apps (like Docker) thus
has /dev/vg/lv link.
However when thin-pool becomes used by thinLV - this link is no
longer exposed to user - but internal verfication missed this
and caused messages like this to be printed upon 'vgchange -ay':
The link /dev/vg/pool should have been created by udev but it was not
found. Falling back to direct link creation.
And same with 'vgchange -an':
The link /dev/vg/pool should have been removed by udev but it is still
present. Falling back to direct link removal.
This patch ensures only unused thin-pool has this link.
Apply the same idea as vg_update.
Before doing the VG remove on disk, invalidate
the VG in lvmetad. After the VG is removed,
remove the VG in lvmetad. If the command fails
after removing the VG on disk, but before removing
the VG metadata from lvmetad, then a subsequent
command will see the INVALID flag and not use the
stale metadata from lvmetad.
Previously, a command sent lvmetad new VG metadata in vg_commit().
In vg_commit(), devices are suspended, so any memory allocation
done by the command while sending to lvmetad, or by lvmetad while
updating its cache could deadlock if memory reclaim was triggered.
Now lvmetad is updated in unlock_vg(), after devices are resumed.
The new method for updating VG metadata in lvmetad is in two phases:
1. In vg_write(), before devices are suspended, the command sends
lvmetad a short message ("set_vg_info") telling it what the new
VG seqno will be. lvmetad sees that the seqno is newer than
the seqno of its cached VG, so it sets the INVALID flag for the
cached VG. If sending the message to lvmetad fails, the command
fails before the metadata is committed and the change is not made.
If sending the message succeeds, vg_commit() is called.
2. In unlock_vg(), after devices are resumed, the command sends
lvmetad the standard vg_update message with the new metadata.
lvmetad sees that the seqno in the new metadata matches the
seqno it saved from set_vg_info, and knows it has the latest
copy, so it clears the INVALID flag for the cached VG.
If a command fails between 1 and 2 (after committing the VG on disk,
but before sending lvmetad the new metadata), the cached VG retains
the INVALID flag in lvmetad. A subsequent command will read the
cached VG from lvmetad, see the INVALID flag, ignore the cached
copy, read the VG from disk instead, update the lvmetad copy
with the latest copy from disk, (this clears the INVALID flag
in lvmetad), and use the correct VG metadata for the command.
(This INVALID mechanism already existed for use by lvmlockd.)
Simplify code around _do_get_report_selection - remove "expected_idxs[]"
argument which is superfluous and add "allow_single" switch instead to
allow for recognition of "--configreport <report_name> -S" as well as
single "-S" if needed.
Null pointer dereferences (FORWARD_NULL) /safe/guest2/covscan/LVM2.2.02.158/tools/reporter.c: 961 in _do_report_get_selection()
Null pointer dereferences (FORWARD_NULL) Dereferencing null pointer "single_args".
Uninitialized variables (UNINIT) /safe/guest2/covscan/LVM2.2.02.158/tools/toollib.c: 3520 in _process_pvs_in_vgs()
Uninitialized variables (UNINIT) Using uninitialized value "do_report_ret_code".
Null pointer dereferences (REVERSE_INULL) /safe/guest2/covscan/LVM2.2.02.158/libdm/libdm-report.c: 4745 in dm_report_output()
Null pointer dereferences (REVERSE_INULL) Null-checking "rh" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
Incorrect expression (MISSING_COMMA) /safe/guest2/covscan/LVM2.2.02.158/lib/log/log.c: 280 in _get_log_level_name()
Incorrect expression (MISSING_COMMA) In the initialization of "log_level_names", a suspicious concatenated string ""noticeinfo"" is produced.
Null pointer dereferences (FORWARD_NULL) /safe/guest2/covscan/LVM2.2.02.158/tools/reporter.c: 816 in_get_report_options()
Null pointer dereferences (FORWARD_NULL) Comparing "mem" to null implies that "mem" might be null.
This reverts commit fa69ed0bc8.
This code sometimes expects to be presented with a read-only filesystem
(during some boot sequences for example) and copes appropriately with
this and it should not lead to expected error messages that might cause
unnecessary alarm.
lv_name arg is only used without known LV for resolving '*lv'.
Once we know *lv, never use lv_name ever again.
So setting it when passing *lv has not needed.
Add code to support more LVs to be resized through a same code path
using a single lvresize_params struct.
(Now it's used for thin-pool metadata resize,
next user will be snapshot virtual resize).
Update code to adjust percent amount resize for use_policies.
Properly activate inactive thin-pool in case of any pool resize
as the command should not 'deffer' this operation to next activation.
Use common API design and pass just LV pointer to lv_manip.c functions.
Read cmd struct via lv->vg->cmd when needed.
Also do not try to return EINVALID_CMD_LINE error when we
have already openned VG - this error code can only be returned before
locking VG.
We have only 2 users of _lv_active() - one was already checking for ==1
while the other use (_lv_is_active()) could have take '-1' as a sign of having
an LV active. So return 0 and log_debug also the reason while detection
has failed (i.e. in case --driverload n - it's kind of expectable,
but might have confused user seeing just <backtrace>).
This fixes commit f50d4011cd which
introduced a problem when using older lvm2 code with newer libdm.
In this case, the old LVM didn't recognize new _LOG_BYPASS_REPORT flag
that libdm-report code used. This ended up with no output at all
from libdm where log_print_bypass_report was called because the
_LOG_BYPASS_REPORT was not masked properly in lvm2's print_log fn
which was called as callback function for logging.
With this patch, the lvm2 registers separate print_log_libdm logging
function for libdm instead. The print_log_libdm is exactly the same
as print_log (used throughout lvm2 code) but it checks whether we're
printing common line on output where "common" means not going to stderr,
not a warning and not an error and if we are, it adds the
_LOG_BYPASS_REPORT flag so the log_print goes directly to output, not
to any log report.
So this achieves the same goal as in f50d4011cd,
just doing it in a way that newer libdm is still compatible with older
lvm2 code (libdm-report is the only code using log_print).
Looking at the opposite mixture - older libdm with newer lvm2 code,
that won't be compilable because the new log report functionality
that is in lvm2 also requires new dm_report_group_* libdm functions
so we don't need to care here.
Move code from original print_log fn to a separate _vprint_log function
that accepts va_list and make print_log a wrapper over _vprint_log.
The print_log just initializes the va_list and uses it for _vprint_log
call now. This way, we can reuse _vprint_log if needed.
Previously, vgcfgrestore would attempt to vg_remove the
existing VG from lvmetad and then vg_update to add the
restored VG. But, if there was a failure in the command
or with vg_update, the lvmetad cache would be left incorrect.
Now, disable lvmetad before the restore begins, and then
rescan to populate lvmetad from disk after restore has
written the new VG to disk.
log_print is used during cmd line processing to log the result of the
operation (e.g. "Volume group vg successfully changed" and similar).
We don't want output from log_print to be interleaved with current
reports from group where log is reported as well. Also, the information
printed by log_print belongs to the log report too, so it should be
rerouted to log report if it's set.
Since the code in libdm-report which is responsible for doing the report
output uses log_print too, we need to use a different kind of log_print
which bypasses any log report currently used for logging (...simply,
we can't call log_print to output the log report itself which in turn
would again reroute to report - the report would never get on output
this way).
This patch adds structures and functions to reroute error and warning
logs to log report, if it's set.
There are 5 new functions:
- log_set_report
Set log report where logging will be rerouted.
- log_set_report_context
Set context globally so any report_cmdlog call will use it.
- log_set_report_object_type
Set object type globally so any report_cmdlog call will use it.
- log_set_report_object_name_and_id
Set object ID and name globally so any report_cmdlog call will use it.
- log_set_report_object_group_and_group_id
Set object group ID and name globally so any report_cmdlog call will use it.
These functions will be called during LVM command processing so any logs
which are rerouted to log report contain proper information about current
processing state.
lvm fullreport executes 5 subreports (vg, pv, lv, pvseg, seg) per each VG
(and so taking one VG lock each time) within one command which makes it
easier to produce full report about LVM entities.
Since all 5 subreports for a VG are done under a VG lock, the output is
more consistent mainly in cases where LVM entities may be changed in
parallel.
New report/output_format configuration sets the output format used
for all LVM commands globally. Currently, there are 2 formats
recognized:
- basic (the classical basic output with columns and rows, used by default)
- json (output is in json format)
Add new --reportformat option and new report_format_init function that
checks this option and creates new report group accordingly, also
preparing log report handle and adding it to the report group just
created.
This is a preparation for new CMDLOG report type which is going to be
used for reporting LVM command log.
The new report type introduces several new fields (log_seq_num, log_type,
log_context, log_object_type, log_object_group, log_object_id, object_name,
log_message, log_errno, log_ret_code) as well as new configuration settings
to set this report type (report/command_log_sort and report/command_log_cols
lvm.conf settings).
This patch also introduces internal report_cmdlog helper function
which is a wrapper over dm_report_object to report command log via
CMDLOG report type and which is going to be used throughout the code
to report the log items.
This fixes a problem in commit ae0a8740c. The problem
in that commit was that all existing PVs are initially
dropped from lvmetad. This works if the VG is updated
at the end, which replaces the dropped PVs, but if the
rescan finds that the VG seqno is unchanged, it leaves
the cached VG in place. So, we should only drop the
existing PVs in lvmetad when the VG is going to be updated.
Original code missed to catch all apperances of SIGINT.
Also enhance logging when running in shell without tty.
Accept this regex as valid input:
'^[ ^t]*([Yy]([Ee]([Ss]|)|)|[Nn]([Oo]|))[ ^t]*$'
Some commands scan labels to populate lvmcache multiple
times, i.e. lvmcache_init, scan labels to fill lvmcache,
lvmcache_destroy, then later repeat
Each time labels are scanned, duplicates are detected,
and preferred devices are chosen. Each time this is done
within a single command, we want to choose the same
preferred devices. So, check for existing preferences
when choosing preferred devices.
This also fixes a problem with the list of unused duplicate
devs when run in an lvm shell. The devs had been allocated
from cmd memory, resulting in invalid list entries between
commands.
A number of places are working on a specific dev when they
call lvmcache_info_from_pvid() to look up an info struct
based on a pvid. In those cases, pass the dev being used
to lvmcache_info_from_pvid(). When a dev is specified,
lvmcache_info_from_pvid() will verify that the cached
info it's using matches the dev being processed before
returning the info. Calling code will not mistakenly
get info for the wrong dev when duplicate devs exist.
This confusion was happening when scanning labels when
duplicate devs existed. label_read for the first dev
would add an info struct to lvmcache for that dev/pvid.
label_read for the second dev would see the pvid in
lvmcache from first dev, and mistakenly conclude that
the label_read from the second dev can be skipped
because it's already been done. By verifying that the
dev for the cached pvid matches the dev being read,
this mismatch is avoided and the label is actually read
from the second duplicate.
If a command gets stuck during an lvmetad update, lvmetad
will cancel that update after the timeout. The next command
to check the lvmetad will see that lvmetad needs to be
populated because lvmetad will return token of "none" after
a timed out update (same as when lvmetad is not populated
at all after starting.)
If a command gets an error during an lvmetad update, it
will now just quit and leave its updating token in place.
That update will be cancelled after the timeout.
Treat loop device created with 'losetup -P' as regular
partitioned device - so if it has partition table,
prevent its usage in commands like 'pvcreate'.
Before 'pvcreate /dev/loop0' could have erased and formated as PV,
after this patch, device is filtered out and cannot be used.
Before this fix, when reporting 'lvm devtypes', the report was
initialized with incorrect reserved values - the ones used for
pvs/vgs/lvs report were used instead of NULL value (because devtypes
doesn't have any reserved values).
For example, trying to (incorrectly) use lv_name for the -S|--select
with lvm devtypes which doesn't have this field at all:
Before this patch (internal error issued):
$ lvm devtypes -S 'lv_name=lvol0'
Internal error: _check_reserved_values_supported: field-specific reserved value of type 0x0 for field not supported
Internal error: dm_report_init_with_selection: trying to register unsupported reserved value type, skipping report selection
DevType MaxParts Description
aoe 16 ATA over Ethernet
ataraid 16 ATA Raid
bcache 1 bcache block device cache
...
With this patch applied (correct error displayed about
unrecognized selection field):
$ lvm devtypes -S 'lv_name=lvol0'
Device Types Fields
-------------------
devtype_name - Name of Device Type exactly as it appears in /proc/devices. [string]
devtype_max_partitions - Maximum number of partitions. (How many device minor numbers get reserved for each device.) [number]
devtype_description - Description of Device Type. [string]
Special Fields
--------------
selected - Set if item passes selection criteria. [number]
help - Show help. [unselectable number]
? - Show help. [unselectable number]
Unrecognised selection field: lv_name
Selection syntax error at 'lv_name=lvol0'.
Use 'help' for selection to get more help.
Convert fields into using a single status ioctl call per LV.
This is a bit tricky since when there are more complicated
stacks, at this moment its undefined which values should be shown.
It's clear we need to cache more then single ioctl per LV,
but also we need to define more explicitely relation between
reported values for snapshots.
This patch is not a final state, rather a transitional step.
It should not be giving more 'worst' values then previous
many-ioctl-calls-per-lv solution.
Add function to obtain percentage value for cache lv_seg_status.
This API is rather evolving 'middle' step as the ultimate goal
is segment API fuctionality.
But first we need to be clear at reporting level which values
are needed to be reported for which LVs and segments.
Add more code to properly store status for snapshot segment
maintaining lvm2 fiction of COW and snapshot internal volumes.
The key issue here is however not though-through reporting
logic - as there is no single answer for whole line state.
It not counting with layer and we may need few more ioctl to
cover all reporting needs depending upon what is actually
needed.
In reality we need to 'cache' more ioctl status queries for
individual LVs and their segments (so they checked at most once).
The other 'hard' topic for conversion is mirror segment handling.
Also we definitelly need to relocate some logic into segment's methods,
yet it might be complex as we have not clear border between targets.
TODO: define more clearly how are reporting fields defined in case
we 'stack' volumes like - cache of stacked thin LV snapshot origin.
lv_refresh_suspend_resume() has escaped with fail ret code
after failing suspend and could have left many volumes in suspend state.
So always unconditionally call resume also when suspend has failed.
To get better control when flushing is used add extra arg when
setting up dm task.
By default now check dm device status without flush.
(At this moment this should effect only thin and cache volumes).
Also switch dev_manager_thin_pool_status() to use more
readable 'flush' parameter instead of 'no_flush'.
Check first the LV is cow before even checking it's a merging COW.
Note: previosly merging_cow was also merging origin, so without
this explicit check it used to return '1' also when passed
LV has been merging origin.
When mirror/raid called copy_percent function to return,
when 100% was supposed to be returned, wrong float 100.0 value
could have been reported back instead of dm_percent_t DM_PERCENT_100.
There is broken API somewhere, since the function here rely on
actively being modifid VG content even when doing 'lvs' operation.
(extents_copies)
This refactors the code for autoactivation. Previously,
as each PV was found, it would be sent to lvmetad, and
the VG would be autoactivated using a non-standard VG
processing function (the "activation_handler") called via
a function pointer from within the lvmetad notification path.
Now, any scanning that the command needs to do (scanning
only the named device args, or scanning all devices when
there are no args), is done first, before any activation
is attempted. During the scans, the VG names are saved.
After scanning is complete, process_each_vg is used to do
autoactivation of the saved VG names. This makes pvscan
activation much more similar to activation done with
vgchange or lvchange.
The separate autoactivate phase also means that if lvmetad
is disabled (either before or during the scan), the command
can continue with the activation step by simply not using
lvmetad and reverting to disk scanning to do the
activation.
Add support for active cache LV.
Handle --cachemode args validation during command line processing.
Rework some lvm2 internal to use lvm2 defined CACHE_MODE enums
indepently on libdm defines and use enum around the code instead
of passing and comparing strings.
A program may be using liblvm2app for simply checking a config
setting in lvm.conf. In this case, a full lvm context is not
needed, only cmd->cft (which are the config settings read from
lvm.conf).
lvm_config_find_bool() can now be passed a NULL lvm context
in which case it will only create cmd->cft, check the config
setting asked for, and destroy the cmd.
When setting up a toolcontext, the lib init function
was detecting an error when there was none, and then
it was returning an incompletely initialized cmd struct
instead of NULL. The effect was that the lib would try
to use the uninitialized cmd struct and segfault.
This would happen if a non-fatal error occurred during
cmd setup, e.g. user permission failed on lvmetad socket,
causing cmd to fall back to scanning and not use lvmetad.
The only real error condition is when create_toolcontext
returns NULL. If cmd is returned, the lib can use it.
If a command begins repopulating the lvmetad cache,
and fails part way through, it should set the disabled
state in lvmetad so other commands don't use bad data.
If a subsequent scan succeeds, the disabled state is
cleared.
If duplicate devices exist for a PV, and one device's
size matches the PV size, but the other doesn't, then
prefer the matching device.
If one device is used by an active LV, prefer that device.
When there are duplicate devices for a PV, one device
is preferred and chosen to exist in the VG. The other
devices are not used by lvm, but are displayed by pvs
with a new PV attr "d", indicating that they are
unchosen duplicate PVs.
The "duplicate" reporting field is set to "duplicate"
when the PV is an unchosen duplicate, and that field
is blank for the chosen PV.
Previously, duplicate PVs were processed as a side effect
of processing the "chosen" PV in lvmcache. The duplicate
PV would be hacked into lvmcache temporarily in place of
the chosen PV.
In the old way, we had to always process the "chosen" PV
device, even if a duplicate of it was named on the command
line. This meant we were processing a different device than
was asked for. This could be worked around by naming
multiple duplicate devs on the command line in which case
they were swapped in and out of lvmcache for processing.
Now, the duplicate devs are processed directly in their
own processing loop. This means we can remove the old
hacks related to processing dups as a side effect of
processing the chosen device. We can now simply process
the device that was named on the command line.
When the same PVID exists on two or more devices, one device
is preferred and used in the VG, and the others are duplicates
and are not used in the VG. The preferred device exists in
lvmcache as usual. The duplicates exist in a specical list
of unused duplicate devices.
The duplicate devs have the "d" attribute and the "duplicate"
reporting field displays "duplicate" for them.
'pvs' warns about duplicates, but the formal output only
includes the single preferred PV.
'pvs -a' has the same warnings, and the duplicate devs are
included in the output.
'pvs <path>' has the same warnings, and displays the named
device, whether it is preferred or a duplicate.
Wait to compare and choose alternate duplicate devices until
after all devices are scanned. During scanning, the first
duplicate dev is kept in lvmcache, and others are kept in a
new list (_found_duplicate_devs).
After all devices are scanned, compare all the duplicates
available for a given PVID and decide which is best.
If the dev used in lvmcache is changed, drop the old dev
from lvmcache entirely and rescan the replacement dev.
Previously the VG metadata from the old dev was kept in
lvmcache and only the dev was replaced.
A new config setting devices/allow_changes_with_duplicate_pvs
can be set to 0 which disallows modifying a VG or activating
LVs in it when the VG contains PVs with duplicate devices.
Set to 1 is the old behavior which allowed the VG to be
changed.
The logic for which of two devs is preferred has changed.
The primary goal is to choose a device that is currently
in use if the other isn't, e.g. by an active LV.
. prefer dev with fs mounted if the other doesn't, else
. prefer dev that is dm if the other isn't, else
. prefer dev in subsystem if the other isn't
If neither device is preferred by these rules, then don't
change devices in lvmcache, leaving the one that was found
first.
The previous logic for preferring a device was:
. prefer dev in subsystem if the other isn't, else
. prefer dev without holders if the other has holders, else
. prefer dev that is dm if the other isn't
Support parsing --chunksize option also when converting.
Now user can use cache pool created with i.e. 32K chunksize,
while in caching user can select 512K blocks.
Tool is supposed to validate cache metadata size is big enough
to support such chunk size. Otherwise error is shown.
When creating LV - in some case we change created segment type
(ATM for cache and snapshot) and we then manipulate with
lv segment according to 'lp' segtype.
Fix this by checking for proper type before accessing segment members.
This makes command like:
lvcreate --type cache-pool -L10 vg/cpool
lvcreate -H -L10 --cachesettings migtation_threshold=10000 vg/cpool
to pass since now tool correctly selects default cache policy.
If there's an activation volume_filter, it might not be possible
to activate the rmeta LVs to wipe them. At least inherit any
LV tags from the parent LV while attempting this.
Checking for devices uses is_missing_pv() to check
if there is a device for the PV. is_missing_pv()
is based on the MISSING_PV flag, which does not
always correspond to !pv->dev. When using lvmetad,
a command like:
pvs --config 'devices/filter=["a|/dev/sdb|", "r|.*|"]'
will cause a number of PVs to have NULL pv->dev, but
not the MISSING_PV flag. So, NULL pv->dev needs to
also be checked.
Before executing modprobe for given module name, just check
if the module is not already present in /sys/module.
Useful when checking dm-cache-policy modules as we do not
having matching interface like for targets.
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
Probably not worth mentioning "segments" here, just state that devices
for an LV can't be all found during the check - it's less mysterious for
user then:
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find all devices for LV fedora/root while checking used and assumed devices.
WARNING: Couldn't find all devices for LV fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
When checking assumed PVs against real devices used for LVs and if
there's no device assigned for an assumed PV (e.g. due to filters),
do log_warn instead of log_error and continue checking LV segments
and associated assumed PVs further, just like we do log_warn elsewhere
in this situation.
This way user will see the warning for each LV which couldn't be
checked completely against real PVs used. Before, we logged only
the very first occurence of missing device for an LV in a VG and we
returned from the function doing this check for all the LVs in VG
immediately which may be a bit misleading because it didn't tell
user about all the other LVs and whether they could be checked
or not.
For example, we have this setup:
[0] fedora/~ # pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
/dev/vda2 fedora lvm2 a-- 19.49g 0
[0] fedora/~ # lvs -o+devices
LV VG Attr LSize Devices
root fedora -wi-ao---- 19.00g /dev/vda2(0)
swap fedora -wi-ao---- 500.00m /dev/vda2(4864)
Before this patch (only the very first LV in a VG is logged to have a
problem while checking used and assumed devices):
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
With this patch applied (all LVs where we hit problem while checking
used and assumed devices are logged and it's warning, not error):
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
If lvmetad is running, and a command opts to not use it
(--config global/use_lvmetad=0), and the command changes
metadata, then the metadata change is not visible to
lvmetad. Subsequent commands using lvmetad to change
metadata may cause corruption based on the invalid
lvmetad state.
Eventually we can set the disabled state in lvmetad
to prevent this problem, but for now print a warning
about the possibility.
When command is not using lvmetad because
use_lvmetad=0 in the config, but the lvmetad
pidfile exists, print a warning (previously
this checked for the socket existing instead
of the pidfile existing.)
vg/snapshotN should not appear anywhere.
No code should be showing this, but it was noticed in some logs last
week and we can deal with it in display_lvname().
The lvmetad connection is created within the
init_connections() path during command startup,
rather than via the old lvmetad_active() check.
The old lvmetad_active() checks are replaced
with lvmetad_used() which is a simple check that
tests if the command is using/connected to lvmetad.
The old lvmetad_set_active(cmd, 0) calls, which
stopped the command from using lvmetad (to revert to
disk scanning), are replaced with lvmetad_make_unused(cmd).
After a device rescan that repopulates lvmetad,
if no reason for disabling lvmetad was seen
(lvm1 metadata or duplicate PVs), then clear
the disabled flag in lvmetad. This allows
commands to resume using the lvmetad cache
after the cause for disabling it has been removed.
Commands already check if the lvmetad token is valid,
and if not, they rescan devices to repopulate lvmetad
before running. Now, in addition to checking the
lvmetad token, they also check if the lvmetad disabled
flag is set. If so, they do not use the lvmetad cache
and revert to disk scanning.
A global flag in lvmetad indicates it has been disabled.
Other flags indicate the reason it was disabled.
These flags can be queried using get_global_info.
The lvmetactl debugging utility can set and clear the
disabled flag in lvmetad. Nothing else sets the
disabled flag yet.
Commands will check these flags after connecting to
lvmetad. If the disabled flag is set, the command
will not use the lvmetad cache, but revert to disk
scanning.
To test this feature:
$ lvmetactl get_global_info
response = "OK"
global_invalid = 0
global_disable = 0
disable_reason = "none"
token = "filter:3041577944"
$ vgs
(should report VGs from lvmetad)
$ lvmetactl set_global_disable 1
$ lvmetactl get_global_info
response = "OK"
global_invalid = 0
global_disable = 1
disable_reason = "DIRECT"
token = "filter:3041577944"
$ vgs
WARNING: Not using lvmetad because the disable flag was set directly.
(should report VGs without contacting lvmetad)
$ lvmetactl set_global_disable 0
$ vgs
(should report VGs from lvmetad)
Improve code for snapshot merge for readabilty
and also reduce number of tests needed to decide
if merging can or cannot be started.
(Further improving 9cccf5245a)
To recognize in runtime if we are merging or not
to make consistent decision between suspend and resume
add function to parse thin table line when add
merging thin device to the table.
A snapshot merge into its origin cannot be initiated while the devices
are in use. If there is outside interference (such as from udev),
the suspend (preload) and resume stages can reach conflicting decisions
about whether or not to proceed.
Try to make the logic more robust by checking the inactive or live
table during resume. (This is still not perfect.)
Move checking the lvmetad state, and the possible rescan,
out of lvmetad_send() to the start of the command.
Previously, the token mismatch and rescan would occur
within lvmetad_send() for some other request. Now,
the token mismatch is detected earlier, so the
rescan can be done before the main command is in
progress. Rescanning deep within the processing of
another command will disturb the lvmcache state of
that other command.
A rescan already exists at the start of the command
for the case where foreign VGs are going to be read.
This same rescan is now also performed when there is
an lvmetad token mismatch (from a changed global_filter).
The commands pvscan/vgscan/lvscan/vgimport are excluded
from this preemptive checking/rescanning for lvmetad
because they want to do rescanning themselves explicitly.
If rescanning devices fails, then lvmetad has not been
correctly repopulated and should not be used, so make
the command revert to not using lvmetad.
When not obtaining device from udev, we are doing deep devdir scan,
and at the same time we try to insert everything what /sys/dev/block
knows about. However in case lvm2 is configured to use nonstardard
devdir this way it will see (and scan) devices from a real system.
lvm2 test suite is using its own test devdir with its
own device nodes. To avoid touching real /dev devices, validate
the device node exist in give dir and do not insert such device
into a cache.
With obtain list from udev this patch has no effect
(the normal user path).
We have _insert_dirs() for udev and non-udev compilation.
Compiling without udev missed to call dev_cache_index_devs().
Move the call after _insert_dirs() call so both compilation
gets it.
When scanning if device is being usable as PV,
we call STATUS - but this status should not cause
any flushing.
Skip also open_count information as it's not needed.
We don't have any report field of this type yet. Return this patch into
the play if we really need that. Currenly we always report status
(result of "status" dm ioctl) for an LV as a whole where we choose
segment which represents the LV, not calling status for each possible
segment it contains - we don't need this now so I'm removing it to
not make the code more complex uselessly.
Devices without "LVM-" uuid prefix have been generated by very old
version of lvm2 2.00 and 2.01.
Since version 2.02 all lvm2 devices are using prefix "LVM-".
However checking for present of ancient non prefixed devices does
take extra IOCTL per every call and for majority of todays user
it will not find anything new.
So use the assumption that users with kernel 3.X and newer are not
really using such old versions of lvm2 (year <2005) and with their
new kernel they are also using new version of lvm2 and skip
checking for them.
This change also makes trace logs more readable.
When leaving preload routine it should not altet state of flush required
when it's been already set to 1 and drop it to 0.
The API here is unclean, but current usage expects the same
variable pointer is for all preload calls and combines 'flush_required'
across all of them.
Commit 844b009584 tried to move
limit for usage of noflush to 'preload' however this was not
correctly processed.
Intead explicitly check for which types we do not want noflush
and also add debug message in this case.
Fix regression caused by commit ba41ee1dc9.
The idea was to use no_flush for changed device only for thin volumes
and thin pools but also to merge this with change made in commit
844b009584.
However the resulting condition has caused misbehavior for the mirror
suspend - as that has been before the ONLY allowed target type
that could have been suspended with noflush.
Result was badly working repair for --type mirror that has been
passing 'flush' to the repaired mirror target whenever preload
wrongly set flush_required.
The origin code has required the flush_required to be set whenever
deivce size is changed.
Now it first detects if device size got smaller
'dm_tree_node_size_changed(root) < 0' - this requires flush.
Otherwise it checks device is not thin volume nor thin pool and its
size has changed (got bigger) and requires flush.
This mean upsize of thin-pool or thin volume will not require flush.
/sys/dev/block is available since kernel version 2.2.26 (~ 2008):
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-dev
The VGID/LVID indexing code relies on this feature so skip indexing
if it's not available to avoid error messages about inability to open
/sys/dev/block directory.
We're not going to provide fallback code to read the /sys/block/
instead in this case as that's not that efficient - it needs extra
reads for getting major:minor and reading partitions would also
pose further reads and that's not worth it.
If compiling without udev_sync support, udev_get_library_context simply
returns NULL so we don't need to remember putting ifdef UDEV_SYNC_SUPPORT
in the code all the time we just need to check whether there's any udev
context initialized or not.
If obtain_device_list_from_udev=0, LVM can make use of persistent .cache
file. This cache file contains only devices which underwent filters in
previous LVM command run. But we need to iterate over all block devices
to create the VGID/LVID index completely for the device mismatch check
to be complete as well.
This patch iterates over block devices found in sysfs to generate the
VGID/LVID index in dev cache if obtain_device_list_from_udev=0
(if obtain_device_list_from_udev=1, we always read complete list of
block devices from udev and we ignore .cache file so we don't need
to look in sysfs for the complete list).
For the case when we print device name associated with struct device
that was not found in /dev, but in sysfs, for example when printing
devices where LV device mismatch is found.
Use meta% to expose highest mapped sector in thinLV.
so showing there 100.00% means thinLV maps latest sector.
Currently using a 'trick' with total_numerator to pass-in
device size when 'seg==NULL'
TODO: Improve device status API per target - current 'percentage'
is not really extensible.
Previous fix missed the fact the we do query for 'percent' with
seg value either set or unset (API overload...)
When 'seg' was unset, we still issue flush with status.
Fix it by cheking segtype by target_type.
As we check for segtype - we could also skip whole percentage
if the 'segtype' is unknown by code directly.
Reported-by: Ming-Hung Tsai <mingnus gmail com
It's correct to have a DM device that has no DM UUID assigned
so no need to issue error message in this case. Also, if the
device doesn't have DM UUID, it's also clear it's not an LVM LV
(...when looking for VGID/LVID while creating VGID/LVID indices
in dev cache).
For example:
$ dmsetup create test --table "0 1 linear /dev/sda 0"
And there's no PV in the system.
Before this patch (spurious error message issued):
$ pvs
_get_sysfs_value: /sys/dev/block/253:2/dm/uuid: no value
With this patch applied (no spurious error message):
$ pvs
If we're using persistent .cache file, we're reading this file instead
of traversing the /dev content. Fix missing indexing by VGID and LVID
here - hook this into persistent_filter_load where we populate device
cache from persistent .cache file instead of scanning /dev.
For example, inducing situation in which we warn about different device
actually used than what LVM thinks should be used based on metadata:
$ lsblk -s /dev/vg/lvol0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vg-lvol0 253:4 0 124M 0 lvm
`-loop1 7:1 0 128M 0 loop
$ lvmconfig --type diff
global {
use_lvmetad=0
}
devices {
obtain_device_list_from_udev=0
}
(obtain_device_list_from_udev=0 also means the persistent .cache file is used)
Before this patch - pvs is fine as it does the dev scan, but lvs relies
on persistent .cache file and it misses the VGID/LVID indices to check
and warn about incorrect devices used:
$ pvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/loop1 instead of /dev/loop0.
PV VG Fmt Attr PSize PFree
/dev/loop0 vg lvm2 a-- 124.00m 0
$ lvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
LV VG Attr LSize
lvol0 vg -wi-a----- 124.00m
With this patch applied - both pvs and lvs is fine - the indices are
always created correctly (lvs just an example here, other LVM commands
that rely on persistent .cache file are fixed with this patch too):
$ pvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/loop1 instead of /dev/loop0.
PV VG Fmt Attr PSize PFree
/dev/loop0 vg lvm2 a-- 124.00m 0
$ lvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/loop1 instead of /dev/loop0.
LV VG Attr LSize
lvol0 vg -wi-a----- 124.00m
It's possible that while a device is already referenced in sysfs, the node
is not yet in /dev directory.
This may happen in some rare cases right after LVs get created - we sync
with udev (or alternatively we create /dev content ourselves) while VG
lock is held. However, dev scan is done without VG lock so devices may
already be in sysfs, but /dev may not be updated yet if we call LVM command
right after LV creation (so the fact that fs_unlock is done within VG
lock is not usable here much). This is not a problem with devtmpfs as
there's at least kernel name for device in /dev as soon as the sysfs
item exists, but we still support environments without devtmpfs or
where different directory for dev nodes is used (e.g. our test suite).
This patch covers these situations by tracking such devices in
_cache.sysfs_only_names helper hash for the vgid/lvid check to work still.
This also resolves commit 6129d2e64d
which was then reverted by commit 109b7e2095
due to performance issues it may have brought (...and it didn't resolve
the problem fully anyway).
dmeventd daemon may call further code itself that looks at /dev, e.g.
via dmeventd_lvm2_command call. We need to have a consistent view of
the /dev content at that time. Therefore, sync /dev content before
calling monitoring hook which contacts dmeventd.
This problem was quite hidden before, but now it has manifested itself
because of recent additions to dev-cache code where we started looking
at device holders as seen in sysfs. What happened here was that the
device was already in sysfs, but not yet under /dev and this triggered
the new error message sometimes:
log_error("%s: failed to find associated device structure for holder %s.", devname, devpath);
This problem has manifested recently in our api/pytest.sh test from
testsuite where we create thin pool LVs and thin LVs and hence it also
causes dmeventd to be used as well and these error messages were
visible there.
UUID for LV is either "LVM-<vg_uuid><lv_uuid>" or "LVM-<vg_uuid><lv_uuid>-<suffix>".
The code before just checked the length of the UUID based on the first
template, not the variant with suffix - so LVs with this suffix were not
processed properly.
For example a thin pool LV (as an example of an LV that contains
sub LVs where UUIDs have suffixes):
[0] fedora/~ # lsblk -s /dev/vg/lvol1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vg-lvol1 253:8 0 4M 0 lvm
`-vg-pool-tpool 253:6 0 116M 0 lvm
|-vg-pool_tmeta 253:2 0 4M 0 lvm
| `-sda 8:0 0 128M 0 disk
`-vg-pool_tdata 253:3 0 116M 0 lvm
`-sda 8:0 0 128M 0 disk
Before this patch (spurious warning message about device mismatch):
[0] fedora/~ # pvs
WARNING: Device mismatch detected for vg/lvol1 which is accessing /dev/mapper/vg-pool-tpool instead of (null).
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 0
With this patch applied (no spurious warning message about device mismatch):
[0] fedora/~ # pvs
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 0
Check if the value we read from sysfs is not blank and replace the '\n'
at the end only when needed ('\n' should usually be there for sysfs values,
but better check this).
It's possible for an LVM LV to use a device during activation which
then differs from device which LVM assumes based on metadata later on.
For example, such device mismatch can occur if LVM doesn't have
complete view of devices during activation or if filters are
misbehaving or they're incorrectly set during activation.
This patch adds code that can detect this mismatch by creating
VG UUID and LV UUID index while scanning devices for device cache.
The VG UUID index maps VG UUID to a device list. Each device in the
list has a device layered above as a holder which is an LVM LV device
and for which we know the VG UUID (and similarly for LV UUID index).
We can acquire VG and LV UUID by reading /sys/block/<dm_dev_name>/dm/uuid.
So these indices represent the actual state of PV device use in
the system by LVs and then we compare that to what LVM assumes
based on metadata.
For example:
[0] fedora/~ # lsblk /dev/sdq /dev/sdr /dev/sds /dev/sdt
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdq 65:0 0 104M 0 disk
|-vg-lvol0 253:2 0 200M 0 lvm
`-mpath_dev1 253:3 0 104M 0 mpath
sdr 65:16 0 104M 0 disk
`-mpath_dev1 253:3 0 104M 0 mpath
sds 65:32 0 104M 0 disk
|-vg-lvol0 253:2 0 200M 0 lvm
`-mpath_dev2 253:4 0 104M 0 mpath
sdt 65:48 0 104M 0 disk
`-mpath_dev2 253:4 0 104M 0 mpath
In this case the vg-lvol0 is mapped onto sdq and sds becauset this is
what was available and seen during activation. Then later on, sdr and
sdt appeared and mpath devices were created out of sdq+sdr (mpath_dev1)
and sds+sdt (mpath_dev2). Now, LVM assumes (correctly) that mpath_dev1
and mpath_dev2 are the PVs that should be used, not the mpath
components (sdq/sdr, sds/sdt).
[0] fedora/~ # pvs
Found duplicate PV xSUix1GJ2SK82ACFuKzFLAQi8xMfFxnO: using /dev/mapper/mpath_dev1 not /dev/sdq
Using duplicate PV /dev/mapper/mpath_dev1 from subsystem DM, replacing /dev/sdq
Found duplicate PV MvHyMVabtSqr33AbkUrobq1LjP8oiTRm: using /dev/mapper/mpath_dev2 not /dev/sds
Using duplicate PV /dev/mapper/mpath_dev2 from subsystem DM, ignoring /dev/sds
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdq, /dev/sds instead of /dev/mapper/mpath_dev1, /dev/mapper/mpath_dev2.
PV VG Fmt Attr PSize PFree
/dev/mapper/mpath_dev1 vg lvm2 a-- 100.00m 0
/dev/mapper/mpath_dev2 vg lvm2 a-- 100.00m 0
Commit b64703401d cause regression
when handling stacked resize of pool metadata volume that would
be a raid LV.
Fix it by properly setting up size also for layer extension.
Recent kernel (4.4) start to report values smaller then sector size
(but in reporting size for SSD which support data zeroing on discard).
For now log warning and assume it really means 1 sector.
Addressing RHBZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1313377
There's a window between doing VG read and checking PV device size
against real device size. If the device is removed in this window,
the dev cache still holds struct device and pv->dev still references
that and that PV is not marked as missing. However, if we're trying
to get size for such device, the open fails because that device
doesn't exists anymore.
We called existing pv_dev_size in _check_pv_dev_sizes fn. But
pv_dev_size assigned a size of 0 if the dev_get_size it called failed
(because the device is gone).
So call the dev_get_size directly and check for the return code
in _check_pv_dev_sizes and go further only if we really know the
device size. This is to avoid confusing warning messages like:
Device /dev/sdd1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
Specifying an output width of 0 now leads to a default minimum width
taken from the width of the column heading. (Most fields should use
this.)
Components of field names are generally separated by underscores (which
are optional at run-time). (Dropped 3 duplicate fields now covered by
this rule.)
Each field heading must be unique and generally doesn't have spaces
between words (which get capitalised) unless they are already short and
the fields are normally longer or clarity demands it.
This command option can be used to trigger a D-Bus
notification independent of the usual notifications
that are sent from other commands as an effect of
changes to PV/VG/LV state. If lvm is not built with
dbus notification support or if notify_dbus is disabled
in the config, this command will exit with an error.
When a command modifies a PV or VG, or changes the
activation state of an LV, it will send a dbus
notification when the command is finished. This
can be enabled/disabled with a config setting.
The code in _print_historical_lv function works with temporary
"descendants_buffer" that is allocated and freed within this
function.
When printing text out, we used "outf" macro which called
"out_text" fn and it checked return value and if failed,
the macro called "return_0" automatically. But since we
use the temporary buffer, if any of the out_text calls
fails, we need to deallocate this buffer properly - that's
the "goto_out", otherwise we'll be leaking memory.
So add new "outfgo" helper macro which does the same as "outf",
but it calls "goto_out" instead of "return_0" so we can jump
to a cleanup hook at the end.
Historical LV is valid as long as there is at least one live LV among
its ancestors. If we find any invalid (dangling) historical LVs, remove
them automatically.
The vg_strip_outdated_historical_lvs iterates over the list of historical LVs
we have and it shoots down the ones which are outdated.
Configuration hook to set the timeout will be in subsequent patch.
The metadata/record_lvs_history is global switch which enables or
disables recording historical LVs in metadata.
If both metadata/record_lvs_history=1 lvm.conf option and
--nohistory command switch is used at the same time, the
--nohistory prevails.
Report proper values for historical LVs in lv_layout and lv_role fields.
Any historical LV doesn't have any layout anymore and the role is "history".
For example:
$ lvs -H -o name,lv_attr,lv_layout,lv_role vg/-lvol1
LV Attr Layout Role
-lvol1 ----h----- none public,history
The lv_full_ancestors reporting field is just like the existing
lv_ancestors field but it also takes into account any history and
indirect relations recorded.
All names for historical LVs are prefixed with '-' character to make clear
difference between live and historical LVs. The '-' can't be set by users
for live LV names during lvcreate hence we never get into a conflict with
the names that user defines for live LVs.
The lv_historical reporting field is a simple binary field that reports
whether an LV is historical one ("historical" value or value of "1" displayed)
or not (blank string "" or value of "0" displayed).
This patch adds "include_historical_lvs" field to struct cmd_context to
make it possible for the command to switch between original funcionality
where no historical LVs are processed and functionality where historical
LVs are taken into account (and reported or processed further). The switch
between these modes is done using the '-H|--history' switch on command
line.
The include_historical_lvs state is then passed to process_each_* fns
using the "include_historical_lvs" field within struct processing_handle.
Add support for making an interconnection between thin LV segment and
its indirect origin (which may be historical or live LV) - add a new
"indirect_origin" argument to attach_pool_lv function.
Also export historical LVs when exporting LVM2 metadata.
This is list of all historical LVs listed in
"historical_logical_volumes" metadata section with all
the properties exported for each historical LV.
For example, we have this thin snapshot sequence:
lvol1 --> lvol2 --> lvol3
\
--> lvol4
We end up with these metadata:
logical_volume {
...
(lvol1, lvol3 and lvol4 listed here as usual - no change here)
...
}
historical_logical_volumes {
lvol2 {
id = "S0Dw1U-v5sF-LwAb-W9SI-pNOF-Madd-5dxSv5"
creation_time = 1456919613 # 2016-03-02 12:53:33 +0100
removal_time = 1456919620 # 2016-03-02 12:53:40 +0100
origin = "lvol1"
descendants = ["lvol3", "lvol4"]
}
}
By removing lvol1 further, we end up with:
historical_logical_volumes {
lvol2 {
id = "S0Dw1U-v5sF-LwAb-W9SI-pNOF-Madd-5dxSv5"
creation_time = 1456919613 # 2016-03-02 12:53:33 +0100
removal_time = 1456919620 # 2016-03-02 12:53:40 +0100
origin = "-lvol1"
descendants = ["lvol3", "lvol4"]
}
lvol1 {
id = "me0mes-aYnK-nRfT-vNlV-UiR1-GP7r-ojbROr"
creation_time = 1456919608 # 2016-03-02 12:53:28 +0100
removal_time = 1456919767 # 2016-03-02 12:56:07 +0100
}
}
When an LV is being removed, we create an instance of
"struct historical_logical_volume" wrapped up in
"struct generic_logical_volume".
All instances of "struct historical_logical_volume" are then recorded in
"historical_lvs" list which is part of "struct volume_group".
The "historical LV" is then interconnected with "live LVs" to
connect a history chain for the live LV.
The add_glv_to_indirect_glvs is a helper function that registers a
volume represented by struct generic_logical_volume instance ("glv")
as an indirect user of another volume ("origin_glv") and vice versa -
it also registers the other volume ("origin_glv") as indirect_origin
of user volume ("glv").
The remove_glv_from_indirect_glvs does the opposite.
The get_or_create_glv is helper function that retrieves any existing
generic_logical_volume wrapper for the LV. If the wrapper does not exist
yet, it's created.
The get_org_create_glvl is the same as get_or_create_glv but it creates
the glv_list wrapper in addition so it can be added to a list.
Add new structures and new fields in existing structures to support
tracking history of LVs (the LVs which don't exist - the have been
removed already):
- new "struct historical_logical_volume"
This structure keeps information specific to historical LVs
(historical LV is very reduced form of struct logical_volume +
it contains a few specific fields to track historical LV
properties like removal time and connections among other LVs).
- new "struct generic_logical_volume"
Wrapper for "struct historical_logical_volume" and
"struct logical_volume" to make it possible to handle volumes
in uniform way, no matter if it's live or historical one.
- new "struct glv_list"
Wrapper for "struct generic_logical_volume" so it can be
added to a list.
- new "indirect_glvs" field in "struct logical_volume"
List that stores references to all indirect users of this LV - this
interconnects live LV with historical descendant LVs or even live
descendant LVs.
- new "indirect_origin" field in "struct lv_segment"
Reference to indirect origin of this segment - this interconnects
live LV (segment) with historical ancestor.
- new "this_glv" field in "struct logical_volume"
This references an existing generic_logical_volume wrapper for this
LV, if used. It can be NULL if not needed - which means we're not
handling historical LVs at all.
- new "historical_lvs" field in "struct volume group
List of all historical LVs read from VG metadata.
Pair kernel_cache_settings with kernel_cache_policy.
There is very small chance in error case that the value in table
might be differnet from the value stored in metadata
so make it 'checkable'.
Showing 'u' in the pv_attr reporting field is mostly unnecessary because
most PVs are allocatable, and being allocatable implies it is (u)sed,
and this is already obvious from other fields in the default 'pvs'
output like the VG name.
So move the new (u)sed pv_attr from character position 4 to 1, and only
show it in those rare cases when the PV is not (a)llocatable or the
relevant metadata is missing.
(Scripts should not be using pv_attr, but rather pv_allocatable,
pv_exported, pv_missing, pv_in_use etc.)
Make the data_alignment variable 64 bits so it
can hold the invalid command line arg used in
pvreate-usage.sh pvcreate --dataalignment 1e.
On 32 bit arches, the smaller variable wouldn't
hold the invalid value so the error would not
trigger as expected by the test.
"pvcreate_each_params" was a temporary name used
to transition from the old "pvcreate_params".
Remove the old pvcreate_params struct and rename the
new pvcreate_each_params struct to pvcreate_params.
Rename various pvcreate_each_params terms to simply
pvcreate_params.
Use the new pvcreate_each_device() function from
toollib, previously added for pvcreate, in place
of the old pvcreate_vol().
This also requires shifting the location where the
lock is acquired for the new VG name. The lock for
the new VG is supposed to be acquired before pvcreate.
This means splitting the vg_lock_newname() out of
vg_create(), and calling vg_lock_newname() directly
before pvcreate, and then calling the remainder of
vg_create() after pvcreate.
The new function vg_lock_and_create() now does
vg_lock_newname() + vg_create(), like the previous
version of vg_create().
The lock on the new VG name is released before the
pvcreate and reacquired after the pvcreate because
pvcreate needs to reset lvmcache, which doesn't work
when locks are held. An exception could likely be
made for the new VG name lock, which would allow
vgcreate to hold the new VG name lock across the
pvcreate step.
This is common code for handling PV create/remove
that can be shared by pvcreate/vgcreate/vgextend/pvremove.
This does not change any commands to use the new code.
- Pull out the hidden equivalent of process_each_pv
into an actual top level process_each_pv.
- Pull the prompts to the top level, and do not
run any prompts while locks are held.
The orphan lock is reacquired after any prompts are
done, and the devices being created are checked for
any change made while the lock was not held.
Previously, pvcreate_vol() was the shared function for
creating a PV for pvcreate, vgcreate, vgextend.
Now, it will be toollib function pvcreate_each_device().
pvcreate_vol() was called effectively as a helper, from
within vgcreate and vgextend code paths.
pvcreate_each_device() will be called at the same level
as other process_each functions.
One of the main problems with pvcreate_vol() is that
it included a hidden equivalent of process_each_pv for
each device being created:
pvcreate_vol() -> _pvcreate_check() ->
find_pv_by_name() -> get_pvs() ->
get_pvs_internal() -> _get_pvs() -> get_vgids() ->
/* equivalent to process_each_pv */
dm_list_iterate_items(vgids)
vg = vg_read_internal()
dm_list_iterate_items(&vg->pvs)
pvcreate_each_device() reorganizes the code so that
each-VG-each-PV loop is done once, and uses the standard
process_each_pv function at the top level of the function.
This uses the vg->pv_write_list in place of the
vg->pvs_to_write list, and eliminates the use of
pvcreate_params. The label remove and zeroing
steps are shifted out of vg_write() to the higher
level like pvcreate will do.
The vg->pv_write_list contains pv_list structs for which
vg_write() should call pv_write().
The new list will replace vg->pvs_to_write that contains
vg_to_create structs which are used to perform higher-level
pvcreate-related operations. The higher level pvcreate
operations will be moved out of vg_write() to higher levels.
Since we already check in few other places 'info' is not NULL,
do the same for others - however when info would be NULL
it more or less looks like internal error.
Reshuffle messages during pvremove.
Always print WARNING: when PV is in use so using options
--force --force doesn't make this important user
notification go away.
Simplify variable 'used' usage (so older gcc doesn't warn
about the use of unitilizied variable).
Add some '.' into messages.
Currently it's been checked for 'zero' header for thin-pool,
but lets use it always for cache as well - since it's relatively 'cheap'
detection of read 'error' problems as thin/cache tools
currently do not work fast enough in this case.