IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This reverts commit fa69ed0bc8.
This code sometimes expects to be presented with a read-only filesystem
(during some boot sequences for example) and copes appropriately with
this and it should not lead to expected error messages that might cause
unnecessary alarm.
lv_name arg is only used without known LV for resolving '*lv'.
Once we know *lv, never use lv_name ever again.
So setting it when passing *lv has not needed.
Add code to support more LVs to be resized through a same code path
using a single lvresize_params struct.
(Now it's used for thin-pool metadata resize,
next user will be snapshot virtual resize).
Update code to adjust percent amount resize for use_policies.
Properly activate inactive thin-pool in case of any pool resize
as the command should not 'deffer' this operation to next activation.
Use common API design and pass just LV pointer to lv_manip.c functions.
Read cmd struct via lv->vg->cmd when needed.
Also do not try to return EINVALID_CMD_LINE error when we
have already openned VG - this error code can only be returned before
locking VG.
We have only 2 users of _lv_active() - one was already checking for ==1
while the other use (_lv_is_active()) could have take '-1' as a sign of having
an LV active. So return 0 and log_debug also the reason while detection
has failed (i.e. in case --driverload n - it's kind of expectable,
but might have confused user seeing just <backtrace>).
This fixes commit f50d4011cd which
introduced a problem when using older lvm2 code with newer libdm.
In this case, the old LVM didn't recognize new _LOG_BYPASS_REPORT flag
that libdm-report code used. This ended up with no output at all
from libdm where log_print_bypass_report was called because the
_LOG_BYPASS_REPORT was not masked properly in lvm2's print_log fn
which was called as callback function for logging.
With this patch, the lvm2 registers separate print_log_libdm logging
function for libdm instead. The print_log_libdm is exactly the same
as print_log (used throughout lvm2 code) but it checks whether we're
printing common line on output where "common" means not going to stderr,
not a warning and not an error and if we are, it adds the
_LOG_BYPASS_REPORT flag so the log_print goes directly to output, not
to any log report.
So this achieves the same goal as in f50d4011cd,
just doing it in a way that newer libdm is still compatible with older
lvm2 code (libdm-report is the only code using log_print).
Looking at the opposite mixture - older libdm with newer lvm2 code,
that won't be compilable because the new log report functionality
that is in lvm2 also requires new dm_report_group_* libdm functions
so we don't need to care here.
Move code from original print_log fn to a separate _vprint_log function
that accepts va_list and make print_log a wrapper over _vprint_log.
The print_log just initializes the va_list and uses it for _vprint_log
call now. This way, we can reuse _vprint_log if needed.
In commit 6ae22125, vgcfgrestore began disabling lvmetad
while running, and rescanned to enable it again at the end,
but missed the rescanning/enabling in the error case.
Reconnect to lvmetad if either the send fails (e.g. lvmetad
was restarted since lvmlockd last connected), or if no
lvmetad connection exists (e.g. lvmetad was started after
lvmlockd so no previous connection existed.)
Currently, a shutdown signal will cause lvmetad to quit
responding to new connections, but not actually exit until
all connections are gone. If a program is maintaining a
long running connection (e.g. lvmlockd, or even an lvm
command) when lvmetad gets a shutdown signal, then all
further commands will hang indefinately waiting for a
response that won't be sent.
With this patch, make lvmetad continue handling new
connections even after a shutdown signal. It will exit
once all connections are gone.
Previously, vgcfgrestore would attempt to vg_remove the
existing VG from lvmetad and then vg_update to add the
restored VG. But, if there was a failure in the command
or with vg_update, the lvmetad cache would be left incorrect.
Now, disable lvmetad before the restore begins, and then
rescan to populate lvmetad from disk after restore has
written the new VG to disk.
Reporting commands can be of different types (even if the command name
is the same):
- pvs command can be either of PVS, PVSEGS or LABEL report type,
- vgs command is of VGS report type,
- lvs command is of LVS or SEGS report type.
Use basic report type when looking for report prefix used for
--configreport option.
This means that:
- 'pvs --configreport pv' applies to PVS, PVSEGS or LABEL report type
- 'vgs --configreport vg' applies to VGS report type
- 'lvs --configreport lv' applies to LVS and SEGS report type
The DM_REPORT_OUTPUT_MULTIPLE_TIMES instructs reporting code to
keep rows even after dm_report_output call - the rows are not
destroyed in this case which makes it possible to call dm_report_output
multiple times.
This allows for moving parts of the code from dm_report_object to
dm_report_output which is important for subsequent patches that allow
for repeated dm_report_output, not destroying rows on each
dm_report_output call.
log_print is used during cmd line processing to log the result of the
operation (e.g. "Volume group vg successfully changed" and similar).
We don't want output from log_print to be interleaved with current
reports from group where log is reported as well. Also, the information
printed by log_print belongs to the log report too, so it should be
rerouted to log report if it's set.
Since the code in libdm-report which is responsible for doing the report
output uses log_print too, we need to use a different kind of log_print
which bypasses any log report currently used for logging (...simply,
we can't call log_print to output the log report itself which in turn
would again reroute to report - the report would never get on output
this way).
This patch adds structures and functions to reroute error and warning
logs to log report, if it's set.
There are 5 new functions:
- log_set_report
Set log report where logging will be rerouted.
- log_set_report_context
Set context globally so any report_cmdlog call will use it.
- log_set_report_object_type
Set object type globally so any report_cmdlog call will use it.
- log_set_report_object_name_and_id
Set object ID and name globally so any report_cmdlog call will use it.
- log_set_report_object_group_and_group_id
Set object group ID and name globally so any report_cmdlog call will use it.
These functions will be called during LVM command processing so any logs
which are rerouted to log report contain proper information about current
processing state.
The lvm fullreport works per VG and as such, the vg, lv, pv, seg and
pvseg subreport is done for each VG. However, if the PV is not part of
any VG yet, we still want to display pv and pvseg subreports for these
"orphan" PVs - so enable this for lvm fullreport's process_each_vg call.
If we have fullreport, make sure that the options/sort keys used for
each report doesn't change its type - we want to preserve the original
type so it's always 5 different subreports within fullreport (vg, lv, pv,
seg, pvseg). Since we have all report types within fullreport, users
should add fields under proper subreport type - this minimizes
duplication of info displayed on output.
Groupable args (the ones marked with ARG_GROUPABLE flag) start a new
group of args if:
- this is the first time we hit such a groupable arg,
- or if non-countable arg is repeated.
However, there may be cases where we want to give priorities when
forming groups and hence force new group creation if we hit an arg
with higher grouping priority.
For example, let's assume (for now) hypothetical sequence of args used:
lvs -o lv_name --configreport log -o log_type --configreport lv -o +vg_name
Without giving any priorites, we end up with:
lvs -o lv_name --configreport log -o log_type --configreport lv -o +vg_name
| | | | | |
\__________GROUP1___________/ \________GROUP2___________/ \_GROUP3_/
This is because we hit "-o" as the first groupable arg. The --configreport,
even though it's groupable too, it falls into the previous "-o" group.
While we may need to give priority to the --configreport arg that should
always start a new group in this scenario instead:
lvs -o lv_name --configreport log -o log_type --configreport lv -o +vg_name
| | | | | |
\_GROUP1_/ \_________GROUP2___________/ \_________GROUP3__________/
So here "-o" started a new group but since "--configreport" has higher
priority than "-o", it starts fresh new group now and hence the rest of
the command line's args are grouped by --configreport now.
lvm fullreport executes 5 subreports (vg, pv, lv, pvseg, seg) per each VG
(and so taking one VG lock each time) within one command which makes it
easier to produce full report about LVM entities.
Since all 5 subreports for a VG are done under a VG lock, the output is
more consistent mainly in cases where LVM entities may be changed in
parallel.
Add any report (pvs/vgs/lvs) currently processed to current report group
which is part of processing handle and which already contains log
report. This way both log report and pvs/vgs/lvs report will be
reported as a whole within a group, thus having same output format as
selected by --reportformat option.
If there's parent processing handle, we don't need to create completely
new report group and status report - we'll just reuse the one already
initialized for the parent.
Currently, the situation where this matter is when doing internal report
to do the selection for processing commands where we have parent processing
handle for the command itself and processing handle for the selection
part (that is selection for non-reporting tools).
Wire up report group creation with log report in struct
processing_handle and call report_format_init during processing handle
initialization (init_processing_handle fn) and destroy it while
destroing processing handle (destroy_processing_handle fn).
This way, all the LVM command processing using processing handle
has access to log report via which the current command log
can be reported as items are processed.
Separating common report and per-report arguments prepares the code for
handling several reports per one command (for example, the command log
report and LVM command report itself).
Each report can have sort keys, options (fields), list of fields to
compact and selection criteria set individually. Hooks for setting these
per report within one command will be a part of subsequent patches, this
patch only separates new struct single_report_args out of existing
struct report_args.
New report/output_format configuration sets the output format used
for all LVM commands globally. Currently, there are 2 formats
recognized:
- basic (the classical basic output with columns and rows, used by default)
- json (output is in json format)
Add new --reportformat option and new report_format_init function that
checks this option and creates new report group accordingly, also
preparing log report handle and adding it to the report group just
created.
This is a preparation for new CMDLOG report type which is going to be
used for reporting LVM command log.
The new report type introduces several new fields (log_seq_num, log_type,
log_context, log_object_type, log_object_group, log_object_id, object_name,
log_message, log_errno, log_ret_code) as well as new configuration settings
to set this report type (report/command_log_sort and report/command_log_cols
lvm.conf settings).
This patch also introduces internal report_cmdlog helper function
which is a wrapper over dm_report_object to report command log via
CMDLOG report type and which is going to be used throughout the code
to report the log items.
This patch introduces DM_REPORT_GROUP_JSON report group type. When using
this group type and when pushing a report to such a group, these flags
are automatically unset:
DM_REPORT_OUTPUT_ALIGNED
DM_REPORT_OUTPUT_HEADINGS
DM_REPORT_OUTPUT_COLUMNS_AS_ROWS
...and this flag is set:
DM_REPORT_OUTPUT_BUFFERED
The whole group is encapsulated in { } for the outermost JSON object
and then each report is reported on output as array of objects where
each object is the row from report:
{
"report_name1": [
{field1="value", field2="value",...},
{field1="value", field2="value",...}
...
],
"report_name2": [
{field1="value", field2="value",...},
{field1="value", field2="value",...}
...
]
...
}
This patch introduces DM_REPORT_GROUP_BASIC report group type. This
type has exactly the classical output format as we know from before
introduction of report groups. However, in addition to that, it allows
to put several reports into a group - this is the very basic grouping
scheme that doesn't change the output format itself:
Report: report1_name
Header1 Header2 ...
value value ...
value value ...
... ... ...
Report: report2_name
Header1 Header2 ...
value value ...
value value ...
... ... ...
There's no change in output for this report group type - with this type,
we only make sure there's always only one report in a group at a time,
not more.
This patch introduces DM report group (represented by dm_report_group
structure) that is used to group several reports to make a whole. As a
whole, all the reports in the group follow the same settings and/or
formatting used on output and it controls that the output is properly
ordered (e.g. the output from different reports is not interleaved
which would break readability and/or syntax of target output format
used for the whole group).
To support this feature, there are 4 new functions:
- dm_report_group_create
- dm_report_group_push
- dm_report_group_pop
- dm_report_group_destroy
From the naming used (dm_report_group_push/pop), it's clear the reports
are pushed onto a stack. The rule then is that only the report on top
of the stack can be reported (that means calling dm_report_output).
This way we make sure that the output is not interleaved and provides
determinism and control over the output.
Different formats may allow or disallow some of the existing report
flags controlling output itself (DM_REPORT_OUTPUT_*) to be set or not so
once the report is pushed to a group, the grouping code makes sure that
all the reports have compatible flags set and then these flags are
restored once each report is popped from the report group stack.
We also allow to push/pop non-report item in which case such an item
creates a structure (e.g. to put several reports together with any
opening and/or closing lines needed on output which pose as extra
formatting structure besides formatting the reports).
The dm_report_group_push function accepts an argument to pass any
format-specific data needed (e.g. handle, name, structures passed
along while working with reports...).
We can call dm_report_output directly anytime we need (with the only
restriction that we can call dm_report_output only for the report that
is currently on top of the group's stack). Or we don't need to call
dm_report_output explicitly in which case all the reports in a stack are
reported on output automatically once we call dm_report_group_destroy.
This fixes a problem in commit ae0a8740c. The problem
in that commit was that all existing PVs are initially
dropped from lvmetad. This works if the VG is updated
at the end, which replaces the dropped PVs, but if the
rescan finds that the VG seqno is unchanged, it leaves
the cached VG in place. So, we should only drop the
existing PVs in lvmetad when the VG is going to be updated.
commit 15da467b was meant to address the case where
use_lvmetad=1 in lvm.conf, and lvmetad is not available,
in which case, pvscan --cache -aay should activate LVs.
But the commit unintentionally also changed the case
where use_lvmetad=0 in lvm.conf, in which case
pvscan --cache -aay should not activate LVs, so fix
that here.
When pvscan --cache -aay fails to connect to lvmetad it will
simply exit and do nothing. Change this so that it will
skip the lvmetad cache step and do the activation step from
disk.
We were initially looking to see if an LV was hidden and if it was we were
creating an instance of a LvCommon object to represent it. Thus if we
had a hidden cache pool for example we were missing the methods and
properties for the cache pool. However, when we create the object path,
any hidden LVs, regardless of type/functionality will be placed in the
hidden path.
The object manager method get_object_by_lvm_id was used in many cases for
the sole reason of getting the object path for the object. Instead of
retrieving the object and then calling 'dbus_object_path' on the object, we
are adding a method which returns the object path.
When we are processing the LVs we need to build up dbus objects from least
dependent to most dependent, so that we have information available when
constructing.
Original code missed to catch all apperances of SIGINT.
Also enhance logging when running in shell without tty.
Accept this regex as valid input:
'^[ ^t]*([Yy]([Ee]([Ss]|)|)|[Nn]([Oo]|))[ ^t]*$'
Some commands scan labels to populate lvmcache multiple
times, i.e. lvmcache_init, scan labels to fill lvmcache,
lvmcache_destroy, then later repeat
Each time labels are scanned, duplicates are detected,
and preferred devices are chosen. Each time this is done
within a single command, we want to choose the same
preferred devices. So, check for existing preferences
when choosing preferred devices.
This also fixes a problem with the list of unused duplicate
devs when run in an lvm shell. The devs had been allocated
from cmd memory, resulting in invalid list entries between
commands.
A number of places are working on a specific dev when they
call lvmcache_info_from_pvid() to look up an info struct
based on a pvid. In those cases, pass the dev being used
to lvmcache_info_from_pvid(). When a dev is specified,
lvmcache_info_from_pvid() will verify that the cached
info it's using matches the dev being processed before
returning the info. Calling code will not mistakenly
get info for the wrong dev when duplicate devs exist.
This confusion was happening when scanning labels when
duplicate devs existed. label_read for the first dev
would add an info struct to lvmcache for that dev/pvid.
label_read for the second dev would see the pvid in
lvmcache from first dev, and mistakenly conclude that
the label_read from the second dev can be skipped
because it's already been done. By verifying that the
dev for the cached pvid matches the dev being read,
this mismatch is avoided and the label is actually read
from the second duplicate.
If a command gets stuck during an lvmetad update, lvmetad
will cancel that update after the timeout. The next command
to check the lvmetad will see that lvmetad needs to be
populated because lvmetad will return token of "none" after
a timed out update (same as when lvmetad is not populated
at all after starting.)
If a command gets an error during an lvmetad update, it
will now just quit and leave its updating token in place.
That update will be cancelled after the timeout.
Commit #5b3a4a9 caused the "name" variable to be cleared if
declaration and assignment is on two lines so put it back
so it's on one line for it to work again.
pvmove began processing tags unintentionally from commit,
6d7dc87cb pvmove: use toollib
pvmove works on a single PV, but tags can match multiple PVs.
If we allowed tags, but processed only the first matching PV,
then the resulting PV would be unpredictable.
Also, the current processing code does not allow us to simply
report an error and do nothing if more than one PV matches the tag,
because the command starts processing PVs as they are found,
so it's too late to do nothing if a second PV matches.
If configuration consists of several sources in config cascade
("config cascade" defined in man lvmconfig(8)), lvmconfig displayed
only difference from defaults of the topmost config in the cascade.
Fix lvmconfig to display complete difference, considering all
the configuration in the cascade.
For example, before this patch:
(use_lvmetad=0 set in lvm.conf which differs from defaults)
$ lvmconfig --type diff
global {
use_lvmetad=0
}
(compact_output=1 set on cmd line)
$ lvmconfig --type diff --config report/compact_output=1
report {
compact_output=1
}
(headings=0 set in profile)
$ lvmconfig --type diff --commandprofile test
report {
headings=0
}
(difference in topmost configuration source is displayed)
$ lvmconfig --type diff --commandprofile test --config report/compact_output=1
report {
compact_output=1
}
With this patch applied (the config cascade is merged before looking for
difference from defaults in configuration):
$ lvmconfig --type diff
global {
use_lvmetad=0
}
$ lvmconfig --type diff --config report/compact_output=1
report {
compact_output=1
}
global {
use_lvmetad=0
}
$ lvmconfig --type diff --profile test
report {
headings=0
}
global {
use_lvmetad=0
}
$ lvmconfig --type diff --profile test --config report/compact_output=1
report {
headings=0
compact_output=1
}
global {
use_lvmetad=0
}
Treat loop device created with 'losetup -P' as regular
partitioned device - so if it has partition table,
prevent its usage in commands like 'pvcreate'.
Before 'pvcreate /dev/loop0' could have erased and formated as PV,
after this patch, device is filtered out and cannot be used.
All the variables for sscanf in lvmlockctl.c and lvmlockd-sanlock.c are
zeroed before sscanf call so the failure is detected by seeing the zero
value instead of proper one in subsequent code - so use (void) for
sscanf calls to ignore return value here.
Before this fix, when reporting 'lvm devtypes', the report was
initialized with incorrect reserved values - the ones used for
pvs/vgs/lvs report were used instead of NULL value (because devtypes
doesn't have any reserved values).
For example, trying to (incorrectly) use lv_name for the -S|--select
with lvm devtypes which doesn't have this field at all:
Before this patch (internal error issued):
$ lvm devtypes -S 'lv_name=lvol0'
Internal error: _check_reserved_values_supported: field-specific reserved value of type 0x0 for field not supported
Internal error: dm_report_init_with_selection: trying to register unsupported reserved value type, skipping report selection
DevType MaxParts Description
aoe 16 ATA over Ethernet
ataraid 16 ATA Raid
bcache 1 bcache block device cache
...
With this patch applied (correct error displayed about
unrecognized selection field):
$ lvm devtypes -S 'lv_name=lvol0'
Device Types Fields
-------------------
devtype_name - Name of Device Type exactly as it appears in /proc/devices. [string]
devtype_max_partitions - Maximum number of partitions. (How many device minor numbers get reserved for each device.) [number]
devtype_description - Description of Device Type. [string]
Special Fields
--------------
selected - Set if item passes selection criteria. [number]
help - Show help. [unselectable number]
? - Show help. [unselectable number]
Unrecognised selection field: lv_name
Selection syntax error at 'lv_name=lvol0'.
Use 'help' for selection to get more help.
Convert fields into using a single status ioctl call per LV.
This is a bit tricky since when there are more complicated
stacks, at this moment its undefined which values should be shown.
It's clear we need to cache more then single ioctl per LV,
but also we need to define more explicitely relation between
reported values for snapshots.
This patch is not a final state, rather a transitional step.
It should not be giving more 'worst' values then previous
many-ioctl-calls-per-lv solution.
Add function to obtain percentage value for cache lv_seg_status.
This API is rather evolving 'middle' step as the ultimate goal
is segment API fuctionality.
But first we need to be clear at reporting level which values
are needed to be reported for which LVs and segments.
Add more code to properly store status for snapshot segment
maintaining lvm2 fiction of COW and snapshot internal volumes.
The key issue here is however not though-through reporting
logic - as there is no single answer for whole line state.
It not counting with layer and we may need few more ioctl to
cover all reporting needs depending upon what is actually
needed.
In reality we need to 'cache' more ioctl status queries for
individual LVs and their segments (so they checked at most once).
The other 'hard' topic for conversion is mirror segment handling.
Also we definitelly need to relocate some logic into segment's methods,
yet it might be complex as we have not clear border between targets.
TODO: define more clearly how are reporting fields defined in case
we 'stack' volumes like - cache of stacked thin LV snapshot origin.
lv_refresh_suspend_resume() has escaped with fail ret code
after failing suspend and could have left many volumes in suspend state.
So always unconditionally call resume also when suspend has failed.
To get better control when flushing is used add extra arg when
setting up dm task.
By default now check dm device status without flush.
(At this moment this should effect only thin and cache volumes).
Also switch dev_manager_thin_pool_status() to use more
readable 'flush' parameter instead of 'no_flush'.
Check first the LV is cow before even checking it's a merging COW.
Note: previosly merging_cow was also merging origin, so without
this explicit check it used to return '1' also when passed
LV has been merging origin.
When mirror/raid called copy_percent function to return,
when 100% was supposed to be returned, wrong float 100.0 value
could have been reported back instead of dm_percent_t DM_PERCENT_100.
There is broken API somewhere, since the function here rely on
actively being modifid VG content even when doing 'lvs' operation.
(extents_copies)
This reverts commit 8fd886f735.
This was a deliberate omission because logging token-by-token metadata
parsing greatly increases the amount of logging for hardly any benefit.
In general, only LVM config file settings need to be logged, and in
places where it's considered important to log particular elements of
metadata that should be done using specific log_* lines.
This area can be revisited.
In the same way that process_each_vg() can be passed
a single VG name to process, also allow process_each_lv()
to be passed a single VG name and LV name to process.
This refactors the code for autoactivation. Previously,
as each PV was found, it would be sent to lvmetad, and
the VG would be autoactivated using a non-standard VG
processing function (the "activation_handler") called via
a function pointer from within the lvmetad notification path.
Now, any scanning that the command needs to do (scanning
only the named device args, or scanning all devices when
there are no args), is done first, before any activation
is attempted. During the scans, the VG names are saved.
After scanning is complete, process_each_vg is used to do
autoactivation of the saved VG names. This makes pvscan
activation much more similar to activation done with
vgchange or lvchange.
The separate autoactivate phase also means that if lvmetad
is disabled (either before or during the scan), the command
can continue with the activation step by simply not using
lvmetad and reverting to disk scanning to do the
activation.
Add support for active cache LV.
Handle --cachemode args validation during command line processing.
Rework some lvm2 internal to use lvm2 defined CACHE_MODE enums
indepently on libdm defines and use enum around the code instead
of passing and comparing strings.
Don't use lvm_init() to create a full command context, which
does a lot of command setup (like connecting to daemons), which
is unnecessary for simply reading a value from lvm.conf.
Passing a NULL context arg to the lvm_config_ function is now
allowed, in which case lvm.conf is read without doing lvm
command setup.
A program may be using liblvm2app for simply checking a config
setting in lvm.conf. In this case, a full lvm context is not
needed, only cmd->cft (which are the config settings read from
lvm.conf).
lvm_config_find_bool() can now be passed a NULL lvm context
in which case it will only create cmd->cft, check the config
setting asked for, and destroy the cmd.
Only call lvm_init() when it's needed so that simply
loading the lvm python code in another program doesn't
make that program do lvm initialization.
The version call doesn't need a handle.
The garbage collection can just do lvm_quit to destroy
the command. The next call that needs lvm_init will
do it first.
When setting up a toolcontext, the lib init function
was detecting an error when there was none, and then
it was returning an incompletely initialized cmd struct
instead of NULL. The effect was that the lib would try
to use the uninitialized cmd struct and segfault.
This would happen if a non-fatal error occurred during
cmd setup, e.g. user permission failed on lvmetad socket,
causing cmd to fall back to scanning and not use lvmetad.
The only real error condition is when create_toolcontext
returns NULL. If cmd is returned, the lib can use it.
The _report fn is getting big - separate it in two:
- _report fn to get all the options and arguments
- _do_report fn for reporting itself
Also, place all the variables/arguments in one structure for easier
handling of the variables around.
If a command begins repopulating the lvmetad cache,
and fails part way through, it should set the disabled
state in lvmetad so other commands don't use bad data.
If a subsequent scan succeeds, the disabled state is
cleared.
If duplicate devices exist for a PV, and one device's
size matches the PV size, but the other doesn't, then
prefer the matching device.
If one device is used by an active LV, prefer that device.
When there are duplicate devices for a PV, one device
is preferred and chosen to exist in the VG. The other
devices are not used by lvm, but are displayed by pvs
with a new PV attr "d", indicating that they are
unchosen duplicate PVs.
The "duplicate" reporting field is set to "duplicate"
when the PV is an unchosen duplicate, and that field
is blank for the chosen PV.
Previously, duplicate PVs were processed as a side effect
of processing the "chosen" PV in lvmcache. The duplicate
PV would be hacked into lvmcache temporarily in place of
the chosen PV.
In the old way, we had to always process the "chosen" PV
device, even if a duplicate of it was named on the command
line. This meant we were processing a different device than
was asked for. This could be worked around by naming
multiple duplicate devs on the command line in which case
they were swapped in and out of lvmcache for processing.
Now, the duplicate devs are processed directly in their
own processing loop. This means we can remove the old
hacks related to processing dups as a side effect of
processing the chosen device. We can now simply process
the device that was named on the command line.
When the same PVID exists on two or more devices, one device
is preferred and used in the VG, and the others are duplicates
and are not used in the VG. The preferred device exists in
lvmcache as usual. The duplicates exist in a specical list
of unused duplicate devices.
The duplicate devs have the "d" attribute and the "duplicate"
reporting field displays "duplicate" for them.
'pvs' warns about duplicates, but the formal output only
includes the single preferred PV.
'pvs -a' has the same warnings, and the duplicate devs are
included in the output.
'pvs <path>' has the same warnings, and displays the named
device, whether it is preferred or a duplicate.
Wait to compare and choose alternate duplicate devices until
after all devices are scanned. During scanning, the first
duplicate dev is kept in lvmcache, and others are kept in a
new list (_found_duplicate_devs).
After all devices are scanned, compare all the duplicates
available for a given PVID and decide which is best.
If the dev used in lvmcache is changed, drop the old dev
from lvmcache entirely and rescan the replacement dev.
Previously the VG metadata from the old dev was kept in
lvmcache and only the dev was replaced.
A new config setting devices/allow_changes_with_duplicate_pvs
can be set to 0 which disallows modifying a VG or activating
LVs in it when the VG contains PVs with duplicate devices.
Set to 1 is the old behavior which allowed the VG to be
changed.
The logic for which of two devs is preferred has changed.
The primary goal is to choose a device that is currently
in use if the other isn't, e.g. by an active LV.
. prefer dev with fs mounted if the other doesn't, else
. prefer dev that is dm if the other isn't, else
. prefer dev in subsystem if the other isn't
If neither device is preferred by these rules, then don't
change devices in lvmcache, leaving the one that was found
first.
The previous logic for preferring a device was:
. prefer dev in subsystem if the other isn't, else
. prefer dev without holders if the other has holders, else
. prefer dev that is dm if the other isn't
When duplicate PVs are detected, set the disabled
flag so that commands will disable use of lvmetad.
This duplicate detection is done by lvmetad itself
when it's told about a single new PV with a PVID
that matches an existing PV on another device.
(This is different from the case where the command
is scanning all devices and detects the duplicate.)
Remove the "altdev" logic that attempted to keep
track of multiple devices for a single PV. It
is no longer used since lvmetad is disabled in
this case.
For raid1 use chunksize as bitmap-chunk specification.
Always enforce usage of bitmap - getting comparable outcome
as lvm2 raid support uses.
Add udev_wait after stopping md array - as in fact leg-device
are still in use by target even command has finished.
(mdadm --stop causes WATCH rule wakeup, and
ioctl(STOP_ARRAY) returns IMHO to early - it should finish
and fsync work on leg devices first).
Support parsing --chunksize option also when converting.
Now user can use cache pool created with i.e. 32K chunksize,
while in caching user can select 512K blocks.
Tool is supposed to validate cache metadata size is big enough
to support such chunk size. Otherwise error is shown.
When creating LV - in some case we change created segment type
(ATM for cache and snapshot) and we then manipulate with
lv segment according to 'lp' segtype.
Fix this by checking for proper type before accessing segment members.
This makes command like:
lvcreate --type cache-pool -L10 vg/cpool
lvcreate -H -L10 --cachesettings migtation_threshold=10000 vg/cpool
to pass since now tool correctly selects default cache policy.
Rather than doing repeated translations from name to
device when comparing args to existing PVs, do one
translation of the arg names and saving the device,
before checking existing PVs.
If there's an activation volume_filter, it might not be possible
to activate the rmeta LVs to wipe them. At least inherit any
LV tags from the parent LV while attempting this.
pvscan autoactivation has its own VG processing implementation,
so it can't properly handle things like foreign or shared VGs,
so make it ignore those VG types (or errors from them) as best
as possible.
Add a FIXME stating that pvscan autoactivation must really be
moved to the standard VG processing by calling process_each_vg
to do activation once the scanning / cache update is finished.
Checking for devices uses is_missing_pv() to check
if there is a device for the PV. is_missing_pv()
is based on the MISSING_PV flag, which does not
always correspond to !pv->dev. When using lvmetad,
a command like:
pvs --config 'devices/filter=["a|/dev/sdb|", "r|.*|"]'
will cause a number of PVs to have NULL pv->dev, but
not the MISSING_PV flag. So, NULL pv->dev needs to
also be checked.
Before executing modprobe for given module name, just check
if the module is not already present in /sys/module.
Useful when checking dm-cache-policy modules as we do not
having matching interface like for targets.
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
Probably not worth mentioning "segments" here, just state that devices
for an LV can't be all found during the check - it's less mysterious for
user then:
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find all devices for LV fedora/root while checking used and assumed devices.
WARNING: Couldn't find all devices for LV fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
When checking assumed PVs against real devices used for LVs and if
there's no device assigned for an assumed PV (e.g. due to filters),
do log_warn instead of log_error and continue checking LV segments
and associated assumed PVs further, just like we do log_warn elsewhere
in this situation.
This way user will see the warning for each LV which couldn't be
checked completely against real PVs used. Before, we logged only
the very first occurence of missing device for an LV in a VG and we
returned from the function doing this check for all the LVs in VG
immediately which may be a bit misleading because it didn't tell
user about all the other LVs and whether they could be checked
or not.
For example, we have this setup:
[0] fedora/~ # pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
/dev/vda2 fedora lvm2 a-- 19.49g 0
[0] fedora/~ # lvs -o+devices
LV VG Attr LSize Devices
root fedora -wi-ao---- 19.00g /dev/vda2(0)
swap fedora -wi-ao---- 500.00m /dev/vda2(4864)
Before this patch (only the very first LV in a VG is logged to have a
problem while checking used and assumed devices):
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
With this patch applied (all LVs where we hit problem while checking
used and assumed devices are logged and it's warning, not error):
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
Check that @stats_list and @stats_print returned data in the
_stats_parse_list() and _stats_parse_region() functions before
attempting to operate on region and area values.
This avoids a coverity warning since fgets() could potentially
return no data from the memory buffer returned by the ioctl.
In both cases the ioctl would return an error, preventing these
functions from running, however it is cleaner to test for the
condition explicitly and fail in those cases.
If lvmetad is running, and a command opts to not use it
(--config global/use_lvmetad=0), and the command changes
metadata, then the metadata change is not visible to
lvmetad. Subsequent commands using lvmetad to change
metadata may cause corruption based on the invalid
lvmetad state.
Eventually we can set the disabled state in lvmetad
to prevent this problem, but for now print a warning
about the possibility.
When command is not using lvmetad because
use_lvmetad=0 in the config, but the lvmetad
pidfile exists, print a warning (previously
this checked for the socket existing instead
of the pidfile existing.)
vg/snapshotN should not appear anywhere.
No code should be showing this, but it was noticed in some logs last
week and we can deal with it in display_lvname().
When user requested on cmdline disabling of lvmetad/lvmpoll,
respect it and when lvmlockd requires these daemon,
Error configure with clear message about misconfiguration.
The lvmetad connection is created within the
init_connections() path during command startup,
rather than via the old lvmetad_active() check.
The old lvmetad_active() checks are replaced
with lvmetad_used() which is a simple check that
tests if the command is using/connected to lvmetad.
The old lvmetad_set_active(cmd, 0) calls, which
stopped the command from using lvmetad (to revert to
disk scanning), are replaced with lvmetad_make_unused(cmd).
The test was a weak attempt at verifying the special
combination of lvchange/vgchange -aay --sysinit, but
was only looking for lvmetad connection warnings.
Update the warning checks, and check the LV activation
state directly which is the main point.
Rename the test to reflect its purpose of checking
the -aay --sysinit combination.
Update the check about lvmetad running but not used.
Also add tests related to the new lvmetad disabled state.
lvm1 metadata is used here to test the disabled state
because lvm1 metadata is the first condition using the
disabled state.
After a device rescan that repopulates lvmetad,
if no reason for disabling lvmetad was seen
(lvm1 metadata or duplicate PVs), then clear
the disabled flag in lvmetad. This allows
commands to resume using the lvmetad cache
after the cause for disabling it has been removed.
Commands already check if the lvmetad token is valid,
and if not, they rescan devices to repopulate lvmetad
before running. Now, in addition to checking the
lvmetad token, they also check if the lvmetad disabled
flag is set. If so, they do not use the lvmetad cache
and revert to disk scanning.
A global flag in lvmetad indicates it has been disabled.
Other flags indicate the reason it was disabled.
These flags can be queried using get_global_info.
The lvmetactl debugging utility can set and clear the
disabled flag in lvmetad. Nothing else sets the
disabled flag yet.
Commands will check these flags after connecting to
lvmetad. If the disabled flag is set, the command
will not use the lvmetad cache, but revert to disk
scanning.
To test this feature:
$ lvmetactl get_global_info
response = "OK"
global_invalid = 0
global_disable = 0
disable_reason = "none"
token = "filter:3041577944"
$ vgs
(should report VGs from lvmetad)
$ lvmetactl set_global_disable 1
$ lvmetactl get_global_info
response = "OK"
global_invalid = 0
global_disable = 1
disable_reason = "DIRECT"
token = "filter:3041577944"
$ vgs
WARNING: Not using lvmetad because the disable flag was set directly.
(should report VGs without contacting lvmetad)
$ lvmetactl set_global_disable 0
$ vgs
(should report VGs from lvmetad)
process_each_pv was doing:
1. lvmcache_seed_infos_from_lvmetad()
sends pv_list request to lvmetad.
2. get_vgnameids()
sends vg_list request to lvmetad.
3. _get_all_devices()
first calls lvmcache_seed_infos_from_lvmetad(),
which is a no-op if it's already been called.
Because get_vgnameids() does not use the information
from lvmcache_seed_infos_from_lvmetad(), it does not
need to be called prior to get_all_devices where
it is actually needed.
Improve code for snapshot merge for readabilty
and also reduce number of tests needed to decide
if merging can or cannot be started.
(Further improving 9cccf5245a)
When dm_tree_find_node_by_uuid() fails to find passed uuid,
report in lof_debug the complete original uuid,
not the one stripped of LVM- prefix.
TODO: inspect manipulation with LVM- prefix here.
To recognize in runtime if we are merging or not
to make consistent decision between suspend and resume
add function to parse thin table line when add
merging thin device to the table.
A snapshot merge into its origin cannot be initiated while the devices
are in use. If there is outside interference (such as from udev),
the suspend (preload) and resume stages can reach conflicting decisions
about whether or not to proceed.
Try to make the logic more robust by checking the inactive or live
table during resume. (This is still not perfect.)
Commit 971ab733b7 ("thin: activation of
merging thin snapshot") also added an incorrect deactivation attempt
for non-thin LVs: find_snapshot(lv)->lv is not designed to be
activated and any attempt to deactivate it is incorrect.
Move checking the lvmetad state, and the possible rescan,
out of lvmetad_send() to the start of the command.
Previously, the token mismatch and rescan would occur
within lvmetad_send() for some other request. Now,
the token mismatch is detected earlier, so the
rescan can be done before the main command is in
progress. Rescanning deep within the processing of
another command will disturb the lvmcache state of
that other command.
A rescan already exists at the start of the command
for the case where foreign VGs are going to be read.
This same rescan is now also performed when there is
an lvmetad token mismatch (from a changed global_filter).
The commands pvscan/vgscan/lvscan/vgimport are excluded
from this preemptive checking/rescanning for lvmetad
because they want to do rescanning themselves explicitly.
If rescanning devices fails, then lvmetad has not been
correctly repopulated and should not be used, so make
the command revert to not using lvmetad.
When not obtaining device from udev, we are doing deep devdir scan,
and at the same time we try to insert everything what /sys/dev/block
knows about. However in case lvm2 is configured to use nonstardard
devdir this way it will see (and scan) devices from a real system.
lvm2 test suite is using its own test devdir with its
own device nodes. To avoid touching real /dev devices, validate
the device node exist in give dir and do not insert such device
into a cache.
With obtain list from udev this patch has no effect
(the normal user path).
We have _insert_dirs() for udev and non-udev compilation.
Compiling without udev missed to call dev_cache_index_devs().
Move the call after _insert_dirs() call so both compilation
gets it.
When scanning if device is being usable as PV,
we call STATUS - but this status should not cause
any flushing.
Skip also open_count information as it's not needed.
We don't have any report field of this type yet. Return this patch into
the play if we really need that. Currenly we always report status
(result of "status" dm ioctl) for an LV as a whole where we choose
segment which represents the LV, not calling status for each possible
segment it contains - we don't need this now so I'm removing it to
not make the code more complex uselessly.
Devices without "LVM-" uuid prefix have been generated by very old
version of lvm2 2.00 and 2.01.
Since version 2.02 all lvm2 devices are using prefix "LVM-".
However checking for present of ancient non prefixed devices does
take extra IOCTL per every call and for majority of todays user
it will not find anything new.
So use the assumption that users with kernel 3.X and newer are not
really using such old versions of lvm2 (year <2005) and with their
new kernel they are also using new version of lvm2 and skip
checking for them.
This change also makes trace logs more readable.
This is hotfix for RHBZ: https://bugzilla.redhat.com/1324537
However already the %FREE is not a good fit and we need something
better. Meanwhile make -l%PVS work at least as good as %FREE
for thin-pool.
TODO: this needs rework - it should be allocator to do all the size
decisions at one place.
Improved test script to verify lost mirror image does not
cause mirror corruption while mirror is in use.
TODO: add more cases (lost mlog...), lost image from 3leg mirror...
When leaving preload routine it should not altet state of flush required
when it's been already set to 1 and drop it to 0.
The API here is unclean, but current usage expects the same
variable pointer is for all preload calls and combines 'flush_required'
across all of them.
Commit 844b009584 tried to move
limit for usage of noflush to 'preload' however this was not
correctly processed.
Intead explicitly check for which types we do not want noflush
and also add debug message in this case.
Fix regression caused by commit ba41ee1dc9.
The idea was to use no_flush for changed device only for thin volumes
and thin pools but also to merge this with change made in commit
844b009584.
However the resulting condition has caused misbehavior for the mirror
suspend - as that has been before the ONLY allowed target type
that could have been suspended with noflush.
Result was badly working repair for --type mirror that has been
passing 'flush' to the repaired mirror target whenever preload
wrongly set flush_required.
The origin code has required the flush_required to be set whenever
deivce size is changed.
Now it first detects if device size got smaller
'dm_tree_node_size_changed(root) < 0' - this requires flush.
Otherwise it checks device is not thin volume nor thin pool and its
size has changed (got bigger) and requires flush.
This mean upsize of thin-pool or thin volume will not require flush.
/sys/dev/block is available since kernel version 2.2.26 (~ 2008):
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-dev
The VGID/LVID indexing code relies on this feature so skip indexing
if it's not available to avoid error messages about inability to open
/sys/dev/block directory.
We're not going to provide fallback code to read the /sys/block/
instead in this case as that's not that efficient - it needs extra
reads for getting major:minor and reading partitions would also
pose further reads and that's not worth it.
If compiling without udev_sync support, udev_get_library_context simply
returns NULL so we don't need to remember putting ifdef UDEV_SYNC_SUPPORT
in the code all the time we just need to check whether there's any udev
context initialized or not.
If obtain_device_list_from_udev=0, LVM can make use of persistent .cache
file. This cache file contains only devices which underwent filters in
previous LVM command run. But we need to iterate over all block devices
to create the VGID/LVID index completely for the device mismatch check
to be complete as well.
This patch iterates over block devices found in sysfs to generate the
VGID/LVID index in dev cache if obtain_device_list_from_udev=0
(if obtain_device_list_from_udev=1, we always read complete list of
block devices from udev and we ignore .cache file so we don't need
to look in sysfs for the complete list).
For the case when we print device name associated with struct device
that was not found in /dev, but in sysfs, for example when printing
devices where LV device mismatch is found.
Use meta% to expose highest mapped sector in thinLV.
so showing there 100.00% means thinLV maps latest sector.
Currently using a 'trick' with total_numerator to pass-in
device size when 'seg==NULL'
TODO: Improve device status API per target - current 'percentage'
is not really extensible.
Previous fix missed the fact the we do query for 'percent' with
seg value either set or unset (API overload...)
When 'seg' was unset, we still issue flush with status.
Fix it by cheking segtype by target_type.
As we check for segtype - we could also skip whole percentage
if the 'segtype' is unknown by code directly.
Reported-by: Ming-Hung Tsai <mingnus gmail com
It's correct to have a DM device that has no DM UUID assigned
so no need to issue error message in this case. Also, if the
device doesn't have DM UUID, it's also clear it's not an LVM LV
(...when looking for VGID/LVID while creating VGID/LVID indices
in dev cache).
For example:
$ dmsetup create test --table "0 1 linear /dev/sda 0"
And there's no PV in the system.
Before this patch (spurious error message issued):
$ pvs
_get_sysfs_value: /sys/dev/block/253:2/dm/uuid: no value
With this patch applied (no spurious error message):
$ pvs
If we're using persistent .cache file, we're reading this file instead
of traversing the /dev content. Fix missing indexing by VGID and LVID
here - hook this into persistent_filter_load where we populate device
cache from persistent .cache file instead of scanning /dev.
For example, inducing situation in which we warn about different device
actually used than what LVM thinks should be used based on metadata:
$ lsblk -s /dev/vg/lvol0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vg-lvol0 253:4 0 124M 0 lvm
`-loop1 7:1 0 128M 0 loop
$ lvmconfig --type diff
global {
use_lvmetad=0
}
devices {
obtain_device_list_from_udev=0
}
(obtain_device_list_from_udev=0 also means the persistent .cache file is used)
Before this patch - pvs is fine as it does the dev scan, but lvs relies
on persistent .cache file and it misses the VGID/LVID indices to check
and warn about incorrect devices used:
$ pvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/loop1 instead of /dev/loop0.
PV VG Fmt Attr PSize PFree
/dev/loop0 vg lvm2 a-- 124.00m 0
$ lvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
LV VG Attr LSize
lvol0 vg -wi-a----- 124.00m
With this patch applied - both pvs and lvs is fine - the indices are
always created correctly (lvs just an example here, other LVM commands
that rely on persistent .cache file are fixed with this patch too):
$ pvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/loop1 instead of /dev/loop0.
PV VG Fmt Attr PSize PFree
/dev/loop0 vg lvm2 a-- 124.00m 0
$ lvs
Found duplicate PV B9gXTHkIdEIiMVwcOoT2LX3Ywh4YIHgR: using /dev/loop0 not /dev/loop1
Using duplicate PV /dev/loop0 without holders, ignoring /dev/loop1
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/loop1 instead of /dev/loop0.
LV VG Attr LSize
lvol0 vg -wi-a----- 124.00m
It's possible that while a device is already referenced in sysfs, the node
is not yet in /dev directory.
This may happen in some rare cases right after LVs get created - we sync
with udev (or alternatively we create /dev content ourselves) while VG
lock is held. However, dev scan is done without VG lock so devices may
already be in sysfs, but /dev may not be updated yet if we call LVM command
right after LV creation (so the fact that fs_unlock is done within VG
lock is not usable here much). This is not a problem with devtmpfs as
there's at least kernel name for device in /dev as soon as the sysfs
item exists, but we still support environments without devtmpfs or
where different directory for dev nodes is used (e.g. our test suite).
This patch covers these situations by tracking such devices in
_cache.sysfs_only_names helper hash for the vgid/lvid check to work still.
This also resolves commit 6129d2e64d
which was then reverted by commit 109b7e2095
due to performance issues it may have brought (...and it didn't resolve
the problem fully anyway).
dmeventd daemon may call further code itself that looks at /dev, e.g.
via dmeventd_lvm2_command call. We need to have a consistent view of
the /dev content at that time. Therefore, sync /dev content before
calling monitoring hook which contacts dmeventd.
This problem was quite hidden before, but now it has manifested itself
because of recent additions to dev-cache code where we started looking
at device holders as seen in sysfs. What happened here was that the
device was already in sysfs, but not yet under /dev and this triggered
the new error message sometimes:
log_error("%s: failed to find associated device structure for holder %s.", devname, devpath);
This problem has manifested recently in our api/pytest.sh test from
testsuite where we create thin pool LVs and thin LVs and hence it also
causes dmeventd to be used as well and these error messages were
visible there.
UUID for LV is either "LVM-<vg_uuid><lv_uuid>" or "LVM-<vg_uuid><lv_uuid>-<suffix>".
The code before just checked the length of the UUID based on the first
template, not the variant with suffix - so LVs with this suffix were not
processed properly.
For example a thin pool LV (as an example of an LV that contains
sub LVs where UUIDs have suffixes):
[0] fedora/~ # lsblk -s /dev/vg/lvol1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vg-lvol1 253:8 0 4M 0 lvm
`-vg-pool-tpool 253:6 0 116M 0 lvm
|-vg-pool_tmeta 253:2 0 4M 0 lvm
| `-sda 8:0 0 128M 0 disk
`-vg-pool_tdata 253:3 0 116M 0 lvm
`-sda 8:0 0 128M 0 disk
Before this patch (spurious warning message about device mismatch):
[0] fedora/~ # pvs
WARNING: Device mismatch detected for vg/lvol1 which is accessing /dev/mapper/vg-pool-tpool instead of (null).
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 0
With this patch applied (no spurious warning message about device mismatch):
[0] fedora/~ # pvs
PV VG Fmt Attr PSize PFree
/dev/sda vg lvm2 a-- 124.00m 0
To help out with debug, when an exception is thrown in the dbus service we
will dump all the information we have on the last 16 commands that were
executed along with the stack strace.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
With commit 2d5dc6512e the dbus server
no longer needs to utilize udev to know when to update its internal
state.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Check if the value we read from sysfs is not blank and replace the '\n'
at the end only when needed ('\n' should usually be there for sysfs values,
but better check this).
It's possible for an LVM LV to use a device during activation which
then differs from device which LVM assumes based on metadata later on.
For example, such device mismatch can occur if LVM doesn't have
complete view of devices during activation or if filters are
misbehaving or they're incorrectly set during activation.
This patch adds code that can detect this mismatch by creating
VG UUID and LV UUID index while scanning devices for device cache.
The VG UUID index maps VG UUID to a device list. Each device in the
list has a device layered above as a holder which is an LVM LV device
and for which we know the VG UUID (and similarly for LV UUID index).
We can acquire VG and LV UUID by reading /sys/block/<dm_dev_name>/dm/uuid.
So these indices represent the actual state of PV device use in
the system by LVs and then we compare that to what LVM assumes
based on metadata.
For example:
[0] fedora/~ # lsblk /dev/sdq /dev/sdr /dev/sds /dev/sdt
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdq 65:0 0 104M 0 disk
|-vg-lvol0 253:2 0 200M 0 lvm
`-mpath_dev1 253:3 0 104M 0 mpath
sdr 65:16 0 104M 0 disk
`-mpath_dev1 253:3 0 104M 0 mpath
sds 65:32 0 104M 0 disk
|-vg-lvol0 253:2 0 200M 0 lvm
`-mpath_dev2 253:4 0 104M 0 mpath
sdt 65:48 0 104M 0 disk
`-mpath_dev2 253:4 0 104M 0 mpath
In this case the vg-lvol0 is mapped onto sdq and sds becauset this is
what was available and seen during activation. Then later on, sdr and
sdt appeared and mpath devices were created out of sdq+sdr (mpath_dev1)
and sds+sdt (mpath_dev2). Now, LVM assumes (correctly) that mpath_dev1
and mpath_dev2 are the PVs that should be used, not the mpath
components (sdq/sdr, sds/sdt).
[0] fedora/~ # pvs
Found duplicate PV xSUix1GJ2SK82ACFuKzFLAQi8xMfFxnO: using /dev/mapper/mpath_dev1 not /dev/sdq
Using duplicate PV /dev/mapper/mpath_dev1 from subsystem DM, replacing /dev/sdq
Found duplicate PV MvHyMVabtSqr33AbkUrobq1LjP8oiTRm: using /dev/mapper/mpath_dev2 not /dev/sds
Using duplicate PV /dev/mapper/mpath_dev2 from subsystem DM, ignoring /dev/sds
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdq, /dev/sds instead of /dev/mapper/mpath_dev1, /dev/mapper/mpath_dev2.
PV VG Fmt Attr PSize PFree
/dev/mapper/mpath_dev1 vg lvm2 a-- 100.00m 0
/dev/mapper/mpath_dev2 vg lvm2 a-- 100.00m 0
If we're using non-standard /dev layout so we can't get the dm-X name
easily, we can't also look at the /sys/blocl/dm-X/dev to get the major:minor
pair. Use "stat" in this case even though it triggers automounts
(but there's no better way for now).
The /proc/self/mountinfo is not bound to device names like /proc/mounts
and it uses major:minor pairs instead.
This fixes a situation in which a volume is mounted and then renamed
later on - that makes /proc/mounts unreliable when detecting mounted
volumes.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1196910.
Commit b64703401d cause regression
when handling stacked resize of pool metadata volume that would
be a raid LV.
Fix it by properly setting up size also for layer extension.
Allow pvchange and pvresize to process exported VGs,
and have them check for the exported state in their
single function.
Previously, the exported VG state would trigger a
failure in vg_read()/ignore_vg() because the VGs are
being read with READ_FOR_UPDATE. Because these commands
read all VGs to search for the intended PVs, any
exported VG would trigger a failure, even if it was
not related to the intended PV.
Use <> around user entered option parameters to make it visually
different (just like we use Italic style in man page).
TODO:
In the future we should consistently provide such notation and
possibly generate it automagically from some internal data structures.
Preferably for man pages as well so we report actual set of supported
options.
Recent kernel (4.4) start to report values smaller then sector size
(but in reporting size for SSD which support data zeroing on discard).
For now log warning and assume it really means 1 sector.
Addressing RHBZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1313377
Continuing with lvconvert style updates.
Minimize usage of '\:' as zero-width break space, since html renderers
do not handle this groff sign well (some of them seems to even render
':' there).
But since we do not want see badly rendered man pages itself, prefer
the 'man visual' appearence over html page generation (man2html should
be actually fixed).
Start to use 'user input option' with Capitals more consistently.
Fixing vpath usage as it has been checking for existance of
generated file also in the $(scrdir) e.g.:
No need to remake target '10-dm.rules.in'; using VPATH name '...'
If the $(srcdir) had been also $(builddir) and contained already
generated rules file, it's been used instead generating new
one.
(See: http://make.mad-scientist.net/papers/how-not-to-use-vpath/)
Commit abd9618dd8 tried to improve
parsing of vg name from logical path - but still missed couple
corner cases.
This patch further improves the logic and reuses
validate_lvname_param() for parsing of lv_name.
Also explicitly checks for LVM_VG_NAME in the right case.
So now also properly parses cases like:
'lvconvert --repairt vg/'
and will provide correct error message.
There's a window between doing VG read and checking PV device size
against real device size. If the device is removed in this window,
the dev cache still holds struct device and pv->dev still references
that and that PV is not marked as missing. However, if we're trying
to get size for such device, the open fails because that device
doesn't exists anymore.
We called existing pv_dev_size in _check_pv_dev_sizes fn. But
pv_dev_size assigned a size of 0 if the dev_get_size it called failed
(because the device is gone).
So call the dev_get_size directly and check for the return code
in _check_pv_dev_sizes and go further only if we really know the
device size. This is to avoid confusing warning messages like:
Device /dev/sdd1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
While running on F24 a number of warnings were being emitted from using the
deprecated GObject instead of GLib. Tested on python 3.4 and 3.5.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Python 3.5 in F24 was throwing the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/lvmdbusd/main.py", line 73, in process_request
req.run_cmd()
File "/usr/lib/python3.5/site-packages/lvmdbusd/request.py", line 73, in run_cmd
self.register_error(-1, st)
File "/usr/lib/python3.5/site-packages/lvmdbusd/request.py", line 123, in register_error
self._reg_ending(None, error_rc, error)
File "/usr/lib/python3.5/site-packages/lvmdbusd/request.py", line 115, in _reg_ending
self.cb_error(self._rc_error)
File "/usr/lib64/python3.5/site-packages/dbus/service.py", line 669, in <lambda>
keywords[error_callback] = lambda exception: _method_reply_error(connection, message, exception)
File "/usr/lib64/python3.5/site-packages/dbus/service.py", line 293, in _method_reply_error
exception))
File "/usr/lib64/python3.5/traceback.py", line 136, in format_exception_only
return list(TracebackException(etype, value, None).format_exception_only())
File "/usr/lib64/python3.5/traceback.py", line 442, in __init__
if (exc_value and exc_value.__cause__ is not None
AttributeError: 'str' object has no attribute '__cause__'
This was caused because we were calling the dbus error callback with a
string instead of an actual exception. On python 3.4 this was apparently
OK, but not with 3.5. Corrected to pass the exception to error callback.
Change tested on both python 3.4 and 3.5.
Reported-by: Vratislav Podzimek <vpodzime@redhat.com>
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Specifying an output width of 0 now leads to a default minimum width
taken from the width of the column heading. (Most fields should use
this.)
Components of field names are generally separated by underscores (which
are optional at run-time). (Dropped 3 duplicate fields now covered by
this rule.)
Each field heading must be unique and generally doesn't have spaces
between words (which get capitalised) unless they are already short and
the fields are normally longer or clarity demands it.
With the recent conversion of pvcreate/pvremove to the
common toollib processing function, skipping in-use PVs
in _process_pvs_in_vg prevented them from being protected
as intended by the in-use flag.
The processing code for pvcreate/pvremove checks for the
in-use state itself and prevents using an in-use PV.
If a PV is skipped, it looks like an unused device and
is not protected from being used in pvcreate/pvremove.
This command option can be used to trigger a D-Bus
notification independent of the usual notifications
that are sent from other commands as an effect of
changes to PV/VG/LV state. If lvm is not built with
dbus notification support or if notify_dbus is disabled
in the config, this command will exit with an error.
When a command modifies a PV or VG, or changes the
activation state of an LV, it will send a dbus
notification when the command is finished. This
can be enabled/disabled with a config setting.
The code in _print_historical_lv function works with temporary
"descendants_buffer" that is allocated and freed within this
function.
When printing text out, we used "outf" macro which called
"out_text" fn and it checked return value and if failed,
the macro called "return_0" automatically. But since we
use the temporary buffer, if any of the out_text calls
fails, we need to deallocate this buffer properly - that's
the "goto_out", otherwise we'll be leaking memory.
So add new "outfgo" helper macro which does the same as "outf",
but it calls "goto_out" instead of "return_0" so we can jump
to a cleanup hook at the end.
- stdout/stderr for test output
- debug.log* for daemon output
- use install instead of cp for DBus config
- use system bus
- DBus reloads on new file. Better having it created with correct
permissions.
Notes:
- Squashed commits from mcsontos development branch
- Still disabled at this time
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Using lvm1 metadata with lvmetad is not generally allowed,
but nothing has prevented creating new lvm1 metadata with
lvmetad (missing error checking in pvcreate/vgcreate.)
Various tests are using lvm1 with lvmetad and happen to
work because of the missing error checks.
This commit fixes the tests so they won't fail when the
lvm1/lvmetad error checking is fixed. A new variable
LVM_TEST_LVM1 is defined and is used in the scripts to
decide if lvm1 metadata should be tested. LVM_TEST_LVM1
is not defined when lvmetad is being tested, and the
combination of LVM_TEST_LVM1 and LVM_TEST_LVMETAD can
be used to verify the desired lvmetad+lvm1 behavior.
Historical LV is valid as long as there is at least one live LV among
its ancestors. If we find any invalid (dangling) historical LVs, remove
them automatically.
The vg_strip_outdated_historical_lvs iterates over the list of historical LVs
we have and it shoots down the ones which are outdated.
Configuration hook to set the timeout will be in subsequent patch.
The metadata/record_lvs_history is global switch which enables or
disables recording historical LVs in metadata.
If both metadata/record_lvs_history=1 lvm.conf option and
--nohistory command switch is used at the same time, the
--nohistory prevails.
Report proper values for historical LVs in lv_layout and lv_role fields.
Any historical LV doesn't have any layout anymore and the role is "history".
For example:
$ lvs -H -o name,lv_attr,lv_layout,lv_role vg/-lvol1
LV Attr Layout Role
-lvol1 ----h----- none public,history
The lv_full_ancestors reporting field is just like the existing
lv_ancestors field but it also takes into account any history and
indirect relations recorded.
All names for historical LVs are prefixed with '-' character to make clear
difference between live and historical LVs. The '-' can't be set by users
for live LV names during lvcreate hence we never get into a conflict with
the names that user defines for live LVs.
The lv_historical reporting field is a simple binary field that reports
whether an LV is historical one ("historical" value or value of "1" displayed)
or not (blank string "" or value of "0" displayed).
When processing LVs in a VG and when the -H|--history switch is used,
make process_each_lv_in_vg to iterate over historical volumes too.
For each historical LV, we use dummy struct logical_volume instance with
the "this_glv" reference set to a wrapper over proper struct
historical_logical_volume representation. This makes it possible to process
historical LVs just like normal live LVs (though a dummy one without any
segments and all the other fields zeroed and blank) and it also allows
for using all historical LV related information via lv->this_glv->historical
reference.
One can use a simple call to lv_is_historical to make a difference between
live and historical LV in all the code that is called by process_each_* fns.
This patch adds "include_historical_lvs" field to struct cmd_context to
make it possible for the command to switch between original funcionality
where no historical LVs are processed and functionality where historical
LVs are taken into account (and reported or processed further). The switch
between these modes is done using the '-H|--history' switch on command
line.
The include_historical_lvs state is then passed to process_each_* fns
using the "include_historical_lvs" field within struct processing_handle.
Add support for making an interconnection between thin LV segment and
its indirect origin (which may be historical or live LV) - add a new
"indirect_origin" argument to attach_pool_lv function.
Also export historical LVs when exporting LVM2 metadata.
This is list of all historical LVs listed in
"historical_logical_volumes" metadata section with all
the properties exported for each historical LV.
For example, we have this thin snapshot sequence:
lvol1 --> lvol2 --> lvol3
\
--> lvol4
We end up with these metadata:
logical_volume {
...
(lvol1, lvol3 and lvol4 listed here as usual - no change here)
...
}
historical_logical_volumes {
lvol2 {
id = "S0Dw1U-v5sF-LwAb-W9SI-pNOF-Madd-5dxSv5"
creation_time = 1456919613 # 2016-03-02 12:53:33 +0100
removal_time = 1456919620 # 2016-03-02 12:53:40 +0100
origin = "lvol1"
descendants = ["lvol3", "lvol4"]
}
}
By removing lvol1 further, we end up with:
historical_logical_volumes {
lvol2 {
id = "S0Dw1U-v5sF-LwAb-W9SI-pNOF-Madd-5dxSv5"
creation_time = 1456919613 # 2016-03-02 12:53:33 +0100
removal_time = 1456919620 # 2016-03-02 12:53:40 +0100
origin = "-lvol1"
descendants = ["lvol3", "lvol4"]
}
lvol1 {
id = "me0mes-aYnK-nRfT-vNlV-UiR1-GP7r-ojbROr"
creation_time = 1456919608 # 2016-03-02 12:53:28 +0100
removal_time = 1456919767 # 2016-03-02 12:56:07 +0100
}
}
When an LV is being removed, we create an instance of
"struct historical_logical_volume" wrapped up in
"struct generic_logical_volume".
All instances of "struct historical_logical_volume" are then recorded in
"historical_lvs" list which is part of "struct volume_group".
The "historical LV" is then interconnected with "live LVs" to
connect a history chain for the live LV.
The add_glv_to_indirect_glvs is a helper function that registers a
volume represented by struct generic_logical_volume instance ("glv")
as an indirect user of another volume ("origin_glv") and vice versa -
it also registers the other volume ("origin_glv") as indirect_origin
of user volume ("glv").
The remove_glv_from_indirect_glvs does the opposite.
The get_or_create_glv is helper function that retrieves any existing
generic_logical_volume wrapper for the LV. If the wrapper does not exist
yet, it's created.
The get_org_create_glvl is the same as get_or_create_glv but it creates
the glv_list wrapper in addition so it can be added to a list.
Add new structures and new fields in existing structures to support
tracking history of LVs (the LVs which don't exist - the have been
removed already):
- new "struct historical_logical_volume"
This structure keeps information specific to historical LVs
(historical LV is very reduced form of struct logical_volume +
it contains a few specific fields to track historical LV
properties like removal time and connections among other LVs).
- new "struct generic_logical_volume"
Wrapper for "struct historical_logical_volume" and
"struct logical_volume" to make it possible to handle volumes
in uniform way, no matter if it's live or historical one.
- new "struct glv_list"
Wrapper for "struct generic_logical_volume" so it can be
added to a list.
- new "indirect_glvs" field in "struct logical_volume"
List that stores references to all indirect users of this LV - this
interconnects live LV with historical descendant LVs or even live
descendant LVs.
- new "indirect_origin" field in "struct lv_segment"
Reference to indirect origin of this segment - this interconnects
live LV (segment) with historical ancestor.
- new "this_glv" field in "struct logical_volume"
This references an existing generic_logical_volume wrapper for this
LV, if used. It can be NULL if not needed - which means we're not
handling historical LVs at all.
- new "historical_lvs" field in "struct volume group
List of all historical LVs read from VG metadata.
Pair kernel_cache_settings with kernel_cache_policy.
There is very small chance in error case that the value in table
might be differnet from the value stored in metadata
so make it 'checkable'.
Showing 'u' in the pv_attr reporting field is mostly unnecessary because
most PVs are allocatable, and being allocatable implies it is (u)sed,
and this is already obvious from other fields in the default 'pvs'
output like the VG name.
So move the new (u)sed pv_attr from character position 4 to 1, and only
show it in those rare cases when the PV is not (a)llocatable or the
relevant metadata is missing.
(Scripts should not be using pv_attr, but rather pv_allocatable,
pv_exported, pv_missing, pv_in_use etc.)
Helping with understanding we will not try to deref NULL pointer,
as if the sizes are initialized to NULL it also means 'mem' would
be NULL, but thats too hard to model so make it obvious.
Correct name is lvm2-lvmdbusd.service not lvmdbusd.service.
This makes the bus-activation (auto-activation) work.
Signed-off-by: Vratislav Podzimek <vpodzime@redhat.com>
Make the data_alignment variable 64 bits so it
can hold the invalid command line arg used in
pvreate-usage.sh pvcreate --dataalignment 1e.
On 32 bit arches, the smaller variable wouldn't
hold the invalid value so the error would not
trigger as expected by the test.
This simple tool calls the Manager.Refresh method on the dbus service
to check and see if the dbus service has the most up to date state.
This is to be used for testing to ensure that event driven updates are
working as planned.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When we use udev or have lvm call back into the dbus service when a
change occurs, even if that change originated from the dbus service
we end up refreshing the state of the system twice which is not
needed or wanted. This change handles this case by removing any
pending refreshes in the worker queue if the state of the system
was just updated.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Since we want to read env LVM_VG_NAME vg names,
we cannot just check LV names which do contain '/'.
So before the patch commands like:
> lvconvert --repair vg
Before:
Please provide a valid volume group name
After:
Path required for Logical Volume "vg".
Please provide a valid volume group name
> LVM_VG_NAME=vg lvconvert --repair vg
Before:
Please provide a valid volume group name
After:
Can't find LV vg in VG vg
Commit ca878a3426 introduced an issue
that zero sized extesion suddenly started to be accepted and
missed to return error.
Properly check invalid input values for sizes.
Add tests for the "dmstats report" command:
* report
* report --count
* report --histogram
So far the tests just check the command runs as expected when a
correctly configured stats region exists: validation of output
can be added later.
Add tests for the "dmstats create" command:
* simple whole-device region
* region using --start/--len options
* region using --segments option
* region with precise timestamps (--precise)
* region with histogram bounds (--bounds)
The install target already depends on .tests-stamp - since this
in turn depends on lib/version-expected there is no need to have
this as a dependency of install.
Add initial dmstats tests to 000-basic.sh. These tests ensure that
the dmsetup binary is built and linked correctly when called as
'dmstats' and that the version of the binary matches the expected
library version used for the build.
"pvcreate_each_params" was a temporary name used
to transition from the old "pvcreate_params".
Remove the old pvcreate_params struct and rename the
new pvcreate_each_params struct to pvcreate_params.
Rename various pvcreate_each_params terms to simply
pvcreate_params.
Use the new pvcreate_each_device() function from
toollib, previously added for pvcreate, in place
of the old pvcreate_vol().
This also requires shifting the location where the
lock is acquired for the new VG name. The lock for
the new VG is supposed to be acquired before pvcreate.
This means splitting the vg_lock_newname() out of
vg_create(), and calling vg_lock_newname() directly
before pvcreate, and then calling the remainder of
vg_create() after pvcreate.
The new function vg_lock_and_create() now does
vg_lock_newname() + vg_create(), like the previous
version of vg_create().
The lock on the new VG name is released before the
pvcreate and reacquired after the pvcreate because
pvcreate needs to reset lvmcache, which doesn't work
when locks are held. An exception could likely be
made for the new VG name lock, which would allow
vgcreate to hold the new VG name lock across the
pvcreate step.
This is common code for handling PV create/remove
that can be shared by pvcreate/vgcreate/vgextend/pvremove.
This does not change any commands to use the new code.
- Pull out the hidden equivalent of process_each_pv
into an actual top level process_each_pv.
- Pull the prompts to the top level, and do not
run any prompts while locks are held.
The orphan lock is reacquired after any prompts are
done, and the devices being created are checked for
any change made while the lock was not held.
Previously, pvcreate_vol() was the shared function for
creating a PV for pvcreate, vgcreate, vgextend.
Now, it will be toollib function pvcreate_each_device().
pvcreate_vol() was called effectively as a helper, from
within vgcreate and vgextend code paths.
pvcreate_each_device() will be called at the same level
as other process_each functions.
One of the main problems with pvcreate_vol() is that
it included a hidden equivalent of process_each_pv for
each device being created:
pvcreate_vol() -> _pvcreate_check() ->
find_pv_by_name() -> get_pvs() ->
get_pvs_internal() -> _get_pvs() -> get_vgids() ->
/* equivalent to process_each_pv */
dm_list_iterate_items(vgids)
vg = vg_read_internal()
dm_list_iterate_items(&vg->pvs)
pvcreate_each_device() reorganizes the code so that
each-VG-each-PV loop is done once, and uses the standard
process_each_pv function at the top level of the function.
This uses the vg->pv_write_list in place of the
vg->pvs_to_write list, and eliminates the use of
pvcreate_params. The label remove and zeroing
steps are shifted out of vg_write() to the higher
level like pvcreate will do.
The vg->pv_write_list contains pv_list structs for which
vg_write() should call pv_write().
The new list will replace vg->pvs_to_write that contains
vg_to_create structs which are used to perform higher-level
pvcreate-related operations. The higher level pvcreate
operations will be moved out of vg_write() to higher levels.
Since we already check in few other places 'info' is not NULL,
do the same for others - however when info would be NULL
it more or less looks like internal error.
Use #define instead, since we do not require actually buffer needs
to exists to eliminated new gcc6 warning:
clvm.h:53:19: warning: ‘CLVMD_SOCKNAME’ defined but not used
[-Wunused-const-variable]
Reshuffle messages during pvremove.
Always print WARNING: when PV is in use so using options
--force --force doesn't make this important user
notification go away.
Simplify variable 'used' usage (so older gcc doesn't warn
about the use of unitilizied variable).
Add some '.' into messages.
Currently it's been checked for 'zero' header for thin-pool,
but lets use it always for cache as well - since it's relatively 'cheap'
detection of read 'error' problems as thin/cache tools
currently do not work fast enough in this case.
export LVMDBUSD_SESSION=True to run on the session bus instead
of the system bus so that we can run the unit test without
installing the dbus conf file.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Reduced the size of LVs created and use actual PE numbers instead of hard
coding them to allow us to work with the loop back devices.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
It appears that the output of lvconvert --merge can vary some. The code
was blowing up as it was trying to parse a line of stdout to retrieve the
% complete, but the line did not have the needed format and an execption
was thrown. The uncaught exception caused the background thread to exit
without updating the job object, which caused the client to hang forever
waiting. Added a default exception handler to prevent unhandled execptions
causing hangs and removed the parameter skip_first_line as it's no longer
needed. The code checks to see if the line can be parsed before doing so.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
After the lockspace has been successfully removed,
invalidate the name field in the lockspace struct.
The struct remains on the list of lockspaces until
the struct can be freed later. Until the struct is
freed, its name will prevent another new lockspace
from being created with the same name.
When update fails in suspend() (sending of messages
fails because metadata space is full) call resume(),
so the locking sequence works properly for clustering.
Also failing deactivation should unlock memory.
Fix reporting of Fail thin-pool target status
as attr[8] letter 'F'.
Report 'needs_check' status from thin-pool target via
attr field [4] (letter 'c'/'C'), and also via CheckNeeded field.
TODO: think about better name here?
TODO: lots of prop_not_implemented_set
Fix parsing of 'Fail' status (using capital letter) for thin-pool.
Add also parsing of 'Error' state for thin-pool.
Add needs_check test for thin-pool.
Detect Fail state for thin.
Ask for confirmation when using pvcreate/pvremove on a PV which is
marked as belonging to a VG, just like we do in case of a PV which
belongs to known VG:
$ pvcreate -ff /dev/sda
Really INITIALIZE physical volume "/dev/sda" that is marked as belonging to a VG [y/n]? n
/dev/sda: physical volume not initialized
$ pvremove -ff /dev/sda
Really WIPE LABELS from physical volume "/dev/sda" that is marked as belonging to a VG [y/n]? n
/dev/sda: physical volume label not removed
The host that owns foreign VGs is responsible for fixing up PV_EXT_USED
flag - the same already applies to repairing any inconsistent VG.
This patch also moves the iteration over vg->pvs inside
_check_or_repair_pv_ext fn - it's cleaner this way.
pv->vg is not set yet during pvcreate processing. Use pv->fmt instead to
check for these fake PVs (all normal PVs have format defined, devices
which are not PVs don't have this set).
This fixes commit 0000db7f98.
If we know that a PV belongs to some VG and we're missing metadata
(because we have only those PV(s) from VG present in the system that
don't have metadata areas), we should skip such PV when processing
under system ID.
This is because we know that the PV belongs to some VG, but we
really can't decide whether it matches system ID unless the VG
metadata is present again.
Some of the PVs are not even orphan PVs - they're fake PVs - this can
happen if we're listing all devices with "pvs -a". Such PV must not
be marked as used.
The backup_restore_vg is used directly for restoring the VG from backup.
It's also used to do the VG conversions from one metadata format to
another which means vgconvert calls backup_restore_vg too.
When restoring VG from backup, we need to rewrite/write PV headers as
PVs may have been orphans before and now they're becoming part of some
VG - we need to write the PV_EXT_USED flag at least.
When using the backup_restore_vg for vgconvert, we need to write
completely new PV header in different format.
Avoid the special "pv_write" call and handling that was used before
this patch in vgconvert (vgconvert_single function to be more precise)
and reuse existing internal interface to register PV header for writing
(or rewriting) via vg->pvs_to_write list instead like we do it elsewhere
in the code.
This patch also resolves a problem in which PV headers with target
format were written in the vgconvert_single fn as orphans and VG
metadata were added later on - this was a tiny hack actually.
We can't do this now - we need to write the PV as belonging
to a VG because otherwise the PV_EXT_USED flag won't be written
properly (if the PV header is written as orphan, the PV_EXT_USED
is set to 0, of course, even though metadata are attached later).
So this patch removes this tiny inconsistency which was passing
just fine before because we didn't have any relation to the VG
in PV header before. Now we have the PV_EXT_USED flag which says
the "PV is used in some VG".
The same check as we already do for orphan PVs, just the other way
round now: if the PV is surely part of some VG and any PV the VG
contains does not have the PV_EXT_USED flag set, repair it.
For example - /dev/sda here is in VG vg and it's incorrectly not
marked as used by PV_EXT_USED flag:
pvs --binary -o pv_ext_vsn,pv_in_use
WARNING: Volume Group vg is not consistent.
WARNING: Repairing Physical Volume /dev/sda that is in Volume Group vg but not marked as used.
PV VG Fmt Attr PSize PFree ExtVsn PInUse
/dev/sda vg lvm2 a-- 124.00m 124.00m 2 1
PV header extension versions:
0 - the original PV without any extensions
1 - bootloader area support added
2 - PV_EXT_USED flag support added
So do the associated checks related to PV_EXT_USED flag only if
PV header extension found is of version 2 and higher.
If we know that the PV is orphan, meaning there's at least one MDA on
that PV which does not reference any VG and at the same time there's
PV_EXT_USED flag set, we're certainly in an inconsistent state and we
need to fix this.
For example, such situation can happen during vgremove/vgreduce if we
removed/reduced the VG, but we haven't written PV headers yet because
vgremove stopped abruptly for whatever reason just before writing new
PV headers with updated state, including PV extension flags (and so the
PV_EXT_USED flag).
However, in case the PV has no MDAs at all, we can't double-check
whether the PV_EXT_USED is correct or not - if that PV is marked
as used, it's either:
- really used (but other disks with MDAs are missing)
- or the error state as described above is hit
User needs to overwrite the PV header directly if it's really clear
the PV having no MDAs does not belong to any VG and at the same time
it's still marked as being in use (pvcreate -ff <dev_name> will fix this).
For example - /dev/sda here has 1 MDA, orphan and is incorrectly marked
with PV_EXT_USED flag:
$ pvs --binary -o+pv_in_use
WARNING: Found inconsistent standalone Physical Volumes.
WARNING: Repairing flag incorrectly marking Physical Volume /dev/sda as used.
PV VG Fmt Attr PSize PFree InUse
/dev/sda lvm2 --- 128.00m 128.00m 0
For example:
$ pvs -o pv_name,vg_name,pv_in_use
PV VG InUse
/dev/sda vg used
/dev/sdb
/dev/sdc used
(sda is part of vg - it's used
sdb is not part of vg - it's not used
sdc is part of vg, but MDAs missing - it's used)
Scenario:
$ pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
We're adding the PV to a VG.
Before this patch:
$ vgcreate vg /dev/sda
Physical volume "/dev/sda" successfully created
Volume group "vg" successfully created
With this path applied:
$ vgcreate vg /dev/sda
Volume group "vg" successfully created
...and verbose log containing: "Physical volume "/dev/sda" successfully written"
Make sure we won't use a PV that is already marked as used. Normally,
VG metadata would stop us from doing that, but we can run into a
situation where such metadata is missing because PVs with MDAs
are missing and the PVs left are the ones with 0 MDAs.
(/dev/sda in this example has 0 MDAs and it belongs to a VG,
but other PVs with MDA are missing)
$ pvs -o pv_name,pv_mda_count /dev/sda
PV #PMda
/dev/sda 0
$ pvcreate /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Can't initialize PV '/dev/sda' without -ff.
$ pvchange -u /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Can't change PV '/dev/sda' without -ff.
Physical volume /dev/sda not changed
0 physical volumes changed / 1 physical volume not changed
$ pvremove /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
(If you are certain you need pvremove, then confirm by using --force twice.)
$ vgcreate vg /dev/sda
Physical volume '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Unable to add physical volume '/dev/sda' to volume group 'vg'.
We'll use this struct in subsequent patches for PVs which should
be rewritten, not just created. So rename struct pv_to_create to
struct pv_to_write for clarity.
This is a hotfix for a bug introduced in
6d7dc87cb3.
The bug description: First we allocate memory for
processing handle (at an address 1) then we
allocate some memory on the same pool for later use
in pvmove_poll function inside the process_each_pv
function (at an address 2). After we jump out of
process_each_pv we called destroy_processing_handle.
As a result of destroying the handle memory pool could
deallocate all memory at address 1 or higher. The
pvmove_poll function tried to copy a memory allocated
at address 2 that could be returned to the system.
If it was so it led to segfault.
We need to rethink proper fix but in the same time
cmd->mem pool is recreated per each lvm command so
this should not cause problems even when we run
multiple commands in lvm shell.
A valgrind snapshot of the corruption:
Invalid read of size 1
at 0x4C29F92: strlen (mc_replace_strmem.c:403)
by 0x5495F2E: dm_pool_strdup (pool.c:51)
by 0x1592A7: _create_id (pvmove.c:774)
by 0x159409: pvmove_poll (pvmove.c:796)
by 0x1599E3: pvmove (pvmove.c:931)
by 0x15105B: lvm_run_command (lvmcmdline.c:1655)
by 0x1523C3: lvm2_main (lvmcmdline.c:2121)
by 0x1754F3: main (lvm.c:22)
Address 0xf15df8a is 138 bytes inside a block of size 8,192 free'd
at 0x4C28430: free (vg_replace_malloc.c:446)
by 0x5494E73: dm_free_wrapper (dbg_malloc.c:357)
by 0x5495DE2: _free_chunk (pool-fast.c:318)
by 0x549561C: dm_pool_free (pool-fast.c:151)
by 0x164451: destroy_processing_handle (toollib.c:1837)
by 0x1598C1: pvmove (pvmove.c:903)
by 0x15105B: lvm_run_command (lvmcmdline.c:1655)
by 0x1523C3: lvm2_main (lvmcmdline.c:2121)
by 0x1754F3: main (lvm.c:22)
Address this gcc warning:
metadata/lv.c:243: warning: initialized field overwritten
metadata/lv.c:243: warning: (near initialization for 'status.seg_status')
Present with e.g.: gcc version 4.3.2 (Debian 4.3.2-1.1)
Simplify calculation of extents rounding needed for
segment size.
Segment size has to divisible by 'extent count' needed to contain
whole stripe. LVM currently does not support stripes across segment.
In case the stripe size is bigger then extent size,
require bigger rounding.
Fixing regression caused by 197b5e6dc7.
So the 'TODO' part now finally know the answer - there is 'sparc64'
architecture which imposes limitation to read 64b words only through
64b aligned address.
Since we never could know how is the user going to use the returned
pointer and the userusually expects it's aligned on the highest CPU
required alignement, preserve it also for char*.
Fixes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=809685
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Wrong thin-pool feature flag ordering in dm table: It will lead to
unnecessary table reload.
Fix it by placeing feature flags in order they are returned from the
kernel so current 'table line diff' code will not see a difference.
'verbose' was marked as a boolean option while it
takes integer args - so it has limited usage to 0 or 1,
but we supported 0-4 at least.
Fix it by switching to corrent int type.
(Hopefully noone was trying to use this variable as true/yes/false/no
way - as the would be unsupported/undocumented).
Reporter noticed lvm2 incorrectly translated
lvm2 threshold value to water mark in commit:
99237f0908
Fix it by properly translating size to number of
blocks in thin-pool and then calc for free blocks
matching configured lvm2 threshold value.
Reported-by: Ming-Hung Tsai <mingnus@gmail.com>
Normally, we generate and provide lvm.conf file where use_blkid_wiping
is set based on whether support for this is compiled in or not. This was
generated properly based on configure.
However, if lvm.conf is not used at all (someone deletes it) or the value
in lvm.conf is commented out (user edited it), we still need to use
proper default value that is based on DEFAULT_USE_BLKID_WIPING taken
from configure script - we used hardcoded value of "1" in this case
by mistake.
We already do check for suspended devs within udev rules where
the pvscan is to update lvmetad. So the check for suspended devs
in "pre-lvmetad" chain is not useful here - remove it - it may
be a source of hardly to detect races anyway (if udev rule detects
the device is not suspended and then the pvscan instance sees the
dev as suspended, we may end up not reacting to the event properly).
lvm1 and pool format do not support bootloader areas and we need to
remove any existing associated bootloader areas when we read lvm1 and
pool labels.
This has its importance if we're converting from one format to another
and we're reusing lvmcache in long-running commands (e.g. clvmd or lvm
shell) and we need to make lvmcache consistent and valid for current format.
Non-dm devices have ID_PART_TABLE_TYPE variable exported in
udev db from blkid scan for *both* whole devices and partitions.
We used ID_PART_ENTRY_DISK in addition to decide whether this
is the whole device or partition and then we filtered out only
whole devices where the partition table really is.
However, ID_PART_ENTRY_DISK was added in blkid 2.20 so we need
to use a different set of variables to decide on whole devices
and partitions on systems where older blkid is still used.
Now, we use ID_PART_TABLE_TYPE to detect that there's something
related to partitioning with this device and we use DEVTYPE variable
instead to decide between whole device (DEVTYPE="disk") and partition
(DEVTYPE="partition").
For dm devices it's simpler, we have ID_PART_TABLE_TYPE variable\
set in udev db for whole devices. It's not set for partitions,
hence we don't need more variable in addition to make the decision
on whole device vs. partition (dm devices do not have regular
partitions, hence DEVTYPE can't be used anyway, it's always set
to "disk" for whole disks and partitions).
Add "size" and "size_seqno" to struct device to cache device's size
and also to control its lifetime - the cached value is valid as long
as the global _dev_size_seqno is equal to the device's size_seqno,
otherwise we need to get the size again and cache the new value.
This patch also adds new dev_size_seqno_inc() fn for the appropriate
parts of the code to increment current global value of _dev_size_seqno
and hence to cause all currently cached values for device sizes to
be invalidated.
The device size is now cached because we're planning to reuse this
information for further checks and we want to avoid checking it more
than necessary to save resources.
Fix regression caused by c9f021de0b.
This commit actually transfered real-action (e.g. device removal)
into the next loop which has however missed to check for break.
So add check for break also there.
When creating a list in 'context of command' - use proper mempool.
vg->vgmem is mempool related to VG metadata - and can be eventually
locked read-only when VG struct is shared.
W: manual-page-warning /usr/share/man/man8/lvm.8.gz 491: warning: macro `_cdata',' not defined
rpmlint actually notices we had few hidden word in man page.
the line cannot start with apostrophe as it has then a different
meaning.
If not using explicit --enable-blkid-wiping/--disable-blkid-wiping
configure option, the configure script tries to enable/disable blkid
wiping feature automatically based on blkid library version found.
The script incorrectly set default value for lvm.conf's
allocation/use_blkid_wiping" setting to "1" (enabled) if proper
blkid library version was not found or the version found was less
than the minimum required. It should be set to "0" in this case.
The extent size must fits all blocks in 4294967295 sectors
(in 512b units) this is 1/2 KiB less then 2TiB.
So while previous statement 'suggested' 2TiB is still acceptable value,
make it clear it's not.
As now we support any multiples of 128KB as extent size -
values like 2047G will still 'flow-in' otherwise the largest power-of-2
supported value is 1TiB.
With 1TiB user needs 8388608 extents for 8EiB device.
(FYI such device is already unusable with todays glibc-2.22.90-27)
4GiB extent size is currently the smallest extent size which allows
a user to create 8EiB devices (with 2GiB it's less then 8EiB).
TODO: lvm2 may possibly print amount of 'lost/unused space' on a PV,
since using such ridiculously sized extent size may result in huge
space being left unaccessible.
Since commit 2fc126b00d, the library
code requires udev to be initialised for device scanning and
clvmd can fail to find VGs if devices/external_device_info_source
is set to "udev".
There are two basic groups of fields for LV segment device reporting:
- related to LV segment's devices: devices and seg_pe_ranges
- related to LV segment's metadata devices: metadata_devices and seg_metadata_le_ranges
The devices and metadata_devices report devices in this format:
"device_name(extent_start)"
The seg_pe_ranges and seg_metadata_le_ranges report devices in
this format:
"device_name:extent_start-extent_end"
This patch reverts partly what commit 7f74a99502
(v 2.02.140) introduced in this area - it added [] for
hidden devices to mark them for all four fields mentioned above.
We won't be marking hidden devices in devices and metadata_devices
fields.
The seg_metadata_le_ranges field will have hidden devices marked -
it's new enough that we don't need to care about compatibility much
yet.
The seg_pe_ranges is old enough that we shouldn't be changing this
one - so we're reverting to not marking hidden devices here.
Instead, there's going to be a new field "seg_le_ranges" which
is going to replace the seg_pe_ranges and it will mark hidden devices -
this is going to be introduced in a patch later.
So in the end we'll end up with:
(LV segment's devices)
devices field with "device_name(extent_start)" format, not marking hidden devices
seg_pe_ranges field with "device_name:extent_start-extent_end" format, not marking hidden devices (deprecated, new seg_le_ranges should be used instead for standardized format)
seg_le_ranges field with "device_name:extent_start-extent_end" format, marking hidden devices
(LV segment's metadata devices)
metadata_devices field with "device_name:extent_start-extent_end" format, not marking hidden devices
seg_metadata_le_ranges field with "device_name:extent_start-extent_end" format, marking hidden devices
Also, both seg_le_ranges and seg_metadata_le_ranges will honour the
report/list_item_separator setting which can be used to configure
the delimiter used for list items.
So, to sum it up, we will recommend using the new seg_le_ranges and
seg_metadata_le_ranges fields because they display devices with
standard extent range format, they can mark hidden devices and they
honour the report/list_item_separator setting.
We'll be keeping devices,seg_pe_ranges and metadata_devices fields
for compatibility.
The associated devices,metadata_devices,seg_pe_ranges and
seg_metadata_le_ranges are reported as genuine string lists now.
This allows for using the items separately in -S|--select
(so searching for subsets etc.) and also it allows for
configuring the separator using report/list_item_separator
which may be useful in scripts (however, we'll enable this
only for seg_le_metadata_ranges and not for devices,seg_pe_ranges
and seg_metadata_devices for compatibility reasons - see following
patch).
Add a comment in _process_pvs_in_vg() to document the
place where there have been problems with processing
PVs twice.
For a while we had a hacky workaround here where we'd
skip processing a PV if its device wasn't found in
all_devices (and !is_missing_pv since we want to
process PVs with missing devices.). That workaround
was removed in commit 5cd4d46f because it was no
longer needed.
The workaround had originally been needed to prevent
a device from being processed twice when the PV had
no MDAs -- it would be processed once in its real VG
and then the workaround would prevent it from being
processed a second time in the orphan VG.
Wrongly appearing as an orphan likely happened because
lvmcache would consider the no-MDA PV an orphan unless
the real VG holding that PV was also in lvmcache.
This issue is also mentioned in pvchange where holding
the global lock allows VGs to remain in lvmcache so
PVs with 0 mdas are not considered orphans.
The workaround in _process_pvs_in_vg() was originally
intended for reporting commands, not for pvchange.
But, it was accidentally helping pvchange also because
the method described by the pvchange global lock
comment had been subverted by commit 80f4b4b8.
Commit 80f4b4b8 was found to be unnecessary, and was
reverted in commit e710bac0. This restored the
intended global lock lvmcache effect to pvchange, and
it no longer relied on the workaround in toollib.
When reporting on LVs, take the end of the range from the size of the
underlying (hidden) LV rather than the logical size of the current
segment (that PVs use).
Previously, pvmove used the function find_pv_in_vg() which did the
equivalent of process_each_pv() by doing:
find_pv_by_name() -> get_pvs() ->
get_pvs_internal() -> _get_pvs() -> get_vgids() ->
/* equivalent to process_each_pv */
dm_list_iterate_items(vgids)
vg = vg_read_internal()
dm_list_iterate_items(&vg->pvs)
With the found 'pv', it would do vg_read() on pv_vg_name(pv),
and then do the actual pvmove processing.
This commit simplifies by using process_each_pv() and putting
the actual pvmove processing into the "single" function.
This eliminates both find_pv_by_name() and the vg_read().
The processing code that followed vg_read remains the same.
The return code for the pvmove command is not based on the
process_each_pv return code, but is based on the success/fail
conditions in the existing code.
Make the lvb validation rules for convert match
those for unlock (even though it would be very
unlikely or impossible for convert to deal with
zero lvb.)
When an orphan PV is changed/resized, the
lvmlockd global lock is converted from sh
to ex. If the command is changing two
orphan PVs, the conversion to ex should
be done only once.
Existing cache_settings field displays the settings which are
saved in metadata. Add new kernel_cache_settings fields to display
the settings which are currently used by kernel, including fields
for which default values are used.
This way users have complete view of the set of cache settings
supported (and which they can set) and their values which are used
at the moment by kernel.
For example:
$ lvs -o name,cache_policy,cache_settings,kernel_cache_settings vg
LV Cache Policy Cache Settings KCache Settings
cached1 mq migration_threshold=1024,write_promote_adjustment=2 migration_threshold=1024,random_threshold=4,sequential_threshold=512,discard_promote_adjustment=1,read_promote_adjustment=4,write_promote_adjustment=2
cached2 smq migration_threshold=1024 migration_threshold=1024
cached3 smq migration_threshold=2048
Fix lvm2app to return either 0 or 1 for lvm_vg_is_{clustered,exported},
including internal functions pvseg_is_allocated and vg_is_resizeable
which are not yet exposed in lvm2app but make them consistent with the
rest.
This reverts e28e22b9e1
The problem that that commit was fixing (pytest failure)
no longer appears with the current code, so the commit is
not needed.
That commit is a problem for pvchange, because it prevents
lvmcache from retaining VG metadata even while the global
lock is held. pvchange holds the global lock to ensure
that VG metadata is kept in lvmcache throughout processing.
If the cache is not kept, a PV with zero MDAs will appear
first in its actual VG and then appear again in the orphan VG.
It wrongly appears a second time in the orphan VG only if
the actual VG is dropped from lvmcache.
Thin pool discard mode set in metadata can be different from the one
actually used if any device underneath does not support that mode. Add
kernel_discard report field to make it possible to see this difference.
Internal _alloc_init() is only called from allocate_extents(),
which already does prevent usage of virtual segments.
So mark as internal error early and do not process it any further.
Add new test for lv_is_snapshot().
Also move few other bitchecks into same place as remaining bit tests.
TODO: drop lv_is_merging_origin() and keep using lv_is_merging().
The problem addressed by this workaround no longer
seems to exist, so remove it. PVs with no mdas
no longer appear in both their actual VG and in
the orphan VG.
Include brackets for the name if the dev is invisible.
This change applies to all callers of _format_pvsegs fn:
- lvseg_devices (the "lvs -o devices")
- lvseg_metadata_devices (the "lvs -o metadata_devices)
- lvseg_seg_pe_ranges (the "lvs -o seg_pe_ranges")
- lvseg_seg_metadata_le_ranges (the "lvs -o seg_metadata_le_ranges")
The common lv_pool_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_metadata_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_data_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_mirror_log_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_origin_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
The common lv_convert_lv fn avoids code duplication and also
the reporting part now uses _lvname_disp and _uuid_disp to display
name and uuid respectively, including brackets for the name if the
dev is invisible.
Use common _lvname_disp to report lv_parent. The _lvname_disp
takes care of properly marking LVs which are not visible - such
LVs are always enclosed in brackets when reported within any
other field.
For example, thin pool over RAID.
Before:
$ lvs -a -o name,lv_parent,data_lv,metadata_lv vg
LV Parent Data Meta
cache_pool [cache_pool_tdata] [cache_pool_tmeta]
[cache_pool_tdata] cache_pool
[cache_pool_tdata_rimage_0] cache_pool_tdata
[cache_pool_tdata_rimage_1] cache_pool_tdata
[cache_pool_tdata_rmeta_0] cache_pool_tdata
[cache_pool_tdata_rmeta_1] cache_pool_tdata
[cache_pool_tmeta] cache_pool
[cache_pool_tmeta_rimage_0] cache_pool_tmeta
[cache_pool_tmeta_rimage_1] cache_pool_tmeta
[cache_pool_tmeta_rmeta_0] cache_pool_tmeta
[cache_pool_tmeta_rmeta_1] cache_pool_tmeta
[lvol0_pmspare]
With this patch applied:
$ lvs -a -o name,lv_parent,data_lv,metadata_lv vg
LV Parent Data Meta
cache_pool [cache_pool_tdata] [cache_pool_tmeta]
[cache_pool_tdata] cache_pool
[cache_pool_tdata_rimage_0] [cache_pool_tdata]
[cache_pool_tdata_rimage_1] [cache_pool_tdata]
[cache_pool_tdata_rmeta_0] [cache_pool_tdata]
[cache_pool_tdata_rmeta_1] [cache_pool_tdata]
[cache_pool_tmeta] cache_pool
[cache_pool_tmeta_rimage_0] [cache_pool_tmeta]
[cache_pool_tmeta_rimage_1] [cache_pool_tmeta]
[cache_pool_tmeta_rmeta_0] [cache_pool_tmeta]
[cache_pool_tmeta_rmeta_1] [cache_pool_tmeta]
[lvol0_pmspare]
Do not mix dm_report_field_set_value and _field_set_value and
use single function call throughout for clarity. The same applies
for dm_report_field_string and _string_disp.
Fix regression caused by commit c2d4330f27
which removed the dm_pool_strdup for the cache policy name in
_cache_policy_disp report function.
This regression was hit with buffered reporting only (which is
used by default). The reason is that for buffered reporting, we're
iterating over LVs in VG (process_each_lv) while gathering
all the information that is needed for the report. In this case,
the LV's cache policy name has not been duped, but only the pointer
to the original VG buffer was stored. When the LV iteration finished,
the VG buffer was freed and any report to output called later
(dm_report_output call) accessed already freed VG data.
This didn't appear if unbuffered reporting was used (--unbuffered)
because in this case, the data were reported to output as
soon as they were processed, hence it was reported to output
before the VG data was freed.
The lvm2-activation{-early,-net}.service systemd unit statuses were missing
in dump gathered by lvmdump -s. These are quite important when debugging
scenarios with systemd environment and where lvmetad is not used.
Have commands send lvmlockd the update message
in vg_write instead of vg_commit, so that it's
not done while LVs are suspended. If the vg_write
is not committed, and the seqno sent to lvmlockd
is not used, then lvmlockd can detect this when
the next update uses the same seqno.
Use process_each_vg() to lock and read the old VG,
and then call the main vgrename code.
When real VG names are used (not a UUID in place of the
old name), the command still pre-locks the new name
(when strcmp wants it locked first), before calling
process_each_vg on the old name.
In the case where the old name is replaced with a UUID,
process_each_vg now translates that UUID into the real
VG name, which it locks and reads. In this case, we
cannot do pre-locking to maintain lock ordering because
the old name is unknown. So, in this case the strcmp
based lock ordering is suppressed and the old name is
always locked first. This opens a remote chance for
lock ordering conflict between racing vgrenames between
two names where one or both commands use the UUID.
Also always clear the internal lvmcache after rescanning, and
reinstate a test for --trustcache so that 'pvs --trustcache'
(for example) avoids rescanning.
Before commit c1f246fedf,
_get_all_devices() did a full device scan before
get_vgnameids() was called. The full scan in
_get_all_devices() is from calling dev_iter_create(f, 1).
The '1' arg forces a full scan.
By doing a full scan in _get_all_devices(), new devices
were added to dev-cache before get_vgnameids() began
scanning labels. So, labels would be read from new devices.
(e.g. by the first 'pvs' command after the new device appeared.)
After that commit, _get_all_devices() was called
after get_vgnameids() was finished scanning labels.
So, new devices would be missed while scanning labels.
When _get_all_devices() saw the new devices (after
labels were scanned), those devices were added to
the .cache file. This meant that the second 'pvs'
command would see the devices because they would be
in .cache.
Now, the full device scan is factored out of
_get_all_devices() and called by itself at the
start of the command so that new devices will
be known before get_vgnameids() scans labels.
Since we mark cache-pool as 'hidden/private' while it is in-use,
we may still allow user to change it's name.
It should not cause any harm and user may prefer better naming
for a cache-pool in use.
If an existing fifo has the wrong attributes it cannot be trusted
so we must unlink it and recreate it correctly.
(Replaces 2c8d6f5c90: if the other end of
the fifo already got opened while its mode was insecure, delaying the
chmod isn't going to make any difference!)
Reinstate and extend checks removed by e1b111b02a.
The code has always assumed that only root has access to the directory
containing the fifos and that they are under the complete control of
dmeventd code. If anything is found not to be as expected, then open()
should certainly not be attempted!
It's getting a bit more complex here.
Basic idea behind is - check_current_backup() should not
log error when a user is using a read-only filesystem,
so e.g. vgscan will not report any error when it tries
to take missing backup.
We still have cases when error could be reported though,
e.g. the backup this would be a symbolic link, but these
are rather misconfiguration and unexpected case.
We have to modes of 'archive()' usage -
1. compulsory - fail stops command and user may try '-An' option
to do a command.
2. non-compulsory - some fails in archiving are ignorable (i.e.
read-only filesystem where archive dir is located).
Those 2 cases needs to be properly handle - i.e. the non-compulsory
logging should not be tampering error logging message production.
So more work here is needed
Pass full buffer size to printf() function - no reason to make
buffer 1 char smaller.
Also rename locn buffer to message buffer directly since it's
not used for anything else.
TODO: we may use same buffer also for 'buf[]' since there is
no collision - so may safe 1K on stack usage.
In general, --select should be used to specify a VG by UUID,
but vgrename already allows a uuid to be substituted for
the name, so continue to allow it in that case.
If the VG arg from the command line does not match the
name of any known VGs, then check if the arg looks like
a UUID. If it's a valid UUID, then compare it to the
UUID of known VGs. If it matches the UUID of a known VG,
then process that VG.
Pass the single vgname as a new process_each_vg arg
instead of setting a cmd flag to tell process_each_vg
to take only the first vgname arg from argv.
Other commands with different argv formats will be
able to use it this way.
When two different VGs with the same name exist,
they are both stored in lvmcache using the vginfo->next
list. Previously, the code would print warnings (sometimes)
when adding VGs to this list. Now the duplicate VG names
are handled by higher level code, so this list no longer
needs to print warnings about duplicate VG names being found.
After recent changes to process_each, vg_read() is usually
given both the vgname and vgid for the intended VG.
However, in some cases vg_read() is given a vgid with
no vgname, or is given a vgname with no vgid.
When given a vgid with no vgname, vg_read() uses lvmcache
to look up the vgname using the vgid. If the vgname is
not found, vg_read() fails.
When given a vgname with no vgid, vg_read() should also
use lvmcache to look up the vgid using the vgname.
If the vgid is not found, vg_read() fails.
If the lvmcache lookup finds multiple vgids for the
vgname, then the lookup fails, causing vg_read() to fail
because the intended VG is uncertain.
Usually, both vgname and vgid for the intended VG are passed
to vg_read(), which means the lvmcache translations
between vgname and vgid are not done.
If two different VGs with the same name exist on the system,
a command that just specifies that ambiguous name will fail
with a new error:
$ vgs -o name,uuid
...
foo qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
foo vfhKCP-mpc7-KLLL-Uh08-4xPG-zLNR-4cnxJX
$ lvs foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ vgremove foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ lvs -S vg_uuid=qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
lv1 foo ...
This is implemented for process_each_vg/lv, and works
with or without lvmetad. It does not work for commands
that do not use process_each.
This change includes one exception to the behavior shown
above. If one of the VGs is foreign, and the other is not,
then the command assumes that the intended VG is the local
one and uses it.
This makes process_each_vg/lv always use the list of
vgnames on the system. When specific VGs are named on
the command line, the corresponding entries from
vgnameids_on_system are moved to vgnameids_to_process.
Previously, when specific VGs were named on the command
line, the vgnameids_on_system list was not created, and
vgnameids_to_process was created from the arg_vgnames
list (which is only names, without vgids).
Now, vgnameids_on_system is always created, and entries
are moved from that list to vgnameids_to_process -- either
some (when arg_vgnames specifies only some), or all (when
the command is processing all VGs, or needs to look at
all VGs for checking tags/selection).
This change adds one new lvmetad lookup (vg_list) to a
command that specifies VG names. It adds no new work
for other commands, e.g. non-lvmetad commands, or
commands that look at all VGs.
When using lvmetad, 'lvs foo' previously sent one
request to lvmetad: 'vg_lookup foo'.
Now, 'lvs foo' sends two requests to lvmetad:
'vg_list' and 'vg_lookup foo <uuid>'.
(The lookup can now always include the uuid in the request
because the initial vg_list contains name/vgid pairs.)
When not using lvmetad, this uses the system_id field in
the cached vginfo structs that are populated during a scan.
When using lvmetad, this requests the VG from lvmetad, and
checks the system_id field in the returned metadata.
When the command already knows both the vgid and vgname,
it should send both to lvmetad for a more exact request,
and it can save lvmetad the work of a name lookup.
Remove long outstand unused code lines, which were already
been obsoleted by other code.
Statuses and snapshot tree creation is already handled differently.
Also drop some 'extra' log_error() and use only stack;
since error has already been reported.
Since we do not use dev_manager in a way we would have destroyed VG
content while in-use - we could safely keep just pointer.
So dropping strdup.
Also it seems we actually no longer use vg_name for anything
so it may possibly go away completely unless it would be useful
for debugging...
Just for convenience to display all new configuration settings
introduced since given version (before, there was only --atversion
to display settings introduced in concrete version).
For example:
$ lvmconfig --type new --sinceversion 2.2.120
allocation {
# cache_mode="writethrough"
# cache_settings {
# }
}
global {
use_lvmlockd=0
# lvmlockd_lock_retries=3
# sanlock_lv_extend=256
use_lvmpolld=1
}
activation {
}
# report {
# compact_output_cols=""
# time_format="%Y-%m-%d %T %z"
# }
local {
# host_id=0
}
Unifying terminology.
Since all the metadata in-use are ALWAYS on disk - switch
to terminology committed and precommitted.
Patch has no functional change inside.
lv preload for detached LVs started to be used also
for various other types which just happens to pass through
weak if() condition.
TODO: find here better solution to rather explicitly check
for types we really need to preload.
We do not won't to 'expose' internals of VG struct.
ATM we use lists to keep all LVs - we may want to switch
to better struct for quicker 'search'.
Since we do not need 'lists' but always actual LV,
switch find_lv_in_vg_by_lvid() to return LV,
and replaces some use case of find_lv_in_vg()
with 'better' working find_lv() which already
returns LV.
When 'lvextend -L+XX vg/thinpool' do not leave inactive table
loaded for 'wrapping' LV on top of resized thin-pool
(ATM we use linear LV for this with same size as thin-pool).
Udev recently start to 'link-in' major amount of useless libs.
(Seem to be faulty 'systemd' link-in all issue)
Anyway - avoid locking those libs in RAM.
When preloading thin-pool device node for already
existing/running thin-pool do not resume such thin-pool.
This allows to properly schedule commit point for metadata,
when thin-pool data or metadata volume is resized.
Extra space between 'cache' target and metadata device caused
string comparation being not equal and thus always causing
table reload even when uneeded.
Avoid internal error message where thin pool repair code tries to
fix cache pool - was catched later in code stack, so rather
catch this early and make the repair function exlusive
to thin pools.
So far we have no code for repairing cache pools
(other then the automatic during activation/deactivation).
To handle multiple VGs with the same name.
Simply using the VG name is ambiguous, and
lvmetad requires the VG uuid be used to
specify which one is meant.
Coverity here is a bit 'blind' here and cannot resolve which
code paths are actually able to hit this code path.
(It's using 'statistic' to resolve all possible paths,
and it's not scanning 'individual' code paths.)
This just cleans warns and add 'cheap' tests.
Use 'mda' instead of NULL to quite Coverity warn.
However this code seems to be actually not even possible to hit.
With proper analysis it may possibly be dropped from code to
simplify logic.
Skip testing target_pvs for NULL, we already
dereference it in many other places.
If check would ever be needed - it needs to be
in front of _raid_extract_images().
In lookup, return a count of entries with the
same key rather than the value from a second
entry with the same key.
Using some slightly different names.
Simply use lookup_withval right away rather than doing a
standard lookup, checking for the wrong mapping, then
repeating with lookup_withval to get the right mapping.
Coverity is not able to understand assembly language in
system's header file, so provide model for such macro.
Note: to really see model in-use: #nodef FD_ZERO model_FD_ZERO
need to go to coverity/config/user_nodefs.h
Add missing display_lvname in _lvconvert_merge_thin_snapshot().
Also when we detect missing origin, report Internal error,
which would likely be the primary fault here
(and avoid dereft of NULL origin as noticed by Coverity).
When reading older lvm2 metadata for cache-pool - we now handle more
extended syntax - basically we want to enter most setting when
actually creating cached LV.
For this new validation code has been added. However older metadata
without new settings set is now found as invalid.
Fix it by adding default settings for cache policy mq
and cache mode writethrough.
If the data len is passed into the hash table
and saved there, then the hash table internals
do not need to assume that the data value is
a string at any point.
New hash table functions are added that allow for
multiple entries with the same key. Use of the
vgname_to_vgid hash table is converted to these
new functions since there are multiple entries
in vgname_to_vgid that have the same key (vgname).
When multiple VGs with the same name exist, commands
that reference only a VG name will fail saying the
VG could not be found (that error message could be
improved.) Any command that works with the select
option can access one of the VGs with -S vg_uuid=X.
vgrename is a special case that allows the first VG
name arg to be replaced by a uuid, which also works.
(The existing hash table implementation is not well
suited for handling this case, but it works ok with
the new extensions. Changing lvmetad to use its own
custom hash tables may be preferable at some point.)
Here Coverity cannot see the pointer cannot be NULL in this
code path - opened coverity case #00531860.
We could make a model to avoid seeing related reports,
but then we loose coverage for modeled function.
So decided to add minor hint for this case.
Recent change 2c8d6f5c90
actually droped restart when the reason of failing open is missing
device completely - check for ENOENT now as another reason
to start new dmeventd server (when there is no systemd to maintain it).
The udev_device_get_is_initialized is available since libudev version
165. Older versions are still used somewhere (e.g. RHEL6). So better
check for this fn and use it only if it's available.
Udev db records are marked as not initialized (incomplete) on timeout.
Issue an error message whenever LVM finds such records so users are
aware that something's going wrong with udev db.
This is important in case we use devices/external_device_info_source="udev"
where udev database records are used to do various filtering decisions.
For example:
udev log of timed out worker:
Nov 11 13:02:25 raw.virt systemd-udevd[607]: seq 1997 '/devices/virtual/block/dm-2' is taking a long time
Nov 11 13:04:25 raw.virt systemd-udevd[607]: seq 1997 '/devices/virtual/block/dm-2' killed
Nov 11 13:04:25 raw.virt systemd-udevd[607]: worker [11221] terminated by signal 9 (Killed)
Nov 11 13:04:25 raw.virt systemd-udevd[607]: worker [11221] failed while handling '/devices/virtual/block/dm-2'
...
LVM also issues error message visibly if incomplete udev db record is found,
devices/external_device_info_source="udev" is set:
$ pvs
Udev database has incomplete information about device /dev/dm-2.
Failed to get external handle for device /dev/dm-2 [udev].
...
Coverity noticed this condition is always false and the error
path could never be visited.
So check for all mismatches of supported messages
and actually mark log_error as internal error.
When the first arg is a UUID and vgrename translates
that UUID to a current VG name, the old and new VG
names are not being checked for equality. If they
are equal, it produces an internal error rather than
a proper error.
Use delay_dev to slow down mirror sync so we could more
easily check for race and proper reject of parallel mirror
leg addition/reduction.
Also expose fail in mirror allocation of parallel leg.
Rather then skipping whole test - just do not use it.
Failing tests that have required delay need to deal with reality
and shell either check for HAVE_DM_DELAY and skip portion
of test or using should when needed.
While through all codepaths we never 'read' lock_id unless LCKF_CONVERT,
coverity cannot decrypt this.
As since it's usually better to pass in 'well-defined' data structures
preset lock_id to 0.
Use fputs() when printing plain string,
easier then fprintf which needs to parse it.
Also check fd before close is >= 0 -
it is - but coverity fail to see it, so eliminate
this false-positive warning.
Doing 'stat' checking first and later opening is racy.
And since we do not really care about any 'status' info
here and we read 'sysfs' here - just drop whole 'stat()'
call and directly handle error from failing 'fopen()'.
Coverity here is not fully-in-picture - but please it
with validation of pointer which currently cannot be null,
since we always return at least empty string.
Check for arg_vgid_lookup and arg_name_lookup not being NULL.
Drop checking arg_vgid and arg_name for NULL since they
are already dereference earlier - thus mostly must be NOT NULL.
(If that would be possible larger rework of this function would be
required).
Put calls related to fifo opening into a single function.
Fix Time-Of-Check-Time-Of-Use and use fstat()
and fchmod() on already opened fd instead of
checking first path and then risking to open something
different.
Currently the code creates the log separately after allocating space for
the data and as no data allocation is needed this second time,
total_extents ends up holding zero so use new_extents directly instead.
When reading a foreign VG we cannot write it, since
it belongs to another host. When reading a shared VG
we cannot write it because we may not have an ex lock.
(Or we may be reading the shared VG while not using
lvmlockd in which case it's like reading a foreign VG.)
Add the same checks for wiping outdated PVs. We may
read a foreign or shared VG, or see the PVs, while
another host is part way through writing a new version
of the VG to the PVs. This might cause us to think
some of the PVs are outdated. We do not want to
write another host's PVs, especially when we may
wrongly conclude they are outdated.
When the command gets a list of alternate devices
from lvmetad, log each one directly. This is not
the same as the warnings when adding lvmcache,
which are related to which duplicate is preferred.
If two PVs have different VGs with the same name
(different uuids), one of the VGs is ignored by
lvmetad. A FIXME exists in lvmetad to find a
better response.
update_metadata and pv_found update the cached metadata;
these are both reworked to improve the code, organize it
by each possible state and transition, make it much more
clear what's changing, add more error checking and
handling, and add comments.
The state and content of the cache (hash tables) does not
change (apart from some things that didn't work before),
and the communication to/from commands does not change.
The implementation and organization of the code making
the state changes does change significantly.
One detail related to the content of the cache does change:
different hash tables do not reference the same memory any more;
the target values in each hash table are allocated and freed
individually.
* You should have received a copy of the GNU Lesser General Public License
* along with this program; if not, write to the Free Software Foundation,
* Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
* Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
#ifndef _LVM_LVMPOLLD_CMD_UTILS_H
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.