IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
"vgchange -aay --autoactivation event" is called by our udev rule.
When the udev rule runs, symlinks for devices may not all be created
yet. If the regex filter contains symlinks, it won't work correctly.
This command uses devices that already passed through pvscan. Since
pvscan applies the regex filter correctly, this command inherits the
filtering from pvscan and can skip the regex filter itself.
See the previous commit b2d80db12e13
"pvscan: use alternate device names from DEVLINKS to check filter"
pvscan --cache <dev> is called by our udev rule at a time when all
the symlinks for <dev> may not be created yet (by other udev rules.)
The regex filter in lvm.conf may refer to <dev> using a symlink name
that hasn't yet been created, which would cause <dev> to not match
the filter regex. The DEVLINKS env var, set by udev, contains all
the symlink names for <dev> that have been or will be created.
So, we add all these symlink names to dev->aliases, as if we had
found them in /dev. This allows <dev> to be recognized by a regex
filter containing a symlink for <dev>.
It looks like force was not being used to enable crypt resize,
but rather to force an inconsistency between LV and crypt
sizes, so this is either not needed or force in this case
should have some other meaning.
This reverts commit ed808a9b54.
Update previous commit
"lvresize: only resize crypt when fs resize is enabled"
to enable crypt resizing when --force is set and --resizefs
is not set. This is because it's been allowed in the past
and people have used it, but it's not a good idea.
There were a couple of cases where lvresize, without --fs resize,
was resizing the crypt layer above the LV. Resizing the crypt
layer should only be done when fs resizing is enabled (even if the
fs is already small enough due to being independently reduced.)
Also, check the size of the crypt device to see if it's already
been reduced independently, and skip the cryptsetup resize if
it's not needed.
While the output of building looks more polished, text editors fail to
find source file from compile errors - so until we start to print
all file with full paths - comment out this make build parameter.
Enhance checking vdo constains so it also handles changes of active VDO LVs
where only added difference is considered now.
For this also the reported informational message about used memory
was improved to only list consuming RAM blocks.
Introduce struct vdo_pool_size_config usable to calculate necessary
memory size for active VDO volume.
Function lv_vdo_pool_size_config() is able to read out this
configuration out of runtime DM table line.
Cover a case missed by the recent commit e0ea0706d
"report: query lvmlockd for lv_active_exclusively"
Fix the lv_active_exclusively value reported for thin LVs.
It's the thin pool that is locked in lvmlockd, and the thin
LV state was mistakenly being queried and not found.
Certain LV types like thin can only be activated exclusively, so
always report lv_active_exclusively true for these when active.
If one of the PVs in the VG does not hold metadata, then the
command would fail, thinking that PV was from a different VG.
Also add missing free on that error path.
typo used "cryptresize" as command name
this affects cases where the file system is resized
independently, and then the lvresize command is used
which only needs to resize the crypt device and the LV.
18722dfdf4 lvresize: restructure code
mistakenly changed the overprovisioning check from applying
to all lv_is_thin_type lvs to only lv_is_thin_pool lvs, so
it no longer applied when extending thin lvs. The result
was missing warning messages when extending thin lvs.
The recent change that verifies sys_serial system.devices entries
using the PVID did not exclude non-PV devices from being checked.
The verification code would attempt to use du->pvid which was null
for the non-PVs causing a segfault.
With newer kernels (>5.13) DM_CREATE no longer generates
uevent for DM devices without table.
There are even no sysfs block device entries in such case,
although device has asigned major:minor and it is being listed
by 'dmsetup info'.
So this patch calculates amount of 'table' lines and in case
no table line comes from cmdline or stdin - waiting on cookie
is avoided generically instead of disabling just case with
option --notable - which then also skipped handling of an
option --addnodeoncreate (which is however historical and
should be avoided)
As a result there should be no leaking udev cookies and endlessly
waiting commands like this:
dmsetup create mytestdev </dev/null
We can't assume that strerror_r returns char* just because _GNU_SOURCE is
defined. We already call the appropriate autoconf test, so let's use its
result (STRERROR_R_CHAR_P).
Note that in configure, _GNU_SOURCE is always set, but we add a defined
guard just in case for futureproofing.
Bug: https://bugs.gentoo.org/869404
Query LV lock state in lvmlockd to report lv_active_exclusively
for active LVs in a shared VGs. As with all lvmlockd state,
it is from the perspective of the local node.
Signed-off-by: corubba <corubba@gmx.de>
Add a note to the manpage that lvmlockd is unable to determine
accurately and without side-effects whether a LV is remotely active.
Also change the value of the lv_active_remotely option from false to
undefined for shared VGs to distinctly communicate that inability to
users. Only for local VGs it can be definitely stated that they are not
remotely active.
Signed-off-by: corubba <corubba@gmx.de>
Since now we change deduplication with V4 table line change,
the modification tends to be faster and we can capture for a few ms
also 'status' about opening or closing deduplication index.
Use 'grep -E' to handle both words.
Improve detection of VDO virtual size - so it's not reading VDO
metadata when VDO device is already active and instead we reuse
existing table line for knowing existing metadata size.
NOTE: if there is ever going to be added support for reduction
of VDO virtual size - this method will need to be reworked to
allow size difference only within 'extent_size' alignment.
As we actully use reading of VDO metadata only as extra 'information' source,
and not error command - switch to 'log_debug()' severity with messages
out of parser code.
Handle multiple devices using the same serial number as
their device id. After matching devices to devices file
entries, if there is a discrepency between the ondisk PVID
and the devices file PVID, then rematch devices to
devices file entries using PVID, looking at all disks
on the system with the same serial number.
Only /sys/dev/block/major:minor/device/serial was read to find
a disk serial number, but a serial number seems to be reported
more often in other locations, so check these also:
/sys/dev/block/major:minor/device/vpd_pg80
/sys/class/block/vda/serial (for virtio disks only)
Add very simplistic parser of vdo metadata to be able to obtain
logical_blocks stored within vdo metadata - as lvm2 may
submit smaller value due to internal aligment rules.
To avoid creation of mismatching table line - use this number
instead the one provided by lvm2.
Previously we utilized udev until we got a dbus notification from lvm
command line tools. This however misses the case where something outside
of lvm clears the signatures on a block device and we fail to refresh the
state of the daemon. Change the behavior so we always monitor udev events,
but ignore those udev events that pertain to lvm members.
Note: --udev command line option no longer does anything and simply
outputs a message that it's no longer used.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1967171
The lvm dbus daemon will auto activate on dbus API calls. To
prevent the dbus daemon starting when lvm command line tools are
being used we will check to see if the daemon is running first.
If the daemon is not running, we will not notify the daemon.
For this check to work it requires the changes done previously
with commit: 3fdf449348
Reviewed-by: David Teigland <teigland@redhat.com>
The number of extents for the sanlock lvmlock lv is calculated using
integer division, which rounds towards zero. With a physical extent size
of 129M, instead of the requested 256M the lv is only 129M (1 extent).
With any physical extent size greater than 256M the lv creation fails
because the number of extents is zero.
This is fixed by replacing the integer division with a division macro
that rounds up and thus guarantees that the size of the lv will always
be equal or greater than the requested size. Using the examples above, a
pes of 129M will result in a 258M lv (2 extents), pes of 300M in a 300M
lv (1 extent).
The re-calculation of the lv size in bytes and megabytes is only so the
debug output shows the correct values. The size in mb there is still
not byte-perfect-accurate, but good enough for a human-readable estimate;
and the exact size in bytes and extents is right next to it.
Signed-off-by: corubba <corubba@gmx.de>
When executing process_each_lv_in_vg, we process live LVs first and
after that, we process any historical LVs. In case we have just removed
an LV, which also means we have just made it "historical" and so it
appears as fresh item in vg->historical_lvs list, we have to skip it
when we get to processing historical LVs inside the same process_each_lv_in_vg
call.
The simplest approach here, without introducing another LV list, is to
simply mark such historical LVs as "fresh" directly in struct
historical_logical_volume when we have just removed the original LV
and created the historical LV for it. Then, we just need to check the
flag when processing historical LVs and skip it if it is "fresh".
When we read historical LVs out of metadata, they are marked as
"not fresh" and so they can be processed as usual.
This was mainly an issue in conjuction with -S|--select use:
# lvmconfig --type diff
metadata {
record_lvs_history=1
}
(In this example, a thin pool with lvol1 thin LV and lvol2 and lvol3 snapshots.)
# lvs -H vg -o name,pool_lv,full_ancestors,full_descendants
LV Pool FAncestors FDescendants
lvol1 pool lvol2,lvol3
lvol2 pool lvol1 lvol3
lvol3 pool lvol2,lvol1
pool
# lvremove -S 'name=lvol2'
Logical volume "lvol2" successfully removed.
Historical logical volume "lvol2" successfully removed.
...here, the historical LV lvol2 should not have been removed because
we have just removed its original non-historical lvol2 and the fresh
historical lvol2 must not be included in the same processing spree.
The new device_id types are: wwid_naa, wwid_eui, wwid_t10.
The new types use the specific wwid type in their name.
lvm currently gets the values for these types by reading
the device's vpd_pg83 sysfs file (this could change in the
future if better methods become available for reading the
values.)
If a device is added to the devices file using one of these
types, prior versions of lvm will not recognize the types
and will be unable to use the devices.
When adding a new device, lvm continues to first use sys_wwid
from the sysfs wwid file. If the device has no sysfs wwid file,
lvm now attempts to use one of the new types from vpd_pg83.
If a devices file entry with type sys_wwid does not match a
given device's sysfs wwid file, the sys_wwid value will also
be compared to that device's other wwids from its vpd_pg83 file.
If the kernel changes the wwid type reported from the sysfs
wwid file, e.g. from a device's t10 id to its naa id, then lvm
should still be able to match it correctly using the vpd_pg83
data which will include both ids.
t10 wwids are now edited in the same way that multipath does,
which is replacing a series of spaces with one _. Previously
lvm replaced every space with one _. Devices file entries
with the old form will be converted to the new shorter form.
Move the functions handling dev wwids.
Add dev flags indicating that wwids have been read from
sysfs wwid file or sysfs vpd_pg83 file. This can be
used to avoid rereading these.
Improve filter-mpath search for a device's wwid in
/etc/multipath/wwids, to avoid unnecessary rereading
of wwids from sysfs files.
Type 8 wwids from vpd_pg83 with naa or eui names should be
saved as those types.
If lvmlockd in cluster is killed accidently or any other reason, the
lock resources will become orphaned in the VG lockspace. When the
cluster manager tries to restart this daemon, the LVs will probably
become inactive because of resource schedule policy and thus the lock
resouce will be omited during the adoption process. This patch will
try to purge the lock resources left in previous lockspace, so the
following actions can work again.
Exclude the new fs resizing capabilities at build time
(rather than run time) if the necessary libblkid features
are not available. When excluded, all fs resizing options
are translated to resize_fsadm. Accessing the new
features now requires rebuilding lvm if libblkid is
upgraded.
Place the addCleanup at the end as we don't want to go through clean up
if we don't make it through setUp. If we don't do this we can remove VGs
that we didn't create in the unit test.
Previously when the __del__ method ran on LVMShellProxy we would blindly
call terminate(). This was a race condition as the underlying process
may/maynot be present. When the process is still present the SIGTERM will
end up being seen by lvmdbusd too. Re-work the code so that we
first try to wait for the child process to exit and only then if it hasn't
exited will we send it a SIGTERM. We also ensure that when this is
executed we will briefly ignore a SIGTERM that arrives for the daemon.
If we plan to use dm throttling for mirror targets - we actually
have to check whether kernel runs with CONFIG_HZ_1000 - if it does
not the whole idea of throttling is actually not working in the
testsuite as within a single 'tick' with HZ 100 way too much date
is being moved on any modern hardware - and since there is no plan
to change this in kernel - we simply avoid using throttling on such
kernel and test needs to work differently - either ignore results
or use much larger mirror sizes...
When checking to see if the PV is missing we incorrectly checked that the
path_create was equal to PV creation. However, there are cases where we
are doing a lookup where the path_create == None. In this case, we would
fail to set lvm_id == None which caused a problem as we had more than 1
PV that was missing. When this occurred, the second lookup matched the
first missing PV that was added to the object manager. This resulted in
the following:
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/lvmdbusd/utils.py", line 667, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.9/site-packages/lvmdbusd/fetch.py", line 25, in _main_thread_load
(changes, remove) = load_pvs(
File "/usr/lib/python3.9/site-packages/lvmdbusd/pv.py", line 46, in load_pvs
return common(
File "/usr/lib/python3.9/site-packages/lvmdbusd/loader.py", line 55, in common
del existing_paths[dbus_object.dbus_object_path()]
Because we expect to find the object in existing_paths if we found it in
the lookup.
resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2085078
Latest upstream build of lvm results in the following error when
trying to use lvmshell.
"Argument --reportformat cannot be used in interactive mode.,
Error during parsing of command line."
When lvm is compiled with editline, if the file descriptors don't look like
a tty, then no "lvm> " prompt is done. Having lvm output the shell prompt
when consuming JSON on a report file descriptor is very useful in
determining if lvm command is complete.
Historically we have seen a few different errors which occur when we call
fullreport. Failing exit code and JSON which is missing one or more keys.
Instruct lvm to dump the debug to a file during fullreport calls when we
fork & exec lvm. If we encounter an error, ouput the debug data.
The reason this isn't being done when lvmshell is used is because we
don't have an easy way to test the error paths.
This change is complicated by the following:
1. We don't know if fullreport was good until we evaluate all the JSON.
This is done a bit after we have called into lvm and returned.
2. We don't want to orphan the debug file used by lvm if the daemon is
killed. Thus we try to minimize the window where the debug file hasn't
already been unlinked. A RFE to pass an open FD to lvm for this
purpose is outstanding.
The temp. file is:
-rw------. 1 root root /tmp/lvmdbusd.lvm.debug.XXXXXXXX.log
Introduce an exception which is used for known existing issues with lvm.
This is used to distinguish between errors between lvm itself and lvmdbusd.
In the case of lvm bugs, when we simply retry the operation we will log
very little. Otherwise, we will dump a full traceback for investigation
when we do the retry.
Instead of lumping all the exceptions, break them out to handle the dbus
exceptions separately, to reduce the amount of debug information that ends
up in the journal that has questionable value.
Lvm occasionally fails to return all the request JSON keys in the output of
"fullreport". This happens very rarely. When it does the daemon was reporting
the resulting informational exception:
MThreadRunner: exception
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/lvmdbusd/utils.py", line 667, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.9/site-packages/lvmdbusd/fetch.py", line 40, in _main_thread_load
(lv_changes, remove) = load_lvs(
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 143, in load_lvs
return common(
File "/usr/lib/python3.9/site-packages/lvmdbusd/loader.py", line 37, in common
objects = retrieve(search_keys, cache_refresh=False)
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 95, in lvs_state_retrieve
l['vdo_operating_mode'],
KeyError: 'vdo_operating_mode'
The daemon retries the operation, which usually works and the daemon continues.
However, simply reporting this informational stack trace is causing CI and other
automated tests to fail as they expect no tracebacks in the log output.
Remove the reporting of this code path unless it persists and causes the daemon
to give up and exit.
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2120267
When the daemon isn't started with --debug we will keep a circular
buffer of the past N number of debug messages which we will output
when we encounter an issue.
Rather than trying to bubble up return codes that get us to exit cleanly
it's better to just raise an exception to bail. In some cases functions
don't have return codes, so they cannot be checked.
We were incorrectly only setting this if --udev wasn't present on the
command line. In all cases when we see a manager.ExternalEvent we want
to set this.
Depending on when an occurs, it maynot have any information available for
lastlog. In this case try to grab an error message from the original
response.
Introduce a new lock for the flight recorder, so that we can dump it when
a command is block waiting for lvm to complete. Also in all paths we will
addthe metadata to the flight recorder before it's done, so we will have
it when a command hangs and we dump the flight recorder. Add the missing
bits after the command has finished.
Cleaned up the output too.
We put this in to test one of the paths in the damon, but unfortunately
if we hit the race condition where the job actually is done we will try
to call j.Wait(1) after the remove. This fails with:
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.UnknownMethod:
Method "Wait" with signature "i" on interface "com.redhat.lvmdbus1.Job"
doesn't exist
Which is caused by the dbus object no longer existing. We could handle
this, but the issue is we no longer have the ability to get the result to
return, they have been lost.
A better solution would be to write a specific unit test to force this code
path and handle all the possible outcomes.
These files are automatically cleared on reboot given
that /run is tmpfs, and that remains the primary way
of clearing online files.
But, if there's extreme use of vgcreate+pvscan+vgremove
between reboots, then removing online files in vgremove
will limit the number of unused online files using space
in /run.
The new option "--fs String" for lvresize/lvreduce/lvextend
controls the handling of file systems before/after resizing
the LV. --resizefs is the same as --fs resize.
The new option "--fsmode String" can be used to control
mounting and unmounting of the fs during resizing.
Possible --fs values:
checksize
Only applies to reducing size; does nothing for extend.
Check the fs size and reduce the LV if the fs is not using
the affected space, i.e. the fs does not need to be shrunk.
Fail the command without reducing the fs or LV if the fs is
using the affected space.
resize
Resize the fs using the fs-specific resize command.
This may include mounting, unmounting, or running fsck.
See --fsmode to control mounting behavior, and --nofsck to
disable fsck.
resize_fsadm
Use the old method of calling fsadm to handle the fs
(deprecated.) Warning: this option does not prevent lvreduce
from destroying file systems that are unmounted (or mounted
if prompts are skipped.)
ignore
Resize the LV without checking for or handling a file system.
Warning: using ignore when reducing the LV size may destroy the
file system.
Possible --fsmode values:
manage
Mount or unmount the fs as needed to resize the fs,
and attempt to restore the original mount state at the end.
nochange
Do not mount or unmount the fs. If mounting or unmounting
is required to resize the fs, then do not resize the fs or
the LV and fail the command.
offline
Unmount the fs if it is mounted, and resize the fs while it
is unmounted. If mounting is required to resize the fs,
then do not resize the fs or the LV and fail the command.
Notes on lvreduce:
When no --fs or --resizefs option is specified:
. lvextend default behavior is fs ignore.
. lvreduce default behavior is fs checksize
(includes activating the LV.)
With the exception of --fs resize_fsadm|ignore, lvreduce requires
the recent libblkid fields FSLASTBLOCK and FSBLOCKSIZE.
FSLASTBLOCK*FSBLOCKSIZE is the last byte used by the fs on the LV,
which determines if reducing the fs is necessary.
This test could only be run when user passes LVM_TEST_DEVDIR=/dev
as it requires and expects actions to be going in this dir, skip
otherwise.
Also 'extend_filter' manages multiple args in on lvm.conf update.
mdraid has some breakage - so 5.19 is crashing
(possibly even some more older version - that can be added as well)
Test seems to pass with 6.0-rc kernel.
Add new define 'newline' for use in 'foreach()'
Add new $(SHOW) for makefile printing output
Add 'make print-VAR' for easier debugging of Makefiles' variables.
Fix lv_active to be of BIN type instead of STR. This allows lv_active to
follow the report/binary_values_as_numeric setting as well as --binary
cmd line switch. Also, it makes it possible to use -S|--select with
either textual or numeric representation of the value, like 'lvs -S
active=active' but also 'lvs -S active=1'.
Take the devices file lock before creating a new devices file.
(Was missed by the change to preemptively create the devices
file prior to setup_devices(), which was done to improve the
error path.)
When using --all, if one VG is skipped, continue adding
other VGs, and do not return an error from the command
if some VGs are added. (A VG is skipped if it's missing PVs.)
If the command fails during devices file setup or device
scanning, then remove the devices file if it has been
newly created by the command, and exit with an error.
If devices from a named VG are not imported (e.g. the
VG is missing devices), then remove the devices file if
it has been newly created by the command, and exit with
an error.
If --all VGs are being imported, and no devices are found
to include in the devices file, then remove the devices
file if it has been newly created by the command, and
exit with an error.
Names matching internal code layout.
Functionc in thin_manip.c uses thin_pool in its name.
Keep 'pool' only for function working for both cache and thin pools.
No change of functionality.
Certain args can't be used in lvm shell ("interactive mode") because
they are not supported there. Add ARG_NONINTERACTIVE flag to mark
such args and error out if we're in interactive mode and at the same
time we detect use of such argument.
Currently, this is the case for --reportformat arg - we don't support
changing the format per command in lvm shell. The whole shell is running
under a reportformat chosen at shell's start.
Commit 73ec3c954b added a way to print
only a part of the report string (repstr) to support decoding individual
string list items out of repstr.
The repstr is normally printed through _safe_repstr_output so that any
JSON_QUOTE character ('"') found within the repstr is escaped to not
interfere with value quoting in JSON format.
However, the commit 73ec3c954b missed
checking the 'len' argument passed to _safe_repstr_output function when
adding the rest of the repstr after all previous JSON_QUOTE characters
were escaped (when calling the last dm_pool_grow_object). When 'len'
is 0, we need to calculate the 'len' ourselves in the function by
simply calling strlen. This is because 'len' is passed to the function
only if we're taking a part of repstr, not as a whole.
If we failed or logged anything before we actually execute given command
in lvm shell, we couldn't report the log using lastlog command after.
This patch adds specific 'pre-cmd' log report object type to identify
such log messages and enables lastlog to report even this log.
pvcreate with --uuid would segfault if a devices file entry matched
the specified pvid, but the devices file entry had no device_id, which
could happen if the entry has a devname idtype.
When the volume size is extended, there is no need to flush
IO operations (nothing can be targeting new space yet).
VDO target is supported as target that can safely work with
this condition.
Such support is also needed, when extending VDOPOOL size
while the pool is reaching its capacity - since this allows
to continue working without reaching 'out-of-space' condition
due to flushing of all in flight IO.
When we wanted to insert '#' before a config line (to comment it out),
we used dm_pool_strndup to temporarily copy the space prefix first so
we can assemble the final line with:
"<space_prefix># <key>=<value>":
out of original:
"<space_prefix><key>=<value>".
The space_prefix copy is not necessary, we can just use fprintf's
precision modifier "%.*s" to print the exact part if we alrady
know space_prefix length.
The new --valuesonly option causes the lvmconfig output to contain only
values without keys for each config node. This is practical mainly in
case where we use lvmconfig in scripts and we want to assign the value
to a different custom key or simply output the value itself without the
key.
For example:
# lvmconfig --type full activation/raid_fault_policy
raid_fault_policy="warn"
# lvmconfig --type full activation/raid_fault_policy --valuesonly
"warn"
# my_var=$(lvmconfig --type full activation/raid_fault_policy --valuesonly)
# echo $my_var
"warn"
Internally, NUM and BIN fields are marked as DM_REPORT_FIELD_TYPE_NUM_NUMBER
through libdevmapper API. The new 'json_std' format mandates that the report
string representing such a value must be a number, not an arbitrary string.
This is because numeric values in 'json_std' format do not have double quotes
around them. This practically means, we can't use string synonyms
("named reserved values") for such values and the report string must always
represent a proper number.
With 'json' and 'basic' formats, this is not an issue because 'basic' format
doesn't have any structure or typing at all and 'json' format puts all values
in quotes, including numeric ones.
When we tested lvm2, the kernel injected various random faults.
(gdb) bt
...
(gdb) p vg
$1 = (struct volume_group *) 0x0
(gdb) p use_previous_vg
$2 = (unsigned int *) 0x0
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Update to more recent version of configure script to support more
new architecture types like RISCV64. Tools in use ATM:
autoconf-2.71-3.fc37.noarch
autoconf-archive-2022.02.11-3.fc37.noarch
automake-1.16.5-9.fc37.noarch
Resolves https://bugzilla.redhat.com/show_bug.cgi?id=2118243
When lvcreate is makeing VDO pool and user has not specified -V size,
ATM we actually run 'vdoformat' twice to get properly 'extent' aligned
size matching lvm2 properties - so the 2nd. run of vdoformat actually
can stay with 'log_verbose()' so the standard printed result
is not showing confusing info (which is now also correctly using
print_unless_silent)
The 'pe_start' column was incorrectly marked as being of type NUM.
This was not correct as pe_start is actually of type SIZ, which means
it can have a size suffix and hence it's not a pure numeric value.
Proper column type is important for selection to work correctly, so we
can also do comparisons while using suffixes.
This is also important for new "json_std" output format which does not
put double quotes around pure numeric values. With pe_start incorrectly
marked as NUM instead of SIZ, this produced invalid JSON output
like '"pe_start" = 1.00m' because it contained the 'm' (or other)
size suffix. If properly marked as SIZ, this is then put in double
quotes like '"pe_start" = "1.00m"'.
In JSON format, we print string list this way:
"key" = "item1,item2,...,itemN"
while in JSON_STD format, we print string list this way:
"key" = ["item1","item2",...,"itemN"]
Before, we stored only the report string itself for a string list
in field->report_string. The field->report_string has either
sorted items or not, depending on what we need for a field -
some report fields have sorted output, some don't...
The field->sort_value.value then contains pointer to the exact
field->report_string. The field->sort_value.items ALWAYS keeps
sorted array of individual items, represented as '[position,length]'
pairs pointing to the field->sort_value.value string.
This approach was fine as far as we didn't need to apply further
formatting to field->report_string. However, if we need to apply
further formatting to field->report_string content, taking into
account individual items, we also need to know where each item
starts and what is its length. Before, we only knew this when
items in report string were sorted, but not in the unsorted version.
We can't rely on the delimiter (default ",") only to separate items
back out of report string, because that delimiter can be contained
in the item value itself.
So this patch enhances the field->report_string for a string list so
it also contains '[position,length]' pairs for each individual item
inside field->report_string. We store this array right beyond the
string itself and we encode it in the same manner we already did for
field->sort_value.items before.
If field->report_string has sorted items, the field->sort_value.items
just points to the array of items we store beyond the report string.
If field->report_string has unsorted items, we store separate array
of items for both field->report_string and field->sort_value.
This patch also cleans up the _report_field_string_list function a bit
so it's easier and more straightforward to follow than the original
version.
Example. If we have "abc", "xy", "defgh" as list on input with ","
as delimiter, then:
- field->report_string will have:
- if we need field->report_string unsorted:
abc,xy,defgh\0{[3,12],[0,3],[4,2],[7,5]}
|____________||________________________|
string array of [pos,len] pairs
|____||________________|
#items items
- if we need field->report_string sorted:
repstr_extra
|
V
abc,defgh,xy\0{[3,12],[0,3],[4,5],[10,2]}
|____________||________________________|
string array of [pos,len] pairs
|____||________________|
#items items
- field->sort_value will have:
- if field->report_string is unsorted:
field->sort_value.value = field->report_string
field->sort_value.items = {[0,3],[0,3],[7,5],[4,2]}
(that is 'abc,defgh,xy')
- if field->report_string is sorted already:
field->sort_value.value = field->report_string
field->sort_value.items = repstr_extra
(that is also 'abc,defgh,xy')
For JSON_STD format, use 'null' if a field has no value at all.
In JSON format, we print undefined numeric values this way:
"key" = ""
while in JSON_STD format, we print undefined numeric values this way:
"key" = null
(Keep in mind that 'null' is different from 0 (zero value) which is
a defined value.)
In JSON format, we print numeric values this way:
"key" = "N"
while in JSON_STD format, we print numeric value this way:
"key" = N
(Where N is a numeric value.)
The original JSON formatting will be still available using the original
DM_REPORT_GROUP_JSON identifier. Subsequent patches will add enhancements
to JSON formatting code so that it adheres more to JSON standard - this
will be identified by new DM_REPORT_GROUP_JSON_STD identifier.
If using JSON format for lvm shell's output, the error message about
exceeding the maximum number of arguments was not reported on output if
this condition was ever hit.
This is because the JSON format (as well as any other future format)
requires extra formatting compared to "basic" format and so it also
requires extra calls when it comes to reporting. The report needs to
be added to a report group and then popped and put on output with
specialized "dm_report_group_output_and_pop_all".
This "output and pop" is normally executed after we execute the command
in the lvm shell. When we didn't get to the command exection at all because
some precondition was not met (like hitting the limit for the number of
arguments for the command here), we skipped this important call and
so there was no log report output.
Right now, it's only this exact error message for which we need to call
"output and pop" directly, all the other error messages are about
initializing and setting the log report itself which we can't report
obviously.
Before this patch:
lvm> pvs 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
lvm>
With this patch applied:
lvm> pvs 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
{
"log": [
{"log_seq_num":"1", "log_type":"error", "log_context":"shell", "log_object_type":"cmd", "log_object_name":"", "log_object_id":"", "log_object_group":"", "log_object_group_id":"", "log_message":"Too many arguments, sorry.", "log_errno":"-1", "log_ret_code":"0"}
]
}
If there's any other error message in the future before we execute the
command itself, we also need to call the "output and pop" directly.
multipath_component_detection=0 has always applied to the filter-based
component detection. Also apply this setting to the duplicate-PV
handling which also eliminates multipath components (based on duplicate
PVs having the same wwid.)
When creating VDO pool based of % values, lvm2 is now more clever
and avoids to create 'unsupportable' sizes of physical backend
volumes as 16TiB is maximum size supported by VDO target
(and also limited by maximum supportable slabs (8192) based on slab
size.
If the requested virtual size is approaching max supported size 4PiB,
switch header size to 0.
Newer VDO kernel target require to have matching virtual size - this
however cause incompatiblity when lvcreate is let to format VDO data
device and read the usable size from vdoformat.
Altough this is a kernel regression and will likely get fixed,
lvm2 can actually reformat VDO device to use properly aligned VDO LV
size to make this problem disappear.
Add function to check for avaialble memory for particular VDO
configuration - to avoid unnecessary machine swapping for configs
that will not fit into memory (possibly in locked section).
Formula tries to estimate RAM size machine can use also with
swapping for kernel target - but still leaving some amount of
usable RAM.
Estimation is based on documented RAM usage of VDO target.
If the /proc/meminfo would be theoretically unavailable, try to use
'sysinfo()' function, however this is giving only free RAM without
the knowledge about how much RAM could be eventually swapped.
TODO: move _get_memory_info() into generic lvm2 API function used
by other targets with non-trivial memory requirements.
Keep single source for most of values printed in lvm.conf
(still needs some conversion)
Correct max for logical threads to 60
(we may refuse some older configuration which might eventually
user higher numbers - but so far let's assume no user have ever set this
as it's been non-trivial and if would complicate code unnecessarily.)
Accept maximum of 4PiB for virtual size of VDO LV
(lvm2 will drop 'header borders to 0 for this case').
Patch 1b070f366b should have
been already fixing this issue but since it the incorrect
patch rebasing the change to vdo_slabSize got lost.
So again now with explicit one-line patch.
Fixes commit b8f4ec846 "display: ignore --reportformat"
by restoring the --reportformat option to pvdisplay.
Adding -C to pvdisplay turns the command into a reporting
command (like pvs, vgs, lvs) in which --reportformat can
be useful.
to compare with wwids in /etc/multipath/wwids when
excluding multipath components. The wwid printed
from the sysfs wwid file may not be the wwid used
in multipath wwids. Save the wwids found for each
device on dev->wwids to avoid repeating reading
and parsing the sysfs files.
Update calling vdo manager since our vdo wrapper has a simple shell
arg parser so it needs args without '='
Also correct using DM_DEV_DIR for 'pvcreate'
Introduce a replacement vdo manager wrapper for testing.
When using test suite on a system without vdo manager (which has got
deprecated) - we still need its functionality to prepare 'vdo volume'
for testing lvm_import_vdo.
Wrapper currently need 2 binaries from older 'vdo 6.2' package -
to be named:
oldvdoformat - format VDO metadata with older format
oldvdoprepareforlvm - shift vdo metadata by 1MiB
When converting VDO volume, the parameter vdo_slabSize was
incorrectly copied as vdo_blockMapCacheSize, however this parameter
is then no longer used for any table line creation so the wrong
value was only stored in metadata.
Also use just single get_kb_size_with_unit_ and remove it's duplicate
functionality with get_mb_size_with_unit_.
Use $VERB for vdo remove call.
Fixes commit 494372b4ee
"filter-mpath: use multipath blacklist"
to handle wwids with initial type digits 1 and 2 used
for t10 and eui ids. Originally recognized type 3 naa.
A typo of the filename after --devicesfile should result in a
command error rather than the command falling back to using no
devices file at all. Exception is vgcreate|pvcreate which
create a new devices file if the file name doesn't exist.
When processing historical LVs inside process_each_lv_in_vg for
selection, we need to use dummy "_historical_lv" for select_match_lv.
This is because a historical LV is not an actual LV, but only a tiny
representation with subset of original properties that we recorded
(name, uuid...).
To use the same processing functions we use for full-fledged non-historical
LVs, we need to use the prefilled "_historical_lv" structure which has all
the other missing properties hard-coded.
Allow to use --vdosettings with lvcreate,lvconvert,lvchange.
Support settings currenly only configurable via lvm.conf.
With lvchange we require inactivate LV for changes to be applied.
Settings block_map_era_length has supported alias block_map_period.
Explicit wwid's from these sections control whether the
same wwid in /etc/multipath/wwids is recognized as a
multipath component. Other non-wwid keywords are not
used, and may require disabling the use of the multipath
wwids file in lvm.conf.
in vgcreate for shared sanlock vg, if sanlock_write_resource
returns an unexpected error, then make init_vg_sanlock fail
which will cause the vgcreate to fail.
When a VG has PVs with different device id types,
it would try to use the idtype of the previous PV
in the loop. This would produce an unncessary warning,
or could lead to using the devname idtype when a better
idtype is available.
In most installations, /dev/sda* or /dev/vda* should be included
in system.devices because the root, home, etc LVs are usually on
sda or vda.
Add a special case warning when a pvscan autoactivation command
sees that /dev/sda* or /dev/vda* are excluded by system.devices,
either not listed or having a different device id.
Change messages that refer to devices being "excluded by filters"
to say just "excluded". This will avoid mistaking the word
"filters" with the lvm.conf filter setting.
Warn if a scsi device is listed in the devices file that
is used by a multipath device that is not listed. This
will happen if a scsi device is listed in the devices
file and then an mpath device is set up to use it.
The way to correct this would be to remove the devices
file entry for the component device and add a new entry
for the multipath device.
When thin-pool had queued some delete message on extension operation
such message has been 'lost' and thin-pool kernel metadata has been
left with a thin volume that no longer existed for lvm2 metadata.
It's more logical to warn about --nolocking in the man page
before it's used rather than after it's used and too late.
Also, warnings are usually for things the user may not know.
dev_name(dev) returns "[unknown]" if there are no names
on dev->aliases. It's meant mainly for log messages.
Many places assume a valid path name is returned, and
use it directly. A caller that wants to use the path
from dev_name() must first check if the dev has any
paths with dm_list_empty(&dev->aliases).
Use dev_cache_get_existing() in a few common, high level
locations where it's obvious that only existing dev-cache
entries are wanted. This can be expanded and used in more
locations (or dev_cache_get can stop creating new entries.)
along with some basic checks for cases when a device
has no aliases.
lvm itself creates many situations where a struct device
has no valid paths, when it activates and opens an LV,
does something with it, e.g. zeroing, and then closes
and deactivates it. (dev-cache is intended for PVs, and
the use of LVs should be moved out of dev-cache in a
future patch.)
Different target type for LV it's not an internal error.
i.e. when target type is replaced with 'error' type - it should be
reported as regular warning and not cause interruption of command with
internall error.
With very old version of DM target driver we have to avoid
trying to use newuuid setting - otherwise we get error
during ioctl preparation phase.
Patch is fixing regression from commit:
988ea0e94c
In a certain disconnected state, a block device is present on
the system, can be opened, reports a valid size, reports the
correct device id (wwid), and matches a devices file entry.
But, reading the device can still fail. In this case,
device_ids_validate() was misinterpreting the read error as
the device having no data/label on it (and no PVID).
The validate function would then clear the PVID from the
devices file entry for the device, thinking that it was
fixing the devices file (making it consistent with the on disk
state.) Fix this by not attempting to check and correct a
devices file entry that cannot be read. Also make this case
explicit in the hints validation code (which was doing the
right thing but indirectly.)
When compiled and used with:
CFLAGS="-fsanitize=address -g -O0"
ASAN_OPTIONS=strict_string_checks=1:detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1
we have few reported issue - they where not normally spotted, since
we were still accessing our own memory - but ouf of buffer-range.
TODO: there is still something to enhance with handling of #orphan vgids
If a dm device is suspended, we can't run blkid on it. But earlier
rules (e.g. 11-dm-parts.rules) might have imported previously scanned
properties from the udev db, in particular if the device had been correctly
set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for existing
ID_FS_xyz properties must be preserved in this case. Otherwise lower-priority
devices (such as multipath components) might take over the symlink
temporarily.
Likewise, we should't stop watching a temporarily suspended, but previously
correctly configured dm device.
Signed-off-by: Martin Wilck <mwilck@suse.com>
event based autoactivation is now the only method that lvm
provides for autoactivation.
Setting lvm.conf event_activation=0 can still be used to disable
event based autoactivation commands, but doing so will no longer
enable static autoactivation.
Removes some incorrect and unnecessary checks for other entries
when adding a new devices. The removed checks and corrections were
mostly redundant with what is already done by device id matching.
Other checking is reworked so the warnings are a bit different.
When a device has a wwid (from sysfs), but lvm ignores the wwid,
e.g. because it contains an unreliable "QEMU" value, then lvm
falls back to using IDTYPE=devname (the device name) as the
device id. If the device name changes after reboot, then lvm
automatically searches for the PV on other devices to find the
new device name and correct system.devices. When searching for
the PV, lvm skips looking at devices that would use other id types,
e.g. if a device would use a wwid and not a devname, then it
skips checking it. However, it failed to account for the fact
that a device may have wwid that was ignored, in which case it
should be checked.
. error exit means that lvmdevices --update would make a change.
. remove check of PART field from --check because it isn't used.
. unlink searched_devnames file to ensure check|update will search
The approach to duplicate VGIDs has been that it is not possible
or not allowed, so the behavior has been undefined. The actual
result was unpredictable and/or broken, and generally unhelpful.
Improve this by recognizing the problem, displaying the VGs,
and printing a warning to fix the problem. Beyond this,
using VGs with duplicate VGIDs remains undefined, but should
work well enough to correct the problem with vgchange -u.
It's possible to create this condition without too much difficulty
by cloning PVs, followed by an incomplete attempt at making the two
VGs unique (vgrename and pvchange -u, but missing vgchange -u.)
After a vg_write, this function was used to attempt to
make lvmcache data match the new state written to disk.
It was not updated correctly in a many or most cases,
and the resulting lvmcache is not actually used after
vg_write, making the update unnecessary.
The destination vg is first written with the EXPORTED flag,
then the source vg is written, then the destination vg is
written again without the EXPORTED flag. Remove an unnecessary
vg_read of the destination vg just before the second write.
This reverts commit bd2baeaaa6.
This commit broke vgrename because vgrename relies on old bugs
in lvmcache_update_vg_from_write and lvmcache_update_vgname
which need to be fixed first.
The approach to duplicate VGIDs has been that it is not possible
or not allowed, so the behavior has been undefined. The actual
result was unpredictable and/or broken, and generally unhelpful.
Improve this by recognizing the problem, displaying the VGs,
and printing a warning to fix the problem. Beyond this,
using VGs with duplicate VGIDs remains undefined, but should
work well enough to correct the problem with vgchange -u.
It's possible to create this condition without too much difficulty
by cloning PVs, followed by an incomplete attempt at making the two
VGs unique (vgrename and pvchange -u, but missing vgchange -u.)
When storage is lost under a sanlock VG, and kill_vg/drop_vg
are used, sanlock_rem_lockspace() may return an error, but
the cleanup steps should still be performed. Without the
cleanup, gl_lsname_sanlock was not cleared. This caused
future lock requests to fail with ENOLS, but the NO_GL_LS
flag was not set due to gl_lsname_sanlock being set.
This caused lockd_global(sh) in lvm commands to fail when
they could succeed.
Fix from guozhonghua216
Since we check for present DM devices - cache result for
futher use of checking presence of such device.
lvm2 uses cache result for label scan, but also when
it tries to activate or deactivate LV - however only simple
target 'striped' is reasonably supported.
Use disable_dm_devs to be able to control when lv_info()
get cache or uncached results.
TODO: support more type, however this is getting very complicated.
New API extension (internal ATM) for getting a list
of active DM device with extra features like i.e. uuid.
To easily lookout for existing UUID in device list,
there is: dm_device_list_find_by_uuid()
Once the returned structure is no longer usable call:
dm_device_list_destroy()
Struct dm_active_device {} holds all the info,
but is always allocated and destroyed within library.
TODO: once it's stable, copy to libdm
We used to reset 'settings' to their defaults after command is finished.
This however has a drawback we lose all the logging after this point.
So this patch disables this 'reset' to observe for side-effects.
lvm shell should be getting reset when next command is run -
so this might or might not have some 'hidden' effects.
ATM it looks like nothing really bad should happen - we just should be
able to get more logs - at least from normal commands.
If the transient service remains after it's done, then
it prevents the same transient service from being run
again later if the PVs are detached and reattached
(although the behavior of a second autoactivation is not
well defined and may only work in limited cases.)
Description stolen from linux d/b/rbd.c L3:
rbd.c -- Export ceph rados objects as a Linux block device
16 partitions seem to make sense according to L90:
#define RBD_SINGLE_MAJOR_PART_SHIFT 4
Running *scan -vvvvvvdddddd yields
#filters/filter-type.c:28 /dev/rbd1p5: Skipping: Unrecognised LVM device type 252
#filters/filter-persistent.c:131 filter caching bad /dev/rbd1p5
right now, and adding
types = ["rbd", 252]
to /e/l/lvm.conf (with the matching "252 rbd" in /p/devices) works as a
per-machine fix:
rbd1 252:16 0 1T 1 disk
|-rbd1p1 252:17 0 243M 1 part
|-rbd1p2 252:18 0 1K 1 part
`-rbd1p5 252:21 0 1023.8G 1 part
`-dev01--vg-root 253:0 0 1023.8G 0 lvm
but rbd is supported by upstream so it'd be nice to have it work OOB
Fixes commit "pvscan: match device arg to filter symlink"
which failed to account for the fact that filter entries
are not just path names but include "a" or "r", etc.
The device_hint name in the metadata was meant to prevent
autoactivation from md components, but the name checks were
more general and would catch unnecessary cases.
This fixes an issue related to the optimization in
"pvscan: only add device args to dev cache"
If the devices file is not used, and the lvm.conf filter
accepts devices via symlink names, then those devices won't
be accepted by pvscan for autoactivation. To resolve this,
recognize when the filter contains symlinks and disable the
optimization. When the optimization is disabled, a full
dev_cache_scan is performed, and symlinks are associated
with the device names passed to pvscan. filter-regex
will accept a device if symlinks to that device are accepted.
Improve handling of md components that get through the
filter, like the previous improvement for multipath.
If md components get through the filter and trigger
duplicate PV code, then eliminate any devs entirely
that are not an md device.
If multipath component devices get through the filter and
cause lvm to see duplicate PVs, then check the wwid of the
devs and drop the component devices as if they had been
filtered. If a dm mpath device was found among the duplicates
then use that as the PV, otherwise do not use any of the
components as the PV.
"duplicate PVs" associated with multipath configs will no
longer stop commands from working.
Remove the searched_devnames file in a couple more places:
. When hints need refreshing it's possible that a missing
devices file entry could be found by searching devices
again.
. When a devices file entry devname is first found to be
incorrect, a new search for missing entries may be
useful.
When devnames are used as device ids and devnames change,
then new devices need to be located for the PVs. If the old
devname is now used by a filtered device, this was preventing
the code from searching for the new device, so the PV was
reported as missing.
If the optimized label scan fails (using online files),
then clear the device state prior to falling back to the
standard label_scan. This avoids printing output about
unexpected state.
Consistently create the pvs_lookup file for VGs with
more than one PV. Previously the file create would be
skipped if all the PVs happened to already be online.
That led to unpredicatable results in an uncommon case
(when the last PV to come online is the only PV with
metadata.)
Copy another optimization from pvscan -aay to vgchange -aay.
When using the optimized label scan for only one VG, acquire the
VG lock prior to the scan. This allows vg_read to then skip the
repeated label scan that normally happens after locking the vg.
Include the device name in the /run/lvm/pvs_online/pvid files.
Commands using the pvid file can use the devname to more quickly
find the correct device, vs finding the device using the
major:minor number. If the devname in the pvid file is missing
or incorrect, fall back to using the devno.
For completeness and consistency, adjust the behavior
for some variations of:
vgchange -aay --autoactivation event [vgname]
The current standard use is with a VG name arg, and the
command is only called when all pvs_online files exist.
This is the optimal case, in which only pvs_online devs
are read. This remains the same.
Clean up behaviors for some other unexpected uses of the
command:
. With no VG name arg, the command activates any VGs
that are complete according to pvs_online. If no
pvs_online files exist, it does nothing.
. If a VG name is used but no PVs online files exist for
the VG, or the PVs online files are incomplete, then
consider there could be a problem with the pvs_online
files, and fall back to a full label scan prior to
attempting the activation.
Part of the optimization to avoid a full dev_cache_scan requires
translating major:minor numbers to a device name. If this devno
translation fails, then fall back to doing a full dev_cache_scan
which is slower but certain to provide the info. This preserves
the most important part of the label scanning optimization in the
vgchange aay (avoiding dev_cache_scan is a relatively small part
of the optimized activation compared to label scanning.)
Port another optimization from pvscan -aay to vgchange -aay:
"pvscan: only add device args to dev cache"
This optimization avoids doing a full dev_cache_scan, and
instead populates dev-cache with only the devices in the
VG being activated.
This involves shifting the use of pvs_online files from
the hints interface up to the higher level label_scan
interface. This specialized label_scan is structured
around creating a list of devices from the pvs_online
files. Previously, a list of all devices was created
first, and then reduced based on the pvs_online files.
The initial step of listing all devices was slow when
thousands of devices are present on the system.
This optimization extends the previous optimization that
used pvs_online files to limit the devices that were
actually scanned (i.e. reading to identify the device):
"vgchange -aay: optimize device scan using pvs_online files"
Port the old pvscan -aay scanning optimization to vgchange -aay.
The optimization uses pvs_online files created by pvscan --cache
to derive a list of devices to use when activating a VG. This
allows autoactivation of a VG to avoid scanning all devices, and
only scan the devices used by the VG itself. The optimization is
applied internally using the device hints interface.
The new option "--autoactivation event" is given to pvscan and
vgchange commands that are called by event activation. This
informs the command that it is being used for event activation,
so that it can apply checks and optimizations that are specific
to event activation. Those include:
- skipping the command if lvm.conf event_activation=0
- checking that a VG is complete before activating it
- using pvs_online files to limit device scanning
The information in /run/lvm/pvs_online/<pvid> files can
be used to build a list of devices for a given VG.
The pvscan -aay command has long used this information to
activate a VG while scanning only devices in that VG, which
is an important optimization for autoactivation.
This patch implements the same thing through the existing
device hints interface, so that the optimization can be
applied elsewhere. A future patch will take advantage of
this optimization in vgchange -aay, which is now used in
place of pvscan -aay for event activation.
When a device id is set for a device, using an idtype other
than devname, it means that sysfs has been used with the device
to match the device id. So, checking for a sysfs entry for the
device in filter-sysfs is redundant. For any other cases not
covered by this (e.g. devname ids), have filter-sysfs simply
stat /sys/dev/block/major:minor to test if the device exists
in sysfs.
The extensive processing done by filter-sysfs init is removed.
It was taking an immense amount of time with many devices, e.g.
. 1024 PVs in 520 VGs
. 520 concurrent vgchange -ay <vgname> commands
. vgchange scans only PVs in the named VG (based on pvs_online
files from a pending patch)
A large number of the vgchange commands were taking over 1 min,
and nearly half of that time was used by filter-sysfs init.
With this patch, the vgchange commands take about half the time.
Help bootstrapping existing shared vgs into the devices file.
Reading the vg in vgimportdevices would require locking to be
started, but vgchange lockstart won't see the vg if it's not
in the devices file. The lvmlockd locks are not protecting
vg modifications so skipping them here won't be a problem.
As we are not using 'enable-compat' for anything, remove this section.
Also remove duplicated check for blkid.
Move setting of some AC_ARG_ENABLE defaults into the macro so it's always
defined.
When scanning configured /dev dir, avoid entring
directories with different filesystem.
This minimizes risk we will block on i.e. entring
directory with mount point.
Enhance logic for checking supported systemd version,
while doing only a single check for systemd package.
For version checking use PKG_CHECK_EXISTS() macro.
Also use one pkg check for blkid.
Avoid checking version for thin/cache_check when tools are not present
on system.
Resolve event_activation configure option just once.
Do not print debug_devs about 'bad' filtering, when
actually filter already printed reason for skipping
Do not trace more then once about backup being disabled.
No debug when unlinked file does not exists in pvscan.
Handle automatically new setttings
--disable-systemd-journal
--disable-appmachineid
Both setting will check presence of apropriate header files.
In case they are present, build will try to automatically build with
them (adding systemd dependency)
User can anytime disabled them and drop systemd dependency.
Also add --with-default-use-devices-file configure option to
select automatically default value for this option.
For this moment keep default upstream as 0
Reporting non-PVs / "all devices" is only done by
pvs -a or pvdisplay -a, so avoid the work managing
a list of all devices in process_each_pv.
In the case when it's needed, use the results of
label_scan which already determines which devs
are not PVs.
Just setting lvm.conf level=N should not send messages to
syslog (now the journal by default.)
Sending messages to syslog should require setting lvm.conf
log { syslog=1 level=N }.
'.ID_FS_TYPE_NEW' is a custom property added by an LVM UDev rule
which is now being removed and 'ID_FS_TYPE' has the same value.
Signed-off-by: Vojtech Trefny <vtrefny@redhat.com>
new udev rule 69-dm-lvm.rules replaces
69-dm-lvm-meta.rules and lvm2-pvscan.service
udev rule calls pvscan directly on the added device
pvscan output indicates if a complete VG can be activated
udev env var LVM_VG_NAME_COMPLETE is used to pass complete
VG name from pvscan to the udev rule
udev rule uses systemd-run to run vgchange -aay <vgname>
Configure via lvm.conf log/journal or command line --journal.
Possible values:
"command" records command information.
"output" records default command output.
"debug" records full command debugging.
Multiple values can be set in lvm.conf as an array.
One value can be set in --journal which is added to
values set in lvm.conf
pvscan --cache <dev>
. read only dev
. create online file for dev
pvscan --listvg <dev>
. read only dev
. list VG using dev
pvscan --listlvs <dev>
. read only dev
. list VG using dev
. list LVs using dev
pvscan --cache --listvg [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. [check online files and report if VG is complete]
pvscan --cache --listlvs [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. list LVs using dev
. [check online files and report if VG is complete]
. [check online files and report if LVs are complete]
[--vgonline]
can be used with --checkcomplete, to enable use of a vg online
file. This results in only the first pvscan command to see
the complete VG to report 'VG complete', and others will report
'VG finished'. This allows the caller to easily run a single
activation of the VG.
[--udevoutput]
can be used with --cache --listvg --checkcomplete, to enable
an output mode that prints LVM_VG_NAME_COMPLETE='vgname' that
a udev rule can import, and prevents other output from the
command (other output causes udev to ignore the command.)
The list of complete LVs is meant to be passed to lvchange -aay,
or the complete VG used with vgchange -aay.
When --checkcomplete is used, lvm assumes that that the output
will be used to trigger event-based autoactivation, so the pvscan
does nothing if event_activation=0 and --checkcomplete is used.
Example of listlvs
------------------
$ lvs -a vg -olvname,devices
LV Devices
lv_a /dev/loop0(0)
lv_ab /dev/loop0(1),/dev/loop1(1)
lv_abc /dev/loop0(3),/dev/loop1(3),/dev/loop2(1)
lv_b /dev/loop1(0)
lv_c /dev/loop2(0)
$ pvscan --cache --listlvs --checkcomplete /dev/loop0
pvscan[35680] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
LV vg/lv_a complete
LV vg/lv_ab incomplete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop1
pvscan[35681] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
LV vg/lv_b complete
LV vg/lv_ab complete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop2
pvscan[35682] PV /dev/loop2 online, VG vg is complete.
VG vg complete
LV vg/lv_c complete
LV vg/lv_abc complete
Example of listvg
-----------------
$ pvscan --cache --listvg --checkcomplete /dev/loop0
pvscan[35684] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop1
pvscan[35685] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop2
pvscan[35686] PV /dev/loop2 online, VG vg is complete.
VG vg complete
The new system_id_source="appmachineid" will cause
lvm to use an lvm-specific derivation of the machine-id,
instead of the machine-id directly. This is now
recommended in place of using machineid.
Do not store full path with each archived name reduces memory usage if
the directory has thousands of entries and just add 'dir' path when
needed.
Also emit info print message to a user if the total size of archived
files for a VG is more then 128MiB or 8192 files.
TODO: Consider wheather adding a new 'lvm.conf archive{option}' to support
trimming these wild archive sizes can make situation better.
We already support retain_min && retain_days - but if user is
generating too many and too large archives with minutes - maybe archiving
should be disabled by a user - as it's not producing anything largely usable
and just slows-down command ??
If we add 'retain_max & retain_max_size' the condition will go against
each other and we need to chose priorities.
mm
Consider missing config tree from vg read to be an internal error
since we do not want to 'regenerate' this one in expesive parsing way.
Also if there is any failure on recreating committed VG, make it also
a 'vg_write' error.
Corrupt metadata text (with good mda header) was being handled
in the label_scan phase, but not in the vg_read phase. This
was sufficient because metadata areas would always be read and
checksummed during label_scan (metadata parsing was skipped
previously as an optimization.)
This changed with the optimization in
commit 61a6f9905e
"metadata: optimize reading metadata copies in scan"
Now, some metadata areas will not be read and checksummed
at all during the label_scan phase, only during the vg_read
phase. This means that bad metadata text may first be detected
in the vg_read phase. So, add equivalent bad metadata handling
to the vg_read path to match the label_scan path.
With larger metadata, decoding 'localtime()' for hinting time creation
of every LV may cause excessive check of /etc/localtime file.
Set TZ to ":/etc/localtime" so glibc reads this file just once
instead of check everytime if there has anything changed.
While being in lockless scanning phase, we can avoid reading and checking
matching metadata copies if we already know them from other PV
and just rely on matching metadata header information.
These copies will be examined later during locked metadata read/write
access.
This patch may postpone discovering some read failures to locked phase.
When creating lvm2 metadata for VG, lvm2 allocate some buffer,
and if buffer is not big enough, the buffer is 'reallocated' bigger,
and whole metadata creation is repeated until metadata fits.
We can try to use 'previous' metadata size as hint to reduce looping
here.
Preserve computed crc32 check from first written PV, just like we
preserve generated metadata.
Also there is no need to call crc32 twice on wrapping buffer with 2 calcs,
result must be always the same as with single crc32 checking.
Kernel patch 8b638081bd4520f63db1defc660666ec5f65bc15
introduced support to return UUID in DM_LIST_DEVICES_CMD.
Useful when asking for UUID of each device where the list
could be now returned directly with NAME && UUID for each device.
Returning UUID is done in backward-compatible way. There's one unused
32-bit word value after the event number. This patch sets the bit
DM_NAME_LIST_FLAG_HAS_UUID if UUID is present and
DM_NAME_LIST_FLAG_DOESNT_HAVE_UUID if it isn't (if none of these bits is
set, then we have an old kernel that doesn't support returning UUIDs). The
UUID is stored after this word. The 'next' value is updated to point after
the UUID, so that old version of libdevmapper will skip the UUID without
attempting to interpret it.
It's basically irrelavant which value we assing to optarg,
since it's set by getopt() function, but Coverity tool
is incorrectly reporting possibly dereference of NULL.
sysconf() may also return -1 although rather theoretically.
Default to 4K when such case would happen.
Also in function call it just once and keep as static variable.
Analyzer here was rather confused about possiblity of loosing previously
assigned device pointers - fixed by passing zero initialize memory
before first assign.
Ensure only nonNULL 'du' pointer is dereference altough the comment
to the last assign 'du' pointer already suggest 'NULL' case should not happen.
So just being explicit.
mer du
Error path in _lockd_retrive_vg_pv_list() has not zeroed released path
caussing possible double-free later in the code.
Fix it by using one single function freeing lock_pvs structure.
The cmd memory space is allocated by zalloc, and the registration
fails and is not released.
Although this code would be ever triggered just in the case
of some internal (likely compilation) bug.
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
When building lvm2 in Gentoo/ChromeOS with the ASAN memory
sanitizer enabled, man-generator fails with the following
error. Initializing makes the error go away.
* SUMMARY: MemorySanitizer: use-of-uninitialized-value /build/amd64-generic/tmp/portage/sys-fs/lvm2-2.02.187-r3/work/LVM2.2.02.187/tools/man-generator.c:3316:6 in _include_description_file
* Exiting
* ASAN error detected:
* ==2548047==WARNING: MemorySanitizer: use-of-uninitialized-value
* #0 0x558b00ab4730 in _include_description_file /build/amd64-generic/tmp/portage/sys-fs/lvm2-2.02.187-r3/work/LVM2.2.02.187/tools/man-generator.c:3316:6
* #1 0x558b00ab4730 in _print_man /build/amd64-generic/tmp/portage/sys-fs/lvm2-2.02.187-r3/work/LVM2.2.02.187/tools/man-generator.c:3426:21
* #2 0x558b00ab4730 in main /build/amd64-generic/tmp/portage/sys-fs/lvm2-2.02.187-r3/work/LVM2.2.02.187/tools/man-generator.c:3570:7
* #0 0x7fa9b2cbb807 in find_derivation /var/tmp/portage/cross-x86_64-cros-linux-gnu/glibc-2.33-r8/work/glibc-2.33/iconv/gconv_db.c:583:15
* #1 0x558b00a29559 in ?? ??:0
*
* Uninitialized value was created by an allocation of 'statbuf.i.i' in the stack frame of function 'main'
* #0 0x558b00ab1d4d in main /build/amd64-generic/tmp/portage/sys-fs/lvm2-2.02.187-r3/work/LVM2.2.02.187/tools/man-generator.c:3505
It's not a good idea to change passed 'argv[]' and replace it with
pointers to local stack - although in this case we are not using
this argv[] after return from this function.
Mask for strncpy() Coverity report warning would
actually need to copy buffer from 'tmp_name' instead of 'str'.
But replace it directly with single 'strncpy()' again for better readbility,
just mask out the warning reported for this strncpy instance
(so we do not need to put comment fro every call of strcpy_name_len).
Looks like newer version of mkfs.ext4 consumes more 'real' disk space,
when formating virtual volumes - so double the size of thin-pool to
have enough backend chunks for provisioning.
Existing mechanism was not able to trace root volume issue.
Simplify the functionality by using simply using activated flag
and trace the dtree in reverse order.
When cache creation fails on table reload path, implemen more
advanced revert solution, that tries to restore state of LVM
metadata into is look before actual caching started.
Loading invalid MQ/SMQ policy settings table line cause immediate
rejection - to prevent such failure, automatically filter valid
settins before it is being uploaded by lvm2.
For invalid setting issue a warnning informing user how to remove
them.
This solution is used to keep running cached LVs that might had
been created in the past with invalid settings that have been actually
unused due to another code bug.
When generating table line for cache target line,
the estimation of added arguments was incorrectly
calculated as the evaluation order of "?" is
made after "+".
However the result was 'masked' by the
Reported-by: Jian Cai jcai19
New versions of kvdo module exposes statistics at new location:
/sys/block/dm-XXX/vdo/statistics/...
Enhance lvm2 to access this location first.
Also if the statistic info is missing - make it 'debug' level info,
so it is not failing 'lvs' command.
Do not call 'dmsetup info' for non-dm devices.
Better handling for VGNAME and LVNAME - so when convering LV as
backend device automatically recognize it and reuse LV name for VDOLV.
Add prompt for final confirmation before actual conversion is started
(once confirmed, lvm2 commands no longer prompts to avoid leaving
conversion in the middle of its process.)
Compilation needs to generate 'C' locale sorted command file
definitions. To always enforce 'C' sorting rules user LC_ALL
instead of LANG, as LANG settings can be overuled by
other LC settings like LC_COLLATE and may result into miscompiled
lvm2 binary if locales ordering differs from 'C'.
Reported-by: jmp-lvm2@ookaze.fr
Since VDO is always returns 'zero' on unprovisioned read
and every provisioned block is always 'zeroed' on partial writes,
we can avoid 'zeroing' of such LVs.
Some device id types can only be used with specific device major
numbers, so use this restriction to avoid some comparisions.
This is more efficient, and can avoid some incorrect matches.
pvid and vgid are sometimes a null-terminated string, and
other times a 'struct id', and the two types were often
cast between each other. When a struct id was cast to a char
pointer, the resulting string would not necessarily be null
terminated. Casting a null-terminated string id to a
struct id is fine, but is still avoided when possible.
A struct id is: int8_t uuid[ID_LEN]
A string id is: char pvid[ID_LEN + 1]
A convention is introduced to help distinguish them:
- variables and struct fields named "pvid" or "vgid"
should be null-terminated strings.
- variables and struct fields named "pv_id" or "vg_id"
should be struct id's.
- examples:
char pvid[ID_LEN + 1];
char vgid[ID_LEN + 1];
struct id pv_id;
struct id vg_id;
Function names also attempt to follow this convention.
Avoid casting between the two types as much as possible,
with limited exceptions when known to be safe and clearly
commented.
Avoid using variations of strcpy and strcmp, and instead
use memcpy/memcmp with ID_LEN (with similar limited
exceptions possible.)
- Use a new function for all instances of copying
a null-terminated string into a fixed size struct
field that is not null-terminated.
- use memcpy when copying between struct fields of
the same size
When multiple lvchange refresh processes executed at the same time,
suspend/resume ioctl on the same dm, some of these commands will be failed
for dm aready change status, and ioctl will return EINVAL in _do_dm_ioctl function.
to avoid this problem, add READ_FOR_ACTIVATE flags in lvchange refresh process,
it will hold LCK_WRITE lock and avoid suspend/resume dm at the same time.
Signed-off-by: Long YunJian <long.yunjian@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Fix memory leaks on error paths for allocated
path and backup_file name by converting allocation to
dm_pool_alloc and also change devicefile structure to contain
embeded path as last struct member - so we could allocate
only needed string size instead of PATH_MAX from pool.
TODO: still to be fixed 'mf' struct.
Recent commit 84bd394cf9
"writecache: use block size 4096 when no fs is found"
failed to account for the case where writecache is attached
to thin pool data. Checking fs block size on the thin pool
data LV is wrong, and checking the fs block on each thin LV
would be impractical, so default to 512 which cannot break
any existing file systems, and require the user to specify
4k when appropriate.
When splitting VG with thin/cache pool volume, handle pmspare during
such split and allocate new pmspare in new VG or extend existing pmspare
there and eventually drop pmspare in original VG if is no longer needed
there.
As pmspare is an invisible LV it's not getting automatically removed
since vgremove removes only visible LVs and it depending LVs.
If there was no other thin/cache pool volume, such pmspare stayed
undeleted and caused command failure.
So handle explicitelly such forgotten pmspare and remove it.
When check active componet of thinLV with external origin,
we need to check if the external origin isn't already active.
For this however we need to use layered check for -real device.
The lvm2-pvscan service runs pvscan --cache -aay <dev> for
device addition, and pvscan --cache <dev> on device removal.
For event_activation=0, the addition does nothing. Fix
device removal to also do nothing for event_activation=0.
Device removal was previously doing some work to process
the removal which slowed down stopping lvm2-pvscan services.
sysfs-based multipath component detection quit if a
device had multiple holders, and in this case would
fail to detect a device was an mpath component.
For the pv_min_size check, always use dev_get_size()
which is commonly used elsewhere, and don't bother
asking libudev for the device size when
external_device_info_source=udev.
related to config settings:
obtain_device_info_from_udev (controls if lvm gets
a list of devices from readdir /dev or from libudev)
external_device_info_source (controls if lvm asks
libudev for device information)
. Make the obtain_device_list_from_udev setting
affect only the choice of readdir /dev vs libudev.
The setting no longer controls if udev is used for
device type checks.
. Change obtain_device_list_from_udev default to 0.
This helps avoid boot timeouts due to slow libudev
queries, avoids reported failures from
udev_enumerate_scan_devices, and avoids delays from
"device not initialized in udev database" errors.
Even without errors, for a system booting with 1024 PVs,
lvm2-pvscan times improve from about 100 sec to 15 sec,
and the pvscan command from about 64 sec to about 4 sec.
. For external_device_info_source="none", remove all
libudev device info queries, and use only lvm
native device info.
. For external_device_info_source="udev", first check
lvm native device info, then check libudev info.
. Remove sleep/retry loop when attempting libudev
queries for device info. udev info will simply
be skipped if it's not immediately available.
. Only set up a libdev connection if it will be used by
obtain_device_list_from_udev/external_device_info_source.
. For native multipath component detection, use
/etc/multipath/wwids. If a device has a wwid
matching an entry in the wwids file, then it's
considered a multipath component. This is
necessary to natively detect multipath
components when the mpath device is not set up.
How to trigger:
```
~ # export LVM_SYSTEM_DIR=_
~ # pvscan
No matching physical volumes found
double free or corruption (!prev)
Aborted (core dumped)
```
when LVM_SYSTEM_DIR is empty, _load_config_file() won't be called.
when LVM_SYSTEM_DIR is not empty, cfl->cft links into cmd->config_files
by _load_config_file()@lib/commands/toolcontext.c
core dumped code: _destroy_config()@lib/commands/toolcontext.c
```
/* CONFIG_FILE/CONFIG_MERGED_FILES */
if ((cft = remove_config_tree_by_source(cmd, CONFIG_MERGED_FILES)))
config_destroy(cft);
else if ((cft = remove_config_tree_by_source(cmd, CONFIG_FILE)))
config_destroy(cft); <=== first free the cft
dm_list_iterate_items(cfl, &cmd->config_files)
config_destroy(cfl->cft); <=== double free the cft
```
Fixes: c43f2f8ae0
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
expands commit d5a06f9a7d
"pvscan: skip indexing devices used by LVs"
The dev cache index is expensive and slow, so limit it
to commands that are used to observe the state of lvm.
The index is only used to print warnings about incorrect
device use by active LVs, e.g. if an LV is using a
multipath component device instead of the multipath
device. Commands that continue to use the index and
print the warnings:
fullreport, lvmdiskscan, vgs, lvs, pvs,
vgdisplay, lvdisplay, pvdisplay,
vgscan, lvscan, pvscan (excluding --cache)
A couple other commands were borrowing the DEV_USED_FOR_LV
flag to just check if a device was actively in use by LVs.
These are converted to the new dev_is_used_by_active_lv().
Add tool 'vdoimport' to support easy conversion of an existing VDO manager managed
VDO volumes into lvm2 managed VDO LV.
When physical converted volume is already a logical volume, conversion
happens with the VG itself, just with validation for extent_size, so
the virtually sized logical VDO volume size can be expressed in extents.
Example of basic simple usage:
vdoimport --name vg/vdolv /dev/mapper/vdophysicalvolume
dev_cache_index_devs() is taking a large amount of time
when there are many PVs. The index keeps track of
devices that are currently in use by active LVs. This
info is used to print warnings for users in some limited
cases.
The checks/warnings that are enabled by the index are not
needed by pvscan --cache, so disable it in this case.
This may be expanded to other cases in future commits.
dev_cache_index_devs should also be improved in another
commit to avoid the extreme delays with many devices.
There have been two separate checks for metadata
validity: first that the metadata text begins with
a valid VG name, and second the checksum of the
metadata text. These happen in different places,
which means there have been two separate error paths
for invalid metadata. This also causes large metadata
to be read in multiple parts, the first part is read
just to check the vgname, and then remaining parts are
read later when the full metadata is needed.
This patch moves the vg name verification so it's
done just before the checksum verification, which
results in a single error path for invalid metadata,
and causes the entire metadata to be read together
rather that in parts from different parts of the code.
If label_scan encounters bad vg metadata, invalidate
bcache data for the device and reread the mda_header
and metadata text back to back. With concurrent commands
modifying large metadata, it's possible that the entire
metadata area can be rewritten in the time between a
command reading the mda_header and reading the metadata
text that the header points to. Since the label_scan
is just assembling an initial overview of devices, it
doesn't use locking to serialize with other commands
that may be modifying the vg metadata at the same time.
Recent commit 84bd394cf9
"writecache: use block size 4096 when no fs is found"
changed the default writecache block size from 512 to 4096
when no file system is detected. The fs block size detection
requires the libblkid BLOCK_SIZE feature, so skip tests on
systems without this. Otherwise, 4096 writecache added to
512 xfs leads fs io or mount failures.
Add profilable configurable setting for vdo pool header size, that is
used as 'extra' empty space at the front and end of vdo-pool device
to avoid having a disk in the system the may have same data is real
vdo LV.
For some conversion cases however we may need to allow using '0' header size.
TODO: in this case we may eventually avoid adding 'linear' mapping layer
in future - but this requires further modification over lvm code base.
In testing where we inject large amounts of additional output in stderr
we can occassionally get truncated stdout from lvm. Catching and dumping
the json for debug before we re-raise the exception. As this doesn't
happen without the error injecting wrapper around lvm, the error seems to
be with the wrapper.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When exec'ing lvm, it's possible to get large amounts of both stdout
and stderr depending on the state of lvm and the size of the lvm
configuration. If we allow any of the buffers to fill we can end
up deadlocking the process. Ensure we are handling stdout & stderr
during lvm execution.
Ref. https://bugzilla.redhat.com/show_bug.cgi?id=1966636
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When we are walking the new lvm state comparing it to the old state we can
run into an issue where we remove a VG that is no longer present from the
object manager, but is still needed by LVs that are left to be processed.
When we try to process existing LVs to see if their state needs to be
updated, or if they need to be removed, we need to be able to reference the
VG that was associated with it. However, if it's been removed from the
object manager we fail to find it which results in:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/lvmdbusd/utils.py", line 666, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.6/site-packages/lvmdbusd/fetch.py", line 36, in _main_thread_load
cache_refresh=False)[1]
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 146, in load_lvs
lv_name, object_path, refresh, emit_signal, cache_refresh)
File "/usr/lib/python3.6/site-packages/lvmdbusd/loader.py", line 68, in common
num_changes += dbus_object.refresh(object_state=o)
File "/usr/lib/python3.6/site-packages/lvmdbusd/automatedproperties.py", line 160, in refresh
search = self.lvm_id
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 483, in lvm_id
return self.state.lvm_id
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 173, in lvm_id
return "%s/%s" % (self.vg_name_lookup(), self.Name)
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 169, in vg_name_lookup
return cfg.om.get_object_by_path(self.Vg).Name
Instead of removing objects from the object manager immediately, we will
keep them in a list and remove them once we have processed all of the state.
Ref:
https://bugzilla.redhat.com/show_bug.cgi?id=1968752
When execute IDM testing, the command reports error:
/usr/bin/install: cannot stat ‘lib/idm_inject_failure’: No such file
or directory
Since there have a stale program in my local environment, thus Makefile
always uses the stale program and doesn't report any issue. In the
brand new repository, it doesn't contain an idm_inject_failure program,
and Makefile doesn't build it without specifying the dependency, thus
the test command complaints the file 'idm_inject_failure' is not found.
This patch adds the dependency 'lib/idm_inject_failure' for IDM testing,
so it can firstly build the injection program and dismiss the error.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
When adding a device to the devices file with --adddev, lvm
by default chooses the best device ID type for the new device.
The new --deviceidtype option allows the user to override the
built in preference. This is useful if there's a problem with
the default type, or if a secondary type is preferrable.
If the specified deviceidtype does not produce a device ID,
then lvm falls back to the preference it would otherwise use.
Previously there have been necessary explicit call of backup (often
either forgotten or over-used). With this patch the necessity to
store backup is remember at vg_commit and once the VG is unlocked,
the committed metadata are automatically store in backup file.
This may possibly alter some printed messages from command when the
backup is now taken later.
Instead of calling explicit archive with command processing logic,
move this step towards 1st. vg_write() call, which will automatically
store archive of committed metadata.
This slightly changes some error path where the error in archiving
was detected earlier in the command, while now some on going command
'actions' might have been, but will be simply scratched in case
of error (since even new metadata would not have been even written).
So general effect should be only some command message ordering.
As SUSE build tool reports the warning:
lvmlockd-core.c: In function 'client_thread_main':
lvmlockd-core.c:4959:37: warning: '%d' directive output may be truncated writing between 1 and 10 bytes into a region of size 6 [-Wformat-truncation=]
snprintf(buf, sizeof(buf), "path[%d]", i);
^~
lvmlockd-core.c:4959:31: note: directive argument in the range [0, 2147483647]
snprintf(buf, sizeof(buf), "path[%d]", i);
^~~~~~~~~~
To dismiss the compilation warning, enlarge the array "buf" to 17
bytes to support the max signed integer: string format 6 bytes + signed
integer 10 bytes + terminal char "\0".
Reported-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to test timeout handling after activate LV with shareable
mode. It has the same logic with the testing for LV exclusive mode,
except it verifies the locking with shareable mode.
On the host A:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_lv_sh_timeout_hosta.sh
On the host B:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_lv_sh_timeout_hostb.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to test timeout handling after activate LV with exclusive
mode. It contains two scripts for host A and host B separately.
The script on host A firstly creates VGs and LVs based on the passed
back devices, every back device is for a dedicated VG and a LV is
created as well in the VG. Afterwards, all LVs are activated by host A,
so host A acquires the lease for these LVs. Then the test is designed
to fail on host A.
After the host A fails, host B starts to run the paired testing script,
it firstly fails to activate the LVs since the locks are leased by
host A; after lease expiration (after 70s), host B can achieve the lease
for LVs and it can operate LVs and VGs.
On the host A:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_lv_ex_timeout_hosta.sh
On the host B:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_lv_ex_timeout_hostb.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to add LV testing on multi hosts. There have two scripts,
the script multi_hosts_lv_hosta.sh is used to create LVs on one host,
and the second script multi_hosts_lv_hostb.sh will acquire
global lock and VG lock, and remove VGs. The testing flow verifies the
locking operations between two hosts with lvmlockd and the backend
locking manager.
On the host A:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_lv_hosta.sh
On the host B:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_lv_hostb.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to add VG testing on multi hosts. There have two scripts,
the script multi_hosts_vg_hosta.sh is used to create VGs on one host,
and the second script multi_hosts_vg_hostb.sh afterwards will acquire
global lock and VG lock, and remove VGs. The testing flow verifies the
locking operations between two hosts with lvmlockd and the backend
locking manager.
On the host A:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_vg_hosta.sh
On the host B:
make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3,/dev/sdl3 \
LVM_TEST_MULTI_HOST=1 T=multi_hosts_vg_hostb.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
If the IDM lock manager fails to access drives, might partially fail to
access drives (e.g. it fails to access one of three drives), or totally
fail to access drives, the lock manager should handle properly for these
cases. When the drives are partially failure, if the lock manager still
can renew the lease for the locking, then it doesn't need to take any
action for the drive failure; otherwise, if it detects it cannot renew
the locking majority, it needs ti immediately kill the VG from the
lvmlockd.
This patch adds the test for verification the IDM lock manager failure;
the command can be used as below:
# make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdp3,/dev/sdl3,/dev/sdq3 \
LVM_TEST_FAILURE=1 T=idm_ilm_failure.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
If the fabric is broken instantly and the partial drives connected on
the fabric disappear from the system. For this case, according to the
locking algorithm in idm, the lease will not lose since the half drives
are still alive so can renew the lease for the half drives. On the
other hand, since the VG lock requires to acquire the majority of drive
number, but half drives failure cannot achieve the majority, so it
cannot acquire the lock for VG and thus cannot change metadata for VG.
This patch is to add half brain failure for idm; the test command is as
below:
# make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdp3,/dev/sdo3 LVM_TEST_FAILURE=1 \
T=idm_fabric_failure_half_brain.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
If the fabric is broken instantly, the drives connected on the fabric
will disappear from the system. For worst case, the lease is timeout
and the drives cannot recovery back. So a new test is added to emulate
this scenario, it uses a drive for LVM operations and this drive is also
used for locking scheme; if the drive and all its associated paths (if
the drive supports multiple paths) are disconnected, the lock manager
should stop the lockspace for the VG/LVs.
And afterwards, if the drive recovers back, the VG/LV resident in the
drive should be operated properly. The test command is as below:
# make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdp3 LVM_TEST_FAILURE=1 \
T=idm_fabric_failure_timeout.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
When the fabric failure occurs, it will lose the connection with hosts
instantly, and after a while it can recovery back so that the hosts can
continue to access the drives.
For this case, the locking manager should be reliable for this case and
can dynamically handle this case and allows user to continue to use the
VG/LV with associated locking scheme.
This patch adds a testing to emulate the fabric faliure, verify LVM
commands for this case. The testing usage is:
# make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdo3,/dev/sdp3,/dev/sdp4 \
LVM_TEST_FAILURE=1 T=idm_fabric_failure.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
After the lvmlockd abnormally exits and relaunch the daemon, if LVM
commands continue to run, lvmlockd and the backend lock manager (e.g.
sanlock lock manager or IDM lock manager) should can continue to serve
the requests from LVM commands.
This patch adds a test to emulate lvmlockd failure, and verify the LVM
commands after lvmlockd recovers back. Below is an example for testing
the case:
# make check_lvmlockd_idm \
LVM_TEST_BACKING_DEVICE=/dev/sdo3,/dev/sdp3,/dev/sdp4 \
LVM_TEST_FAILURE=1 T=lvmlockd_failure.sh
Signed-off-by: Leo Yan <leo.yan@linaro.org>
When the drive failure occurs, the IDM lock manager and lvmlockd should
handle this case properly. E.g. when the IDM lock manager detects the
lease renewal failure caused by I/O errors, it should invoke the kill
path which is predefined by lvmlockd, so that the kill path program
(like lvmlockctl) can send requests to lvmlockd to stop and drop lock
for the relevant VG/LVs.
To verify the failure handling flow, this patch introduces an idm
failure injection program, it can input the "percentage" for drive
failures so that can emulate different failure cases.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to add the stress testing, which launches three threads,
one thread is for creating/removing PV, one thread is for
creating/removing VG, and the last one thread is for LV operations.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to add the stress testing, which launches two threads,
each thread creates LV, activate and deactivate LV in the loop; so this
can test for multi-threading in lvmlockd and its backend lock manager.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to add the stress testing, which loops to create LV,
activate and deactivate LV in the single thread.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Add checking for lvmlockd log, this can be used for the test cases which
are interested in the interaction with lvmlockd.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
For testing idm locking scheme, it's good to cleanup the idm context
before run the test cases. This can give a clean environment for the
testing.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
In current implementation, the option "LVM_TEST_BACKING_DEVICE" only
supports to specify one backing device; this patch is to extend the
option to support multiple backing devices by using comma as separator,
e.g. below command specifies two backing devices:
make check_lvmlockd_idm LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3
This can allow the testing works on multiple drives and verify the
locking scheme if can work as expected for multiple drives case. For
example, for Seagate IDM locking scheme, if a VG uses two PVs, every PV
is resident on a drive, thus the locking operations will be sent to two
drives respectively; so the extension for "LVM_TEST_BACKING_DEVICE" can
help to verify different drive configurations for locking.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
This patch is to introduce testing option LVM_TEST_LOCK_TYPE_IDM, with
specifying this option, the Seagate IDM lock manager will be launched as
backend for testing. Also add the prepare and remove shell scripts for
IDM.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Calling clear_hint_file() to invalidate hints would acquire
the hints flock before the global flock which could cause deadlock.
The lock order requires the global lock to be taken first.
pvchange was always invalidating hints, which was unnecessary;
only invalidate hints when changing a PV uuid. Because of the
lock ordering, take the global lock before clear_hint_file which
locks the hints file.
The macro LOCKDIDM_SUPPORT is missed in configure.h.in file, thus when
execute "configure" command, it has no chance to add this macro in the
automatic generated header include/configure.h.
This patch adds macro LOCKDIDM_SUPPORT into configure.h.in.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
For shared VG or LV locking, IDM locking scheme needs to use the PV
list assocated with VG or LV for sending SCSI commands, thus it requires
to use some places to generate PV list.
In reviewing the flow for LVM commands, the best place to generate PV
list is in the locking lib. So this is why this patch parses PV list as
shown. It iterates over all the PV nodes one by one, and compare with
the VG name or LV prefix string. If any PV matches, then the PV is
added into the PV list. Finally the PV list is sent to lvmlockd daemon.
Here as mentioned, it compares LV prefix string with the format
"lv_name_", the reason is it needs to find out all relevant PVs, e.g.
for the thin pool, it has LVs for metadata, pool, error, and raw LV, so
we can use the prefix string to find out all PVs belonging to the thin
pool.
For the global lock, it's not covered in this patch. To avoid the egg
and chicken issue, we need to prepare the global lock ahead before any
locking can be used. So the global lock's PV list is established in
lvmlockd daemon by iterating all drives with partition labeled with
"propeller".
Signed-off-by: Leo Yan <leo.yan@linaro.org>
We can consider the drive firmware a server to handle the locking
request from nodes, this essentially is a client-server model.
DLM uses the kernel as a central place to manage locks, so it also
complies with client-server model for locking operations. This is
why IDM and DLM are similar with each other for their wrappers.
This patch largely works by generalizing the DLM code paths and then
providing degeneralized functions as wrappers for both IDM and DLM.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
To allow the IDM locking scheme be used by users, this patch hooks the
IDM wrapper; it also introducs a new locking type "idm" and we can use
it for global lock with option '-g idm'.
To support IDM locking type, the main change in the data structure is to
add pvs path arrary. The pvs list is transferred from the lvm commands,
when lvmlockd core layer receives message, it extracts the message with
the keyword "path[idx]". Finally, the pv list will pass to IDM lock
manager as the target drives for sending IDM SCSI commands.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Alongside the existed locking schemes of DLM and sanlock, this patch is
to introduce new locking scheme: In-Drive-Mutex (IDM).
With the IDM support in the drive, the locks are resident in the drive,
thus, the locking lease is maintained in a central place: the drive
firmware. We can consider this is a typical client-server model,
every host (or node) in the server cluster launches the request for
leasing mutex to a drive firmware, the drive firmware works as an
arbitrator to grant the mutex to a requester and it can reject other
applicants if the mutex has been acquired. To satisfy the LVM
activation for different modes, IDM supports two locking modes:
exclusive and shareable.
Every IDM is identified with two IDs, one is the host ID and another is
the resource ID. The resource ID is a unique identifier for what the
resource it's protected, in the integration with lvmlockd, the resource
ID is combined with VG's UUID and LV's UUID; for the global locking,
the bytes in resource ID are all zeros, and for the VG locking, the
LV's UUID is set as zero. Every host can generate a random UUID and
use it as the host ID for the SCSI command, this ID is used to clarify
the ownership for mutex.
For easily invoking the IDM commands to drive, like other locking
scheme (e.g. sanlock), a daemon program named IDM lock manager is
created, so the detailed IDM SCSI commands are encapsulated in the
daemon, and lvmlockd uses the wrapper APIs to communicate with the
daemon program.
This patch introduces the IDM locking wrapper layer, it forwards the
locking requests from lvmlockd to the IDM lock manager, and returns the
result from drives' responding.
One thing should be mentioned is the IDM's LVB. IDM supports LVB to max
7 bytes when stores into the drive, the most significant byte of 8 bytes
is reserved for control bits. For this reason, the patch maps the
timestamp in macrosecond unit with its cached LVB, essentially, if any
timestamp was updated by other nodes, that means the local LVB is
invalidate. When the timestamp is stored into drive's LVB, it's
possbile to cause time-going-backwards issue, which is introduced by the
time precision or missing synchronization acrossing over multiple nodes.
So the IDM wrapper fixes up the timestamp by increment 1 to the latest
value and write back into drive.
Currently LVB is used to track VG changes and its purpose is to notify
lvmetad cache invalidation when detects any metadata has been altered;
but lvmetad is not used anymore for caching metadata, LVB doesn't
really work. It's possible that the LVB functionality could be useful
again in the future, so let's enable it for IDM in the first place.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
When the device is not a PV print
"No PV found on device ..."
instead of
"Failed to read lvm info for ... PVID ."
an earlier check had been added with a different
message for the same condition.
error reading dev and no pvid on dev were both
returning 0. make it easier for callers to
know which, if they care.
return 1 if the device could be read, regardless
of whether a pvid was found or not.
set has_pvid=1 if a pvid is found and 0 if no
pvid is found.
While we heavily try to spot arrays that are not yet in-sync,
some kernels tends to block our lvm2 command in kernel,
while we resume these smaller raid arrays even for 5 seconds.
But since the result is not really wrong - report these
check failures only as TEST WARNING.
Missed -l option in man page, although users should prefer
lvresize -r when the also want to do a volume management,
as there they can specify i.e. extents for allocation.
Also mention dm-crypt support in command description.
If the 'act' has been already processed by add_client_result()
it could have been possibly release - so avoid accessin 'act->'
afterward and go for next item directly.
This case happens when i.e. we convert LV to another type,
when we change existing LV into a different type - so change
to debug level and avoid confusing users with message about
Device path not match.
We may eventually enhnace caching code to drop cached info
after taking lock and reading VG.
With commit 0b18c25d93 there
was introduced 'zalloc()' for allocation of outdates pvs,
but no matching 'free()' is present.
Switch to use cmd mempool instead of adding free() code into
several places.
If a cmd def implies an LV type without --type
in the required options, then include the implied
type in the cmd def as AUTOTYPE: <type>
instead of including the redundant --type foo
in the OO list of options.
Including an implied --type in the OO list would
often cause multiple cmd defs to potentially be
identical when options were used, and a user
command could match more than one cmd def.
The AUTOTYPE values are listed in man page and
help output as
[ --type foo (implied) ]
If a user command includes --type, it will usually
match a cmd def with --type in the required options.
But, if the user command matches a cmd def with
AUTOTYPE, then the specifed --type and AUTOTYPE must
match.
The man-generator program has a new --check
option that compares cmd defs to find any cmd defs
that are equivalent with the use of options,
and should have their options adjusted.
Compares cmd defs based on two principles for avoiding repeated
commands (where a given command could match more than one cmd def):
. a cmd def should be a unique combination of required
option args and position args
. avoid adding optional options to a cmd def that if
used would make the command match a different cmd def
FIXME: record when repeated cmd defs are found so we can
avoid reporting them twice, e.g. once for A vs B and
second time for B vs A.
Correcting some usage of Bold and Italics (files).
Adding some missing SEE ALSO.
Fixing missed replaceable paths that are configurable.
Be careful about .P in .TP sections - need to use .sp for space line.
Use .UR/.UE for URL references.
Use #DEFAULT_SYS_DIR# replaceable string for devicesfile
so the man pages installation respects configured settings.
Update some missing lvm.conf(5) references.
Add missing VG into description of thin pool creation command.
Remove one duplicated thin-pool creation command.
Remove options --discards and --errorwhenfull from the list when the command describes
only creation of a thin volume - as these options do apply for thin-pool.
Also use here more correct name OO_LVCONVERT_THINPOOL instead of OO_LVCONVERT_THIN.
Reorder extra options for cache & thin-pool before common pool options.
Order consistenly --stripes and --stripesize after --extents option
so the options related to pools are better together.
Remove invalid snapshot creation description - since this case is
handled through our configurable spare volume creation.
Add some missing optional --type parameters for few command instancies.
Emit .ad l / .ad b less frequently around larger blocks
we want to keep left aligned.
Avoid emittting empty lines.
Reduce .HP usage and replace it with .TP.
However keep .HP for all option listings, as i.e. html rendering
can't handle well combintion of .TP an .HP together and .TP alone
is not indenting 2nd. line of long option line.
(For .TP line we don't need to emit .br)
Surround .SH with dots for better look.
For some .TP use plain more readable .I for a line.
Support rendering of optional [Number] (for --units).
Use better markup for units and instead of long markup string,
show individual units with markup.
Enhance man typography decoration of optional option
prefixes like --[raid]writebeind and use regular font to render []
as these are not part of the option name itself.
Sed replacements script missed to properly replace several '-' to '\-'.
Replace it with simpler set of regexes.
Also add new target 'make checksed' for testing with examples,
where the replacement should or should not occure for easier testing.
Previously, accepted LV types were presented as a series of suffixes
after the "LV" on the command line. The addition of many new types
resulted in this becoming too long, e.g
lvconvert --type cache --cachepool LV LV_linear_striped_thinpool_vdo_vdopool_vdopooldata_raid
For man pages, move these types from the command line to a new line
dedicated to listing accepted LV types:
lvconvert --type cache --cachepool LV LV1
...
LV1 types: linear striped thinpool vdo vdopool vdopooldata raid
The special "LV1" is used as a reference to avoid confusion
with other LVs that may appear on the command line. There
are currently no commands with more than one typed LV, but
if there are cases with more, then "LV2" could also be used.
For command line usage/-h output, drop the LV types from the
command line specification. The more detailed is not needed
in the help output and can be found in the man page.
With to use .TP where it's easy and doesn't change layout
(since .HP is marked as deprecated) - but .TP is not always perfetc match.
Avoid submitting empty lines to troff and replace them mostly with .P
and use '.' at line start to preserve 'visual' presence of empty line
while editing man page manually when there is no extra space needed.
Fix some markup.
Add some missing SEE ALSO section.
Drop some white-space at end-of-lines.
Improve hyphenation logic so we do not split options.
Use '.IP numbers' only with first one the row (others in row
automatically derive this value)
Use automatic enumeration for .SH titles.
Guidelines in-use:
https://man7.org/linux/man-pages/man7/groff.7.htmlhttps://www.gnu.org/software/groff/manual/html_node/Man-usage.htmlhttps://www.gnu.org/software/groff/manual/html_node/Lists-in-ms.html
This reverts commit 8e7690b798.
Actully this was bad idea - to make it on pair.
-Zn for thin-pools is already used - so here user must have
create new pool and swap existing thin-pool metadata into.
So reverting this commit to avoid any possible regression.
It would be complicated to handle ',' alignment after hyphenation
changes ATM, but these commas seems to be there rather unneeded
so remove them and make the man output more clear.
Disable hyphenation around longer option lists (>42 chars)
and use \: to markup places for line splits.
The code ATM is somewhat mixtured so it's not easy to encapsulate
section .nh ... .hy.
ATM global _was_hyphen is used to properly finish sections after
disabled hyphenation.
When --with-... option is used as --without-... it gets
assigned value 'no' - so support it better where we can.
Also remove 'shared' from help as it's not supported.
The autoactivation property can be specified in lvcreate
or vgcreate for new LVs/VGs, and the property can be changed
by lvchange or vgchange for existing LVs/VGs.
--setautoactivation y|n
enables|disables autoactivation of a VG or LV.
Autoactivation is enabled by default, which is consistent with
past behavior. The disabled state is stored as a new flag
in the VG metadata, and the absence of the flag allows
autoactivation.
If autoactivation is disabled for the VG, then no LVs in the VG
will be autoactivated (the LV autoactivation property will have
no effect.) When autoactivation is enabled for the VG, then
autoactivation can be controlled on individual LVs.
The state of this property can be reported for LVs/VGs using
the "-o autoactivation" option in lvs/vgs commands, which will
report "enabled", or "" for the disabled state.
Previous versions of lvm do not recognize this property. Since
autoactivation is enabled by default, the disabled setting will
have no effect in older lvm versions. If the VG is modified by
older lvm versions, the disabled state will also be dropped from
the metadata.
The autoactivation property is an alternative to using the lvm.conf
auto_activation_volume_list, which is still applied to to VGs/LVs
in addition to the new property.
If VG or LV autoactivation is disabled either in metadata or in
auto_activation_volume_list, it will not be autoactivated.
An autoactivation command will silently skip activating an LV
when the autoactivation property is disabled.
To determine the effective autoactivation behavior for a specific
LV, multiple settings would need to be checked:
the VG autoactivation property, the LV autoactivation property,
the auto_activation_volume_list. The "activation skip" property
would also be relevant, since it applies to both normal and auto
activation.
Switch to plain 'kill' we should no longer need SIGKILL
as polling can be interrupted.
Resolve problem in aux wait_pvmove_lv_ready() that was using
lvm command to check for UUID - but this was interferring with
VG lock and it's been delaying confirmation.
So reducing slow-down of test - so it can run faster.
Enhance handling of interruptions of polling process and lvmpoll daemon.
Daemon should now react much faster on interrups (i.e. shutdown
sequence) and avoid taking lenghty sleep waiting on pvmove signaling.
If we are signaled with SIGTERM it should be at least as good
as with SIGINT - as the command should stop ASAP.
So when lvm2 command allows signal handling we also
enable SIGTERM handling. If there are some other signals
we should handle equally - we could just extend array.
Avoid emitting Local symbol and sort symbols from
start and add dependency on previous version
Should not change anything, just better followup
linkage guidlines.
Add missing description for profile usage with cache pool.
List cache-pools as first option for dm-cache as it provides
better performance and more functionality over cachevols.
Not all libc (like musl, uclibc dietlibc) libraries support full symbol
version resolution in runtime like glibc.
Add support to not generate symbol versions when compiling against them.
Additionally libdevmapper.so was broken when compiled against
uclibc. Runtime linker loader caused calling dm_task_get_info_base()
function recursively, leading to segmentation fault.
Introduce --with-symvers=STYLE option, which allows to choose
between gnu and disabled symbol versioning. By default gnu symbol
versioning is used.
__GNUC__ check is replaced now with GNU_SYMVER.
Additionally ld version script is included only in
case of gnu option, which slightly reduces output size.
Providing --without-symvers to configure script when building against
uclibc library fixes segmentation fault error described above, due to
lack of several versions of the same symbol in libdevmapper.so
library.
Based on:
https://patchwork.kernel.org/project/dm-devel/patch/20180831144817.31207-1-m.niestroj@grinn-global.com/
Suggested-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Looks like there was some missed versioning increase during devel.
So with kernel >= 4.18 version 1.19 is enough to look like 1.20
However backported 1.19 targets seems to not provide all
the capabilities.
Test has to use PV with suffix pv[0..9] otherwise
it's ignored by test suite filter.
Better fix for VG names to use prefix LVMTEST.
Skip the test for runs without LVM_TEST_DEVDIR != /dev
Always use PREFIX for vg header - all tests must use this prefix,
VGs without are not allowed.
Modify pv_symlink test - as the test was checking unsupportable
combination - since lvm2 commands withing testsuite are only
allowed to manipulate with /dev/mapper/LVMTESTXXXX path -
nothing else allowed and fails on being filtered.
Added comment the 'lvs' already initiates dmeventd
Note: we don't have any query mechanism to check if dmeventd
is already running except access of socket which basically
starts dmeventd if it's not running.
For determinist test results lvm2/dm service shall not be present
and running in the system as it may randomize test results.
In case they are found present, this test ends with warning (not failure).
Some older instancies of 'mdadm' opened legs in RW and
closed and opened again and expected exlusive access.
But here udev rule can be fired - so on these versions
slow down whole mdadm runtime by using strace, to
give system a bit more time to finish udev rule.
Make check_lvmpolld_init_rq_count() more compatible with older gawk,
where some functionality was not working properly.
Also change 'not not' condition.
Fsadm wants to print its own error message when it can't detect
type of the filesystem on a block device.
Otherwise fsadm exits with no message on an unused block device.
When blocksize --getsize64 gives empty result we want to fallback
to ancient --getsize * --getss calculation (RHBZ #1942486).
Reported by: ajschorr@alumni.princeton.edu
Just like lvm command ignores 0/xxxx report from judging the status.
Avoid using infinite loop and limit report checking to 100 checks.
If it would need more - something is not right.
Our tests may result in producation of huge set of
invalid links in /dev/disk directory depeding on version
of udev and various kinds of failures.
Also we happen to invoke some on-system pvscans generating
local /etc/lvm/archive & backups - remove them when
test is finished.
Try to synchronize with colliding udev.
Also retry once if there is some failure with some
sleep between next retry.
Use oflag=direct for wipping without wipefs.
Combination of throttling and slowed device is a bit faster.
Also add FIXME about the mutliple spawn polling processing
when activating invidual LV for a pvmove.
Use for testing new mdadm_create aux wrapper.
Place functionality into a 2 pass loop - one for 'auto' other for 'start'.
Share same tests between raid level 0 and level 1 version of raid.
Add generic wrapper for mdadm --create which takes
normal 'mdadm' args - but allows us to handle differences of
mdadm usage across various version of mdadm tool.
Resulting MD device is availalble in $(< MD_DEV).
Automatic cleaning is made through cleanup_md_dev
Calling of mdadm_create cleans previous MD dev if it exists.
When testing installed binaries on system, use more 'built-in'
predefined settings to usethem with their compiled-in values.
Also it's better to use same locking dir so the system's pvscan
is not unexpectedly interferring with test commands.
We have here some kind of catch-22 - since older kernels do
use 'resync' while new 'recover' for initial raid synchronization.
So now - how do we recognize in which state of raid we are.
ATM seems to be simplest to simply keep disabled droping of primary raid
leg unless we are in sync.
FIXME: we should add a target version check and enable removal
Function is not having the best name since it does check
no just raid LVs to be in sync.
Restore the mirror percentage checking - although without retries,
since only raid target is currently known to need it - for other
types it would be ATM a bug to get inconsistent result.
Our new faster deps generation missed support for
buildirs != srcdir - as it can be usable to have
several builds from unchanged directory with sources.
Whiel waiting for raid to return consistent status,
use interruptible sleep - so command can break quickly.
Use lv_raid_status() to get percentage easily from status.
Always shift created virtual PVs on backing device by 1MiB
and leave 1MiB free space at the end of device.
This way the system doesn't see same PV headers at multiple devices.
Since lvm does support external users of thin-pool when thin devices
are managed outside it can be useful to support conversion to
thin pool from data and metadata LV without zeroing.
TransactionID will be 0 in lvm2 metadata.
lvconvert -Zn --thinpool vg/data --poolmetadata vg/meta
Renables usage of --type zero and --type error LVs to serve as
backend for _tdata device. Clearly not very useful in practice,
as it can't store any real data, but usable for some testing
and some sort of perfomance checking.
lvcreate --type zero -L1T -n pool vg
lvconvert --thinpool vg/pool
Will create a thin-pool with zero device backend.
Enabled extension/mixing of stripes/linears, error and zero
segtype LVs with stripes/linear, error and zero segtypes.
It is not very useful in practice, as the user cannot store any real
data on error or zero segtypes, but it may get some uses in
some scenarios where i.e. some portion of the device should not be
readable. Mixing of types happens on 'extent_size' level:
lvcreate -L1 -n lv vg
lvextend --type error -L+1 vg/lv
lvextend --type zero -L+1 vg/lv
lvextend --type linear -L+1 vg/lv
lvextend --type striped -L+1 vg/lv
lvs -o+segtype,seg_size vg
Note: when the type is not specified, the last segment type is
automatically selected.
It's also a small 'can of worms' since we can't tell LVs if
the LV is linear/error/zero or their mixtures. So the meaning behind
them may need some updates.
We already have this types of LV created i.e by:
vgreduce --removemissing --force
where missing LV segments have been replaced by either
error or zero segtype (lvm.conf).
TODO: it might be worth adding a message while such device is activated.
Add some extra code to handle differently sized thin-pool
from thin-pool data volume.
ATM this can't really happen, but once we start to use multiple
commits while resizing stacked LV, we may actually get into
the position, where data LV has been already resized,
but thin-pool stayed with old size.
But for now - report difference as internal error.
Devices made only from 'error' target cannot be used,
but if the device is also combined from 'zero' target
the same rule can be applied as such device cannot be used.
Just like with other segtype use this function to get whole
raid status info available per a single ioctl call.
Also it nicely simplifies read of percentage info about
in_sync portion of raid volume.
TODO: drop use of other calls then lv_raid_status call,
since all such calls could already use status - so it just
adds unnecessary duplication.
Reduce ioctl count and avoid separate info check,
when we can get the same info from status ioctl.
When devmanager calls return 0, then the exists value 0
means the reason of failure is missing device in table.
In such case we avoid stack trace.
Swap the flush parameter for the vdo status function
to match thin pool status.
Adding full filesystem sync, trying to fight with strange error from losetup:
losetup: loopa: failed to set up loop device: Resource temporarily unavailable
loop0: detected capacity change from 0 to 4096
loop_set_block_size: loop0 () has still dirty pages (nrpages=13)
Also reuse internal aux wipefs_a
When multiple polling tasks are watching for same LV, clearly
when some of them wins the game - other polling tasks will fail.
Improve the logic and report success if the merged LV is
actually not a merging origin anymore (since likely someone
else has already finished merging).
Although we support '0' interval - it's highly inefficent to
do so many scans in busy-loop.
So ATM raise minimal rescan time to 100ms.
TODO: revisit whole timing logic here as it does have some sideeffect
hiddent impact and can considerably eat CPU in some cases.
Reuse similar 'acceleration' as used for dependent volumes also
for snapshot - so when origin is being removed with all thick
snapshots, don't bother with individual 'COW' detachments
and write&commits, and when possible handle this all within
a single commit.
Move code for prompting about removed LV to a single function
and use it also to prompt for removal of origin and all its thick
snapshots and also when removing merging origin.
Function does handle postponed write_and_commit so there is
no 'in-flight' operation while waiting on [y|n] answer.
Compat code and handle unusual case, where
thin snapshot is also a 'thick snapshot origin' and such
snapshot gets merged into a thin origin.
However since now lv_is_visible() (which is complex function)
replaced &VISIBLE_LV check, the whole this check seems to be
no longer useful as sum of all 3 will always match??
Since cached LV is going to be removed together with its cache,
there is not much to gain if we try to flush cache first.
User may use 'vgcfgrestore' to get back origin + cache.
Assuming user is not using issue_discards.
When data are discarded after remove there is nothing to restore!
This change allows to futher reduce number of commits
during lvremove/vgremove.
This modification supports dynamically obtaining the value of PAGE_SIZE,
which is compatible with systems with PAGE_SIZE of (4K/64K)
Signed-off-by: wuguanghao <wuguanghao3@huawei.com>
There is really no practical reason to continue running
when we fail on allocation.
It seems we may need further fine frained errors, as for
some error type we simply need to exit ASAP, while
others may still produce usable results.
When lvremove/vgremove removes thin volumes with its thin-pool as well,
try to skip any updates of such thin-pool, so when everything properly
deactivates, there is no message send to this thin-pool and whole
thin-pool is removed with a single commit.
When generating list of processed LV, add thin-pool to the head of the
list, while other LVs are added on tail.
This makes it easier when removing many thin volumes, to recognize easily
when its thin-pool is also supposed to be removed.
Returning NULL for lv_committed is basically instant crash,
so instead try with passed LV instead.
It shouldn't matter as this is internall error path anyway,
but coverity should be happier.
Another step towards better automatic handling of backup,
and automatically setup needs_backup after commit.
In some next step we should reduce number of backups and takem
then only at the command finish with vg_committed content.
The correct test needs to actually check 'lv->snapshot' is not NULL,
so the 'find_snapshot()' can work.
Test lv_is_snapshot was actually irrelavant for this case.
Also initialize device_id.
With commit b44db5d1a7
needs to check allocated pointer for failed malloc().
Existing check was actually no checking anything so failing
malloc here would result in segfault (although with very
low chance to ever happen).
Match VG uuid just once per list of all LVs in VG.
TODO: maybe some more efficeint tree or hash could be better here,
but since it's used not so often, the total benefit is not so great,
so ATM just reducing amount of checked bytes.
Add Bob Jenkins hash function to get better working hash function,
which does genarate way less colisions (especially with similar
strings).
For a comparision also a kernel function used in DM kernel is include.
While it's better then our existing one, it's still far worse,
then Bob Jenkins hash.
Enhance hash perfomance by remembering the computed
hash value for the hashed node - this allows to speedup
lookup of nodes with the hash collision when
different keys map to the same masked hash value.
For the easier use 'num_slots' become 'mask_slots',
so we only add '1' in rare case of need of the original
num_slots value.
Also add statistic counters for hashes and print stats in
debug build (-DDEBUG) on hash wiping.
(so badly performing hash can be optimized.)
Use different 'hint' size for dm_hash_create() call - so
when debug info about hash is printed we can recognize which
hash was in use.
This patch doesn't change actual used size since that is always
rounded to be power of 2 and >=16 - so as such is only a
help to developer.
We could eventually use 'name' arg, but since this would have changed
API and this patchset will be routed to libdm & stable - we will
just use this small trick.
Just like with deactivation, call of 'lv_is_not_in_use()'
now has embeded report for inactivate LV.
Note: this patch cannot be backported to stable-2.02 - as
there lv_is_active() has 'cluster' meaning and differs from lvinfo().
When LV is deactivativate, we check for presence, and later
for some LV types also for being in use.
We can however do this check in 1 step for them a remove extra ioctl.
Add return value '2' to lv_check_not_in_use() to recognize LV is not
present.
Existing users were just testing for 0, so no change for them.
When parsing VG metadata we can create from a single config tree
also 'vg_committed' that is always created for writable VG.
This avoids extra uncessary step of serializing and deserilizing
just parsed VG.
Every vg_write stores new 'metadata' into precommitted slot.
For this step we use 'serialized buffer' to ascii metadata.
Instead of recreating this buffer after whole 'vg_write()' we
use this buffer instantly for creating of precommitted VG.
This has also the advantage of catching any problems with
reparsing of ascii metadata back to VG early before any write.
This patch postpones update of lvm metadata for each removed
LV for later moment depending on LV type.
It also queues messages to be printed after such write & commit.
As such there is some change in the behavior - although before
prompt we do make write&commit happens automatically in some
other error case we rather keep 'existing' state - so there
could be difference in amount of removed & commited LVs.
IMHO introduce logic is slightly better and more save.
But some cases still need the early commit - i.e. thin-removal
and fixing this needs some more thinking.
TODO: improve removal at least with the case of the whole thin-pool.
i.e. we can simply recognize removal of 'all LVs/whole VG'.
with fork and exec to avoid use of shell.
largely copied from lib/misc/lvm-exec.c
require lvmlockctl_kill_command to be full path
use lvm config instead of lvmconfig to avoid need for LVM_DIR
Make the generic "device is not usable" message from filter-usable
more specific in case the device is not usable because it's an LV.
(i.e. when scan_lvs=0)
When 'lv_info()' is called with &info structure,
the presence of node has to be checked from this structure.
Without this we were needlesly trying to look out 0:0 device.
When lvm2 calls archive() or backup() it can be useful to allow handling
break signal so the command can be interrupted at some consistent point.
Signal is accepted during processing these calls - and can be evaluated
later during even lengthy processing loops.
So now user can interrupt lengthy lvremove().
Taking backup with each removed LV is slowing down the process
considerable and is largerly uneeded. We are supposed to take
backup only on significant points and making sure the backup
is correct when the command is finished.
TODO: check how many other commands can be improved.
Use 'C' for alphasort - there is no need to use localized and slower
sorting for internal directory scanning.
Ensure on all code paths allocated dirent entries are released.
Optimize full path construction.
From commit 29abba3785 we have hopefully
fixed most of troubles for deps tracking we had in past - so retry
again.
Drop explicit configure.h from DEPS - as it's automatically gathered
by gcc dependency tracking anyway.
Drop the comment "This setting is no longer used." which
was printed just before the standard deprecation comment:
"This configuration option is deprecated."
When lvmconfig --typeconfig full printed a deprecated
entry it would attempt to print a non-existing
deprecation comment resulting in output like:
# (null) # This setting is no longer used.
This reverts commit 99b6173f10.
These tests are disabled with lvmlockd because they use
snapshots without an origin which is not permitted in a
shared vg.
user creates a file listing real devices they want
lvm tests to use, and sets LVM_TEST_DEVICE_LIST.
lvm tests can use these with prepare_real_devs
and get_real_devs.
Other aux functions do not work with these devs.
The LVM devices file lists devices that lvm can use. The default
file is /etc/lvm/devices/system.devices, and the lvmdevices(8)
command is used to add or remove device entries. If the file
does not exist, or if lvm.conf includes use_devicesfile=0, then
lvm will not use a devices file. When the devices file is in use,
the regex filter is not used, and the filter settings in lvm.conf
or on the command line are ignored.
LVM records devices in the devices file using hardware-specific
IDs, such as the WWID, and attempts to use subsystem-specific
IDs for virtual device types. These device IDs are also written
in the VG metadata. When no hardware or virtual ID is available,
lvm falls back using the unstable device name as the device ID.
When devnames are used, lvm performs extra scanning to find
devices if their devname changes, e.g. after reboot.
When proper device IDs are used, an lvm command will not look
at devices outside the devices file, but when devnames are used
as a fallback, lvm will scan devices outside the devices file
to locate PVs on renamed devices. A config setting
search_for_devnames can be used to control the scanning for
renamed devname entries.
Related to the devices file, the new command option
--devices <devnames> allows a list of devices to be specified for
the command to use, overriding the devices file. The listed
devices act as a sort of devices file in terms of limiting which
devices lvm will see and use. Devices that are not listed will
appear to be missing to the lvm command.
Multiple devices files can be kept in /etc/lvm/devices, which
allows lvm to be used with different sets of devices, e.g.
system devices do not need to be exposed to a specific application,
and the application can use lvm on its own set of devices that are
not exposed to the system. The option --devicesfile <filename> is
used to select the devices file to use with the command. Without
the option set, the default system devices file is used.
Setting --devicesfile "" causes lvm to not use a devices file.
An existing, empty devices file means lvm will see no devices.
The new command vgimportdevices adds PVs from a VG to the devices
file and updates the VG metadata to include the device IDs.
vgimportdevices -a will import all VGs into the system devices file.
LVM commands run by dmeventd not use a devices file by default,
and will look at all devices on the system. A devices file can
be created for dmeventd (/etc/lvm/devices/dmeventd.devices) If
this file exists, lvm commands run by dmeventd will use it.
Internal implementaion:
- device_ids_read - read the devices file
. add struct dev_use (du) to cmd->use_devices for each devices file entry
- dev_cache_scan - get /dev entries
. add struct device (dev) to dev_cache for each device on the system
- device_ids_match - match devices file entries to /dev entries
. match each du on cmd->use_devices to a dev in dev_cache, using device ID
. on match, set du->dev, dev->id, dev->flags MATCHED_USE_ID
- label_scan - read lvm headers and metadata from devices
. filters are applied, those that do not need data from the device
. filter-deviceid skips devs without MATCHED_USE_ID, i.e.
skips /dev entries that are not listed in the devices file
. read lvm label from dev
. filters are applied, those that use data from the device
. read lvm metadata from dev
. add info/vginfo structs for PVs/VGs (info is "lvmcache")
- device_ids_find_renamed_devs - handle devices with unstable devname ID
where devname changed
. this step only needed when devs do not have proper device IDs,
and their dev names change, e.g. after reboot sdb becomes sdc.
. detect incorrect match because PVID in the devices file entry
does not match the PVID found when the device was read above
. undo incorrect match between du and dev above
. search system devices for new location of PVID
. update devices file with new devnames for PVIDs on renamed devices
. label_scan the renamed devs
- continue with command processing
To better test actually fsadm in test suite - avoid setting
LVM_BINARY locally - since test setup already modifies
PATH to find test's lvm binary as the 1st. in path.
User use 'lvconvert -Zn --type vdo-pool' to convert an existing
vdo formated volume and skip lvm2 internal formating.
This however requires user is passing proper matching parameters.
For them user can use --profile|--metadataprofile option whos
support has been also enhanced.
TODO: add support to read values directly from formated volume.
In some cases we use 'creation' also during conversion.
Here it can be actually unwanted side effect we may remove
not just newly created layers - but also original converted LV.
So until we make clear how to properly revert from some errors
in middle of conversion, disable removal for any 'lvconvert' commands.
When lvdisplay was executed and thin snaphost has be merged to
thin origin and the operation has been postponed till devices
are closed, command crashed.
Check LV is COW before trying to check snapshot percentage.
When converting an existing LV to thin-pool,
user may now pass also '--errorwhenfull' option
like with 'lvcreate'.
Also recalculate chunksize when performace profile is
used with conversion (again matching lvcreate).
Adds missing flagging for uncropped metadata sizes.
Fix clearing persistent filter state when clearing all
the state from a label_scan.
label_scan reads devs and saves info in bcache, lvmcache,
and in the persistent filter. In some uncommon cases, an
lvm command wants to clear all info from a prior label_scan,
and repeat label_scan from scratch. In these cases, info
in lvmcache, bcache and the persistent filter all need to
be cleared before repeating label_scan.
By missing the persistent filter wiping, outdated persistent
filter info, from a prior label_scan, could cause lvm to
incorrectly filter devices that change between polling intervals.
(i.e. if the device changes in such a way that the filtering
results change.)
A case where lvm wants to do multiple label_scans is a
polling command (like lvconvert --merge), when lvmpolld
has been disabled, so that the command itself needs to
to do repeated polling checks.
Automatically figure out resizable layer in the LV stack and
resize it online.
Split check for reshaped raids and postpone removal of
unused space after finished reshaping after metadata archiving.
Drop warning about unsupported automatic resize of monitored thin-pool.
Currently there is not yet support for resize of writecache.
In past we had this control with use_lvmetad check for
pvscan --cache -aay
Howerer this got lost with lvmetad removal commit:
117160b27e
When user sets lvm.conf global/event_activation=0
pvscan service will no longer auto activate any LVs on appeared PVs.
Move extra md component detection into the label scan phase.
It had been in set_pv_devices which was deep within the vg_read
phase, which wasn't a good place (better to detect that earlier.)
Now that pv metadata info is available in the scan phase, the pv
details (size and device_hint) can be used for extra md checking.
Use the device_hint from the pv metadata to trigger a full md
component check if the device_hint begins with /dev/md.
Stop triggering full md component checks based on missing
udev info for a dev.
Changes to tests to reflect that the code is now detecting
md components in some test case that it wasn't before.
A cachevol can be forcibly detached when it's missing devices.
Also allow this if it's damaged/invalid and unrepairable.
This would be needed to recover data from the origin LV after
a cachevol is lost or damaged beyond repair.
In cases where lvconvert does not detect a fs block size on the
device, it falls back to choosing a writecache block size based
on the device's LBS and PBS (tries to match those.)
If the user specifies a writecache block size on the command
line (--cachesettings block_size=4096|512), lvconvert currently
fails and reports an error if the user-specified value does not
match the value lvconvert would have chosen based on LBS and PBS.
The purpose of allowing a user-specified value on the command line
is to override what lvconvert would otherwise do, so change this
to just print a warning that the user value does not match the
value that would be chosen based on the LBS/PBS, and then take
the user-specified value as the writecache block size.
Current allocation limitation requires to fit metadata/log LV on
a single PV. This is usually not a big problem, but since
thin-pool and cache-pool is using this for allocating extents
for their metadata LVs it might be eventually causing errors
where the remaining free spaces for large metadata size is spread
over several PV.
When passing 'pvmove --name arg' try to automatically move
all associated dependencies with given LV.
i.e. 'pvmove --name thinpool vg vgnew'
moves all thins and data and metadata LV into a new VG vgnew.
Use update_pool_metadata_min_max() which is shared with
thin-pool metadata min-max updating.
Gives improved messages when converting volumes to metadata.
There is not much point to let allocate more then this size
even when i.e. converted LV is bigger then 16GiB (%extent_size)
ATM neither thin-pool nor cache-pool supports bigger metadata.
Initial support for thin-pool used slightly smaller max size 15.81GiB
for thin-pool metadata. However the real limit later settled at 15.88GiB
(difference is ~64MiB - 16448 4K blocks).
lvm2 could not simply increase the size as it has been using hard cropping
of the loaded metadata device to avoid warnings printing warning of kernel
when the size was bigger (i.e. due to bigger extent_size).
This patch adds the new lvm.conf configurable setting:
allocation/thin_pool_crop_metadata
which defaults to 0 -> no crop of metadata beyond 15.81GiB.
Only user with these sizes of metadata will be affected.
Without cropping lvm2 now limits metadata allocation size to 15.88GiB.
Any space beyond is currently not used by thin-pool target.
Even if i.e. bigger LV is used for metadata via lvconvert,
or allocated bigger because of to large extent size.
With cropping enabled (=1) lvm2 preserves the old limitation
15.81GiB and should allow to work in the evironement with
older lvm2 tools (i.e. older distribution).
Thin-pool metadata with size bigger then 15.81G is now using CROP_METADATA
flag within lvm2 metadata, so older lvm2 recognizes an
incompatible thin-pool and cannot activate such pool!
Users should use uncropped version as it is not suffering
from various issues between thin_repair results and allocated
metadata LV as thin_repair limit is 15.88GiB
Users should use cropping only when really needed!
Patch also better handles resize of thin-pool metadata and prevents resize
beoyond usable size 15.88GiB. Resize beyond 15.81GiB automatically
switches pool to no-crop version. Even with existing bigger thin-pool
metadata command 'lvextend -l+1 vg/pool_tmeta' does the change.
Patch gives better controls 'coverted' metadata LV and
reports less confusing message during conversion.
Patch set also moves the code for updating min/max into pool_manip.c
for better sharing with cache_pool code.
When detaching writecache, make the first stage send a message
to dm-writecache to set the cleaner option. This is instead of
reloading the dm table with the cleaner option set. Reloading
the table causes udev to process/probe the dm dev, which gets
stalled because of the writeback activity, and the stalled udev
in turn stalls the lvconvert command when it tries to sync with
udev events.
When getting writecache status we do not need to get
open_count or read_head info, which can cause extra steps.
In case legs of a raid0 LV are removed, the lvdisplay command still
reports 'available' though raid0 is not providing any resilience
compared to the other raid levels.
Also lvdisplay does not display '(partial)' in case of missing raid0
legs as oposed to the lvs command.
Enhance lvdisplay to report "NOT available" for any RaidLV type in case
too many legs are inaccessible hence causing data loss. I.e. any leg
for raid0, all for raid1, more than 1 for raid4/5, more than 2 for raid6
and in case of completely lost mirror groups for raid10.
Add test/shell/lvdisplay-raid.sh.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1872678
Read buffersize - 1 so the last byte is always 0.
Simplify init of 0 buffers.
Check snprintf result for error and report internal error as it could
happen only via bad compile parameters.
New VDO targets v6.2.3 corrects support for online rename of VDO device.
If needed if can be disable via new lvm.conf setting:
vdo_disabled_features = [ "online_rename" ]
When removing pool LV from a stacked LV setup, it's been possible
to leak _pmspare and such hidden LV then required manual
user removal.
Fix it by moving automatic removal into _lv_reduce().
When adding replacement raid+integrity images (lvconvert --repair
after a raid image is lost), various errors can cause the function
to exit with an error. On this exit path, the function attempts
to revert new images that had been created but not yet used. The
cleanup failed to account for the fact that not all images needed
to be reverted.
Since commit 77fdc17d70 always include
log_len size into needed extents - however now we may need sometimes
more extents then necessary - mainly when multiple PVs are involved
into allocation.
Add logs_still_needed into calculation of sufficient_pes_free()
When a writecache sublv or an integrity metadata sublv
are partial (missing a dev), set the partial flag on
the upper level LV also, as is done for other sublvs.
Fix the two-step writecache detach in commit c32d7fed4f.
In the case of uncache, the cachevol is removed after
detaching the writecache. When the detach is finished
in the second step, the remove must wait until then.
When using cache with a cachevol, the cache_check tool was
not being run on the cache metadata during activation.
cache_check clears the needs_check flag in the cache
metadata, so if the flag was set due to an unclean
shutdown, the activation would fail.
When 'fsadm resize vg/lv' is used without size, it should just
resize filesystem to match device - but since we now check
for unbound variable in bash - the previous usage no longer
works and needs explicit check.
Verify that corruption is corrected for raid levels other
than raid1. For other raid levels, attempt to corrupt the
given file pattern on each underlying device, since we don't
know which device contains the file being corrupted.
This ensures that corruption is actually be introduced
when testing the other raid levels.
Verify that corruption is being corrected by checking
the integritymismatches count is non-zero for the raid LV,
which includes the total from all images (since we don't
know which image will have the corruption.)
Each integrity image in a raid LV reports its own number
of integrity mismatches, e.g.
lvs -o integritymismatches vg/lv_rimage_0
lvs -o integritymismatches vg/lv_rimage_1
In addition to this, allow the total number of integrity
mismatches from all images to be displayed for the raid LV.
lvs -o integritymismatches vg/lv
shows the number of mismatches from both lv_rimage_0 and
lv_rimage_1.
Enhance VDO man page with description of memory usage
and space requirements chapter.
Remove some unneeded blank lines in man page.
Use more precise terminology.
Correct examples since cpool and vpool are protected names.
filters needing io weren't being run because bcache
wasn't set up. Read the first 4k of the device
before doing filtering or reading ondisk structs to
reduce reads.
It's possible for a machine with a non-4k page size
to create a PV with an mda_header at an offset other
than 4k. Fix pvck --dump to work with these other
mda offsets. pvck --repair will write a new first
mda at 4096 but lvm with other page sizes will work
with this.
The args for pvcreate/pvremove (and vgcreate/vgextend
when applicable) were not efficiently opened, scanned,
and filtered. This change reorganizes the opening
and filtering in the following steps:
- label scan and filter all devs
. open ro
. standard label scan at the start of command
- label scan and filter dev args
. open ro
. uses full md component check
. typically the first scan and filter of pvcreate devs
- close and reopen dev args
. open rw and excl
- repeat label scan and filter dev args
. using reopened rw excl fd
- wipe and write new headers
. using reopened rw excl fd
Older blockdev tool return failure error code with --help,
and since now the tool abort on command failure, lets
detect missing --getsize64 support directly by running
command and check if it returns something usable.
It's likely very hard to have the system with
such old blockdev tool and newer lvm2 compiled.
Since 'kilobytes' could be seen in 2 way - SI as '1000',
while all programmers sees it as '1024' - switch to
commonly acceptted KiB, MiB....
Resolves RHBZ 1496255.
Test case where filesystem has been corrected via fsck.
In such case fsck returns '1' as success and should be
handled in a same way as '0' since fs is correct.
Set more secure bash failure mode for pipilines.
Avoid using unset variables.
Enhnace error reporting for failing command.
Avoid using error via 'case..esac || error'.
In some cases the dev size may not have been read yet
in set_pv_devices(). In this case get the dev size
before comparing the dev size with the pv size.
Restructure the pvscan code, and add new temporary files
that list pvids in a VG, used for processing PVs that
have no metadata.
The new temp files, in /run/lvm/pvs_lookup/<vgname>, allow a
proper pvscan --cache to be done on PVs that have no metadata.
pvscan --cache <dev> is only supposed to read <dev>, but when
<dev> has no metadata, this had not been possible. The
command had to fall back to scanning all devices to read all
VG metadata to get the list of all PVIDs needed to check for
a complete VG. Now, the temp file can be used in place of
reading metadata from all PVs on the system.
To read the lvm headers and set dev->pvid if the
device is a PV. Difference from label_scan_ functions
is this does not read any vg metadata or add any info
to lvmcache.
Filtering in label_scan was controlled indirectly by
the fact that bcache was not yet set up when label_scan
first ran. The result is that filters that needed data
would not run and would return -EAGAIN, which would
result in the dev flag FILTER_AFTER_SCAN being set.
After the dev header was read for checking the label,
filters would be rechecked because of FILTER_AFTER_SCAN.
All filters would be checked this time because bcache
was now set up, and the filters needing data would
largely use data already scanned for reading the label.
This design worked but is hard to adjust for future
cases where bcache is already set up.
Replace this method (based on setting up bcache, or not)
with a new cmd flag filter_nodata_only. When this flag
is set filters that need data will not run. This allows
the same label_scan behavior when bcache has been set up.
There are no expected changes in behavior.
Touch of stack allocation validated given size with rlimit
and if the reserved_stack was above rlimit, its been completely
ignored - now we will always touch stack upto rlimit/2 size.
cmd context has 'threaded' value that used be set
by clvmd - and allowed proper memory locking management.
Reuse same bit for dmeventd.
Since dmeventd is using 300KiB stack per thread,
we will ignore any user settings for allocation/reserved_stack
until some better solution is find.
This avoids crashing of dmevend when user changes this value
and because in most cases lvm2 should work ok with 64K stack
size, this change should not cause any problems.
Check that type is always defined, if not make it explicit internal
error (although logged as debug - so catched only with proper lvm.conf
setting).
This ensures later type being NULL can't be dereferenced with coredump.
DM tree keeps track of created device while preloading a device tree.
When fail occures during such preload, it will now try to remove
all created and preloaded device. This makes it easier to maintain
stacking of device, since we do not need to check in-depth for
existance of all possible created devices during the failure.
Since 'BLKZEROOUT' streams out more block at once, at can easily
zero-out larger set of blocks after 1st. failing one.
So the test is adapted to fully 'hide' swap header under error target.
When ERR_DEV and ZERO_DEV are used, they are automatically
taken down when the last user no longer needs them,
so hide them from 'forgotten' device check.
As there could be few invokes of stacktrace, avoid
repeatedly display logs from commands.
So after first display rename debug.log* -> debug_log
so the file still can remain for reading in test dir.
Since BLKZEROOUT ioctl should be supposedly fastest
way how to clear block device start using this ioctl
for zeroing a device. Commonly we do zero typically
small portion of a device (8KiB) - however since we now
also started to zero metadata devices, in the case
of i.e. thin-pool metadata this can go upto ~16GiB
and here the performance starts to be noticable.
Since dev_set_bytes() now closes dev on error path itself,
remove this unneeded call now (introduced few commits back
in history thus removing comment from WHATS_NEW)
Since lvm2 normally block signals during protected
phase where it does not want to be interrupted.
Support interruptible processing when allowed
in section between sigint_allow() ... sigint_restore())
and let the 'io_getenvents()' finish with EINTR.
When bcache tries to write data to a faulty device,
it may get out of caching blocks and then just busy-loops
on a CPU - so this check protects this by checking
if there is already max_io (~64) errored blocks.
Call _wait_all() which does check whether there is still
some pending IO before sleep. Otherwise it may happen
our submitted IO operations have been already dispatched
and this call then endlessly waits for IO which are all done.
This can be reproduced when device returns quickly errors
on write requests.
When detaching a writecache, use the cleaner setting
by default to writeback data prior to suspending the
lv to detach the writecache. This avoids potentially
blocking for a long period with the device suspended.
Detaching a writecache first sets the cleaner option, waits
for a short period of time (less than a second), and checks
if the writecache has quickly become clean. If so, the
writecache is detached immediately. This optimizes the case
where little writeback is needed.
If the writecache does not quickly become clean, then the
detach command leaves the writecache attached with the
cleaner option set. This leaves the LV in the same state
as if the user had set the cleaner option directly with
lvchange --cachesettings cleaner=1 LV.
After leaving the LV with the cleaner option set, the
detach command will wait and watch the writeback progress,
and will finally detach the writecache when the writeback
is finished. The detach command does not need to wait
during the writeback phase, and can be canceled, in which
case the LV will remain with the writecache attached and
the cleaner option set. When the user runs the detach
command again it will complete the detach.
To detach a writecache directly, without using the cleaner
step (which has been the approach previously), add the
option --cachesettings cleaner=0 to the detach command.
Reorganize checking the device args for pvcreate/pvremove
to prepare for future changes. There should be no change
in behavior. Stop the inverted use of process_each_pv,
which pulled in a lot of unnecessary processing, and call
the check functions on each device directly.
Since we detect already transaction if before starting
to build dm tree - this extra check is a duplicate
that would only capture very tiny 'race' and we later
validate transaction_id with suspended snapshot origin.
Introduce structures lv_status_thin_pool and
lv_status_thin (pair to lv_status_cache, lv_status_vdo)
Convert lv_thin_percent() -> lv_thin_status()
and lv_thin_pool_percent() + lv_thin_pool_transaction_id() ->
lv_thin_pool_status().
This way a function user can see not only percentages, but also
other important status info about thin-pool.
TODO:
This patch tries to not change too many other things,
but pool_below_threshold() now uses new thin-pool info to return
failure if thin-pool cannot be actually modified.
This should be handle separately in a better way.
LVM2 is distributed under GPLv2 only. The readline library changed its
license long ago to GPLv3. Given that those licenses are incompatible
and you follow the FSF in their interpretation that dynamically linking
creates a derivative work, distributing LVM2 linked against a current
readline version might be legally problematic.
Add support for the BSD licensed editline library as an alternative for
readline.
Link: https://thrysoee.dk/editline
Cover the case where two copies of metadata have the
same seqno but different checksums. Also elaborate
on an existing fixme in the code for this case, since
we should be doing something better for this case.
This had been uncovering an issue with reopening
fds in readwrite mode.
There's a bug when lvpoll attempts to write new hints,
related to the fact that lvpoll does not follow the same
scanning process as standard commands.
Fix by disabling the use of hints in lvpoll. We may want
to renable hints in lvpoll in a way that they can be used,
if valid, but not updated if they don't exist or are invalid.
Improve error response and reporting, when creating thin snapshots.
If the thin pool kernel metadata already have device with ID lvm2
tries to create, give more meanigful error message and also properly
restore transaction id to the value known to thin-pool in this case.
Before it's been possible to divert by one from kernel TID value,
and lvm2 stacked delete message for such thin device.
Since ATM kernel does not support this operation,
disable 'lvrename' of an active vdopool.
As a workaround, user may simply deactivate, rename and activate.
This test seems to be hitting some corner case in handling
out-of-metadata space condintion in thin-pool.
Add few more aid notes and functionality.
Also add missing '|| true' with now direct-IO dd command.
Shorten running time of the test.
Fix some issues in invoked resizing script to it returns
correct return code and dmeventd can be a little bit quicker
in this test.
When user tries to extend vdo pool - he needs to go always
at least by 1 full VDO slab (defined as vdo_slab_size_mb).
To avoid all trouble around find 'workable' size - lvm2 automatically
increases the passed (or by --use-policies calculated) extension size
(and informs a user about sometimes possibly large increase as slab
size can go upto 32GiB)
With VDO users need to always 'think-big' anyway and expect such
operation to be in GiB domain range.
When thetable reload fails during suspend() - we were only calling
plain resume() - and this will reload only those devices,
which were left suspend, but will not try to restore
metadata state according to lvm2 reverted metadata.
So if we were reloading device tree - we have restored
only top-level LV and rest of reverted device manipulation
were left alone and possibly mismatched what is in committed
metadata.
FIXME: There are several cases were such revert will likely not work
properly anyway as some operation are currenly handled in single commit,
while they need multiple commits, but it's step towards better correctness.
At least we catch there errors now earlier.
When loop can't handle sector-size option - failure caused double fail
for access of unbound variable
Also fix expression for 'rm' and remove loops after loop release.
If the test runs of loop device backend with 512 sectors,
xfs selects this smaller sector size and then data do not fit
(we would need -l9 with most of 'raids').
With 4K sectors data always fits.
Shorten and make the test easily readable by moving same code into
function and removed one duplicated test for 512,4096 combination.
Always use scsi_debug - since default ramdisk or loop device backend
is unpredictible.
lvm opens devices readonly to scan them, but
needs to open then readwrite to update the metadata.
Previously, the ro fd was closed before the rw fd
was opened, leaving a small gap where the dev was
not held open, and during which the dev could
possibly change which storage it referred to.
With the bcache_change_fd() interface, lvm opens a
rw fd on a device to be written, tells bcache to
change to the new rw fd, and closes the ro fd.
. open dev ro
. read dev with the ro fd (label_scan)
. lock vg (ex for writing)
. open dev rw
. close ro fd
. rescan dev to check if the metadata changed
between the scan and the lock
. if the metadata did change, reread in full
. write the metadata
Add a "device index" (di) for each device, and use this
in the bcache api to the rest of lvm. This replaces the
file descriptor (fd) in the api. The rest of lvm uses
new functions bcache_set_fd(), bcache_clear_fd(), and
bcache_change_fd() to control which fd bcache uses for
io to a particular device.
. lvm opens a dev and gets and fd.
fd = open(dev);
. lvm passes fd to the bcache layer and gets a di
to use in the bcache api for the dev.
di = bcache_set_fd(fd);
. lvm uses bcache functions, passing di for the dev.
bcache_write_bytes(di, ...), etc.
. bcache translates di to fd to do io.
. lvm closes the device and clears the di/fd bcache state.
close(fd);
bcache_clear_fd(di);
In the bcache layer, a di-to-fd translation table
(int *_fd_table) is added. When bcache needs to
perform io on a di, it uses _fd_table[di].
In the following commit, lvm will make use of the new
bcache_change_fd() function to change the fd that
bcache uses for the dev, without dropping cached blocks.
Use new SKIP_WITH_LOW_SPACE and set higher requirement for free space.
But still this test can't run on system's tmpfs directories -
as they typically provide less then 2G of space and when the test
runs there it also provisioning for all READ pages!)
BRD (ramdisk) device should work.
Extend a _wait_recalc() loop for slower hw.
When creating large raid which do not need to be fully synchronized use
them on delay devices - so even less data needs read/write.
Remove unneeded lvchange as lvcreate is already leaving LV inactive.
Replace printf with awk as generator.
mm
Test can set individually a higher value for required free space on
storage.
Note: it is not fully reliable since when 'brd' (ramdisk) device is used
this free space value is rather meanigul, but it might help
in case where a real filesystem is doing back-end for test devices.
When the test exhausts all the available free space on storage device,
then during the fail we cannot write anything as well - yet
the teardown needs to finish it's work - otherwise we leave
basicaly overfilled filesystem for all remaining tests.
In cases where internal functions like zero_dev, delay_dev pass-in
invalid parameter so resulting table can't work, resume at least
previous table line before failing out - so the cleaning process
later on is not stuck waiting on a suspended device.
While the previous commit c9b40083fc
decresed version to 1.19 for using bigger datasets, it's not
been quite right - so from our bb machine it looks like
bigger metadata consumption started with 1.19 and kernel 4.18
(fc27)
Use bigger volume and slowdown writing to cache device.
This allows more simple to reach 'dirty' state.
Also document exactly 1 SIGINT has to fire aborting of flushing.
During removal of a lot of locking code the signal blocking got lost
and signal processing got broken leading to unpredictable
behavior of i.e. activation code the can get interrupted in the
middle of DM table processing.
lvm2 code always expects signals are blocked while lock is held
unless it is explictelly placed into section of:
sigint_allow();....;sigint_restore();
For checking catched interrupt there is sigint_catched();
Metadata size was calculated correctly only for raids.
Fixes problem for crash during lvcreate when thin-pool was created
on a VG where remaining free space had the size to only fit a single
metadata LV and not also its _pmspare.
Lvcreate crashed with this assert message:
lvcreate: metadata/pv_map.c:198: consume_pv_area: Assertion `to_go <= pva->count' failed.
Aborted (core dumped)
TODO: there is probably to large overload of several alloc_handle
variables.
Reported-by: Wu Guanghao<wuguanghao3@huawei.com>
Reported-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Cow may not be a snapshot type, the return value of origin_from_cow(cow) may be NULL
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
LV may not be a snapshot type, the return value of find_snapshot(lv) may be NULL.
Here, we will call stack if LV is not a snapshot type.
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
The return value of top_level_lv_name() may be NULL, so we should
check return value of top_level_lv_name before calling
strcmp(lv->name, top_level_lv_name(vg, lv_name)).
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
When using cscope to read code, it will generate below 3 files for speedup
cross-refer: cscope.files, cscope.in.out, cscope.po.out
The .gitignore only contains "/cscope.out". It a little bit messy when
executing 'git status', and other git commands.
This patch add all cscope generated files in .gitignore.
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
When using --use-policy for automatic extension of thin-pool,
the extension of thin-pool's metadata itself can actually take
some extra space.
Since I'm not aware of exact compensation formula, add just
1% extra to calculated amount and hope it fits.
Wanted target is to always have usable thin-pool that fits
bellow pool_metadata_min_threshold().
Since we query on regular code these:
lv_raid_has_integrity()
lv_has_integrity_recalculate_metadata()
without prior checking for lv_is_raid() - these 'return 0' should
not use <stacktrace> as they are expected.
Correcting rounding rules for percentage evaluation.
Validate supported range of percentage.
(although ranges are already validated earlier on code path)
Enable compilation of vdo and writecache support as internaly
supported segment types by default.
For disabling use:
--with-vdo=none
--with-writcache=none
For proper checking of extension progress require version 1.15
It looks with older versoin extension happens during very slow
resume within lvm command - although speed is still somewhat slow
with latest version.
This is probably somewhat experimantal patch - but when i.e. raid device
is just extend, there should not be a technical need for flush,
unless the target would stricly need it. It should allow faster
processing of lvm command not being blocked by possibly longer flush.
Since we do not support rimage & rmeta for snapshots - we can
avoid quering for -cow devices and add them as origin_only -
since their snapshots (-cow) could have never existed.
This redumes several ioctl operation during table preloading.
Use '0' for error and '1' as success.
Also drop INTERNAL_ERROR from path - as this error
is ATM used for invalid devices.
(i.e. test lvconvert-raid1-split-trackchanges.sh)
Speed-up a bit the first synchronization with just 50ms write delay,
but later set also delay on read to slowdown lvextend.
FIXME: there are still things to look at:
0 229376 raid raid1 2 AA 229376/229376 idle 0 0
0 229376 raid raid1 2 AA 0/229376 frozen 0 0 -
0 262144 raid raid1 2 AA 229376/262144 repair 0 0 -
0 262144 raid raid1 2 AA 229376/262144 repair 0 0 -
0 262144 raid raid1 2 AA 245888/262144 repair 0 0 -
Just like we have 'writeerror_dev' supporting creation of device
which 'readable' segment and segments where write will fail we
have now support for delay zero mappings.
This is useful if we want to 'fake' large writing areas where we do
not really care about the actual 'disk' content - since we test
operation logic and it doesn't matter we read and write zeroes.
With combination with 'delay' target we can create specific mappings
and avoid using large memory areas of ramdisk.
On test system with 'default' filter (aka accept all) test
after enabling device can suffer from automatic system
activation - so for created LVs setup skipping this automatic
activation. This should prevent getting LVs into table
with pvscan service.
Since we declare dev_name in lib/device/device.h
and pvs in commands.h
rename local dev_name to device_name
and pvs to pvs_list to prevent shadowing warning.
m
Switch remaining zero sized struct to flexible arrays to be C99
complient.
These simple rules should apply:
- The incomplete array type must be the last element within the structure.
- There cannot be an array of structures that contain a flexible array member.
- Structures that contain a flexible array member cannot be used as a member of another structure.
- The structure must contain at least one named member in addition to the flexible array member.
Although some of the code pieces should be still improved.
While normally the 'mmap' file reading is better utilizing resources,
it has also its odd side with handling errors - so while we normally
use the mmap only for reading regular files from root filesystem
(i.e. lvm.conf) we can't prevent error to happen during the read
of these file - and such error unfortunately ends with SIGBUS error.
Maintaing signal handler would be compilated - so switch to slightly
less effiecient but more error resistant read() functinality.
reproducible steps:
1. vgcreate vg1 /dev/sda /dev/sdb
2. lvcreate --type raid0 -l 100%FREE -n raid0lv vg1
3. do remove the /dev/sdb action
4. lvdisplay show wrong 'LV Status'
After removing raid0 type LV underlying dev, lvdisplay still display
'available'. This is wrong status for raid0.
This patch add a new function raid_is_available(), which will handle
all raid case.
With this patch, lvdisplay will show
from:
LV Status available
to:
LV Status NOT available (partial)
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.com>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
It's better to set most of option as 'commented' with some
documented defaults instead of providing strict values.
This has the advantage we can eventually 'change' defualts
and get them working in future. Otherwise once the setting
is stored in lvm.conf in /etc, such setting has strictly
defined value and that can be only change with file update.
Allow the optional '--type raid1' to be included in the lvconvert
command when adding or removing raid images with integrity.
It does not change the meaning of the command (specifying a type
that matches the current type is redundant but generally allowed.)
merge.c:_check_lv_segment() was checking regionsize vs. mirrored LV size on
any 'mirror/raid1/raid10' segment type including type 'mirrored' mirror logs.
Avoid the check only for 'mirrored' mirror logs to allow conversion from log
type 'disk' with regionsize > mirror log SubLV size.
As we disabled support for 'mirrored' mirror logs with
commit e82303fd6a which still conditionally
allows to enable it via global/support_mirrored_mirror_logs=1,
patch is mandatory for all distributions.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1712983
Currently lvm2 is not wiping signatures when creating 'metadata' volumes
and raid _rmeta was the only exception - so make the behavior consistent
with other metadata devices and drop wiping ATM.
Drop also some extra debug since they are now more explanatory in
wipe_lv() function.
Also note - although lvm2 now does not wipe signatures - the error
from such wipping used to be actually 'ignored' before wipe_lv()
started to return error (with recent commit) and raid creation
continued with 'unzeroed' metadata device.
TODO: Several issues to resolve:
1. We may want to flip to wipping with all LVs (in that case we need to
support passing --yet & --force).
2. Also we may want to clear whole metadata device - however current
function is also used for wipping i.e. snapshot COW device which
is likely not a good candidate for full device zeroing.
We may also need to think about better logic when extent size is
enforcing very large LVs, when only a small portion of LV is ever
being used.
3. Using TRIM instead of zeroing metadata device might be worth to
implement.
mm
When converting volume to pool LV use also wiping of other signatures.
For writecache & pool conversion support --yet and --force
to bypass prompting for signature wiping.
For writecache drop unneded zero_sectors.
Note: currently we have lvconvert doing convertion and prompting
for confirmation of conversion - and then again wipe_lv() prompts
for removing i.e. filesystem signature - we should unify this
prompting into 1 message - althought the 'filesystem' discovery
needs active volume - while the 1st. conversion prompt can
work without active converted volume.
To avoid polution of metadata with some 'garbage' content or eventualy
some leak of stale data in case user want to upload metadata somewhere,
ensure upon allocation the metadata device is fully zeroed.
Behaviour may slow down allocation of thin-pool or cache-pool a bit
so the old behaviour can be restored with lvm.conf setting:
allocation/zero_metadata=0
TODO: add zeroing for extension of metadata volume.
Failure in wiping/zeroing stop the command.
If user wants to avoid command abortion he should use -Zn or -Wn
to avoid wiping.
Note: there is no easy way to distinguish which kind of failure has
happend - so it's safe to not proceed any futher.
When initiated larger write request, it may have happened, bcache
got out of free chunks - fix the loop, that is supposed to wait
until next free chunk becomes avain available.
To create a new cache or writecache LV with a single command:
lvcreate --type cache|writecache
-n Name -L Size --cachedevice PVfast VG [PVslow ...]
- A new main linear|striped LV is created as usual, using the
specified -n Name and -L Size, and using the optionally
specified PVslow devices.
- Then, a new cachevol LV is created internally, using PVfast
specified by the cachedevice option.
- Then, the cachevol is attached to the main LV, converting the
main LV to type cache|writecache.
Include --cachesize Size to specify the size of cache|writecache
to create from the specified --cachedevice PVs, otherwise the
entire cachedevice PV is used. The --cachedevice option can be
repeated to create the cache from multiple devices, or the
cachedevice option can contain a tag name specifying a set of PVs
to allocate the cache from.
To create a new cache or writecache LV with a single command
using an existing cachevol LV:
lvcreate --type cache|writecache
-n Name -L Size --cachevol LVfast VG [PVslow ...]
- A new main linear|striped LV is created as usual, using the
specified -n Name and -L Size, and using the optionally
specified PVslow devices.
- Then, the cachevol LVfast is attached to the main LV, converting
the main LV to type cache|writecache.
In cases where more advanced types (for the main LV or cachevol LV)
are needed, they should be created independently and then combined
with lvconvert.
Example
-------
user creates a new VG with one slow device and one fast device:
$ vgcreate vg /dev/slow1 /dev/fast1
user creates a new 8G main LV on /dev/slow1 that uses all of
/dev/fast1 as a writecache:
$ lvcreate --type writecache --cachedevice /dev/fast1
-n main -L 8G vg /dev/slow1
Example
-------
user creates a new VG with two slow devs and two fast devs:
$ vgcreate vg /dev/slow1 /dev/slow2 /dev/fast1 /dev/fast2
user creates a new 8G main LV on /dev/slow1 and /dev/slow2
that uses all of /dev/fast1 and /dev/fast2 as a writecache:
$ lvcreate --type writecache --cachedevice /dev/fast1 --cachedevice /dev/fast2
-n main -L 8G vg /dev/slow1 /dev/slow2
Example
-------
A user has several slow devices and several fast devices in their VG,
the slow devs have tag @slow, the fast devs have tag @fast.
user creates a new 8G main LV on the slow devs with a
2G writecache on the fast devs:
$ lvcreate --type writecache -n main -L 8G
--cachedevice @fast --cachesize 2G vg @slow
To add a cache or writecache to a main LV with a single command:
lvconvert --type cache|writecache --cachedevice /dev/ssd vg/main
A cachevol LV will be allocated from the specified cache device,
then attached to the main LV. Include --cachesize to specify the
size of cachevol to create, otherwise the entire cachedevice is
used. The cachedevice option can be repeated to create a cachevol
from multiple devices.
Example
-------
A user has an existing main LV that they want to speed up
using a new ssd.
user adds the new ssd to the VG:
$ vgextend vg /dev/ssd
user attaches the new ssd their main LV:
$ lvconvert --type writecache --cachedevice /dev/ssd vg/main
Example
-------
A user has two existing main LVs that they want to speed up
with a new ssd.
user adds the new 16G ssd to the VG:
$ vgextend vg /dev/ssd
user attaches some of the new ssd to the first main LV,
using half of the space:
$ lvconvert --type writecache --cachedevice /dev/ssd
--cachesize 8G vg/main1
user attaches some of the new ssd to the second main LV,
using the other half of the space:
$ lvconvert --type writecache --cachedevice /dev/ssd
--cachesize 8G vg/main2
Example
-------
A user has an existing main LV that they want to speed up using
two new ssds.
user adds the new two ssds the VG:
$ vgextend vg /dev/ssd1
$ vgextend vg /dev/ssd2
user attaches both ssds their main LV:
$ lvconvert --type writecache
--cachedevice /dev/ssd1 --cachedevice /dev/ssd2 vg/main
Use libblkid to detect sector/block size of the fs on the LV.
Use this to choose a compatible writecache block size.
Enable attaching writecache to an active LV.
When lvconvert is used to remove raid images, we can
skip calling lv_add_integrity_to_raid(), which finds
nothing to do, but the the blocksize validation would
be called unnecessarily and trigger spurious errors.
The test was using a raid+integrity LV without
first waiting for the integrity sync, which could
cause the test to fail (depending on init speed)
where it depends on integrity to work in uninitialized
areas.
Also use cmp instead of diff.
pvck --dump headers reads the metadata text area
to compute the text metadata checksum to compare
with the mda_header checksum.
The new header_only will skip reading the metadata
text and not validate the mda_header checksum.
It's possible for a dev-cache entry to remain after all
paths for it have been removed, and other parts of the
code expect that a dev always has a name. A better fix
may be to remove a device from dev-cache after all paths
to it have been removed.
When either logical block size or physical block size is 4K,
then lvmlockd creates sanlock leases based on 4K sectors,
but the lvm client side would create the internal lvmlock LV
based on the first logical block size it saw in the VG,
which could be 512. This could cause the lvmlock LV to be
too small to hold all the sanlock leases. Make the lvm client
side use the same sizing logic as lvmlockd.
The lock adopt feature was disabled since it had used
lvmetad as a source of info. This replaces the lvmetad
info with a local file and enables the adopt feature again
(enabled with lvmlockd --adopt 1).
The --lock-opt autowait was dropped back in 9ab6bdce01,
and attempting to specify it has quite an opposite effect:
no waiting is done, which makes the unit almost useless.
dm-integrity stores checksums of the data written to an
LV, and returns an error if data read from the LV does
not match the previously saved checksum. When used on
raid images, dm-raid will correct the error by reading
the block from another image, and the device user sees
no error. The integrity metadata (checksums) are stored
on an internal LV allocated by lvm for each linear image.
The internal LV is allocated on the same PV as the image.
Create a raid LV with an integrity layer over each
raid image (for raid levels 1,4,5,6,10):
lvcreate --type raidN --raidintegrity y [options]
Add an integrity layer to images of an existing raid LV:
lvconvert --raidintegrity y LV
Remove the integrity layer from images of a raid LV:
lvconvert --raidintegrity n LV
Settings
Use --raidintegritymode journal|bitmap (journal is default)
to configure the method used by dm-integrity to ensure
crash consistency.
Initialization
When integrity is added to an LV, the kernel needs to
initialize the integrity metadata/checksums for all blocks
in the LV. The data corruption checking performed by
dm-integrity will only operate on areas of the LV that
are already initialized. The progress of integrity
initialization is reported by the "syncpercent" LV
reporting field (and under the Cpy%Sync lvs column.)
Example: create a raid1 LV with integrity:
$ lvcreate --type raid1 -m1 --raidintegrity y -n rr -L1G foo
Creating integrity metadata LV rr_rimage_0_imeta with size 12.00 MiB.
Logical volume "rr_rimage_0_imeta" created.
Creating integrity metadata LV rr_rimage_1_imeta with size 12.00 MiB.
Logical volume "rr_rimage_1_imeta" created.
Logical volume "rr" created.
$ lvs -a foo
LV VG Attr LSize Origin Cpy%Sync
rr foo rwi-a-r--- 1.00g 4.93
[rr_rimage_0] foo gwi-aor--- 1.00g [rr_rimage_0_iorig] 41.02
[rr_rimage_0_imeta] foo ewi-ao---- 12.00m
[rr_rimage_0_iorig] foo -wi-ao---- 1.00g
[rr_rimage_1] foo gwi-aor--- 1.00g [rr_rimage_1_iorig] 39.45
[rr_rimage_1_imeta] foo ewi-ao---- 12.00m
[rr_rimage_1_iorig] foo -wi-ao---- 1.00g
[rr_rmeta_0] foo ewi-aor--- 4.00m
[rr_rmeta_1] foo ewi-aor--- 4.00m
Make it possible to tear down VDO volumes with blkdeactivate if VDO is
part of a device stack (and if VDO binary is installed). Also, support
optional -o|--vdooptions configfile=file.
lvm2 supports thin-pool to be later used by other tools doing
virtual volumes themself (i.e. docker) - in this case we
shall not validate transaction Id - is this is used by
other tools and lvm2 keeps value 0 - so the transationId
validation need to be skipped in this case.
When vdopool is activated standalone - we use a wrapping linear device
to hold actual vdo device active - for this we can set-up read-only
device to ensure there cannot be made write through this device to
actual pool device.
Creating a snapshot was using a persistent LV lock
on the origin, so if the origin LV was inactive at
the time of the snapshot the LV lock would remain.
(Running lvchange -an on the inactive LV would
clear the LV lock.) Use a transient LV lock so it
will be dropped if it was not locked previously.
Prevent attaching writecache to an active LV until
we can determine the block size of the fs on the LV,
and use that to enforce an appropriate writecache
block size. Changing the block size under a mounted
fs can cause panic/corruption.
dmeventd is 'scanning' statuses in loop (most usually in 10sec
intervals) - and meanwhile it sleeps within:
pthread_cond_timedwait()
However this function call tends to wakeup sometimes a short amount of
time sooner - and our code still believe the 'right time' has not yet
arrived and basically for a moment 'busy-looped' on calling this
function - so for systems with 'clock_gettime()' present we obtain
time and we go 10ms to the future second - this avoids unneeded
repeated invocation of our time scheduling loop.
TODO: monitoring during 1 hour 'time-change'...
When formating VDO volume, the calculated amound of bits
for 'vdoformat --slab-bits' parameter was shifted by 2 bits
(calculated size was making 2MiB vdo_slab_size_mb value appear like if
user would be specifying only 512KiB)
Fixed by properly converting internal size_mb value to KiB.
Fix the anoying kernel message reported:
device-mapper: cache: 253:2: metadata operation 'dm_cache_commit' failed: error = -5
which has been reported while cachevol has been removed.
Happened via confusing variable - so switch the variable to commonly user '_size'
which presents a value in sector units and avoid 'scaling' this as extent length
by vg extent size when placing 'error' target on removal path.
Patch shouldn't have impact on actual users data, since at this moment
of removal all date should have been already flushed to origin device.
m
The previous patch improved read of pipe when lvm2 was looking
for default logical size, but we clearly must read pipe also
for -V case, when the logical size is already defined.
Still the place can be better to block only particular reshape
operations which ATM cause kernel problems.
We check if the new number of images is higher - and prevent to take
conversion if the volume is in use (i.e. thin-pool's data LV).
clang: it's supposedly impossible path to hit, as we should always
have origin_lv defined when running this path, but adding protection
isn't a big issue to make this obvious to analyzer.
Since _reserve_area() may fail due to error allocation failure,
add support to report this already reported failure upward.
FIXME: it's log_error() without causing direct command failure.
When _daemon_read()/_client_read() fails during the read,
ensure memory allocated withing function is also release here
(so caller does not need to care). Also improve code readbility a bit
a for same functionality use more similar code.
Although we expect min_chunk_size to be 32bit value, for
large size of caches it might be useful to do calcs 64bit.
So to avoid doing shift as signed 32bit - use unsigned 64bit
from the start.
reporting fields (-o) directly from kernel:
writecache_total_blocks
writecache_free_blocks
writecache_writeback_blocks
writecache_error
The data_percent field shows used cache blocks / total cache blocks.
Currently the error messages are not clear. This very easy to
guide user to execute "--removemissing --force", it is dangerous
and will make the LVs to be destroied.
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
systemctl status corosync (version: 2.4.5) report error:
parse error in config: No interfaces defined
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Since VDO is also pool, the old if() case missed to know about this,
and executed unnecesserily initialization of cache pool variables.
This was usually harmless when using 'smaller' sizes of VDO pools,
but for big VDO pool size, we were reporting senseless messages
about big cache chunk sizes.
Until we resolve reshape for 'stacked' devices, we need to disable it.
So users can no longer reshape i.e. thin-pool data volumes, causing
ATM bad thin-pool problems.
When devices are created - we were not giving meaning error messages
when the failure happened on 'reload' part of creation.
With this patch we are now able to report both name and major:minor.
Enhancment is most visible with 'crypto' devices,
which are using 'secure' memory erase bit.
To write a new/repaired pv_header and label_header:
pvck --repairtype pv_header --file <file> <device>
This uses the metadata input file to find the PV UUID,
device size, and data offset.
To write new/repaired metadata text and mda_header:
pvck --repairtype metadata --file <file> <device>
This requires a good pv_header which points to one or two
metadata areas. Any metadata areas referenced by the
pv_header are updated with the specified metadata and
a new mda_header. "--settings mda_num=1|2" can be used
to select one mda to repair.
To combine all header and metadata repairs:
pvck --repair --file <file> <device>
It's best to use a raw metadata file as input, that was
extracted from another PV in the same VG (or from another
metadata area on the same PV.) pvck will also accept a
metadata backup file, but that will produce metadata that
is not identical to other metadata copies on other PVs
and other areas. So, when using a backup file, consider
using it to update metadata on all PVs/areas.
To get a raw metadata file to use for the repair, see
pvck --dump metadata|metadata_search.
List all instances of metadata from the metadata area:
pvck --dump metadata_search <device>
Save one instance of metadata at the given offset to
the specified file (this file can be used for repair):
pvck --dump metadata_search --file <file>
--settings "metadata_offset=<off>" <device>
using --settings:
mda_offset=<offset> mda_size=<size> can be used
in place of the offset/size that normally come
from headers.
metadata_offset=<offset> prints/saves one instance
of metadata text at the given offset, in
metadata_all or metadata_search.
This reverts commit 7474440d3b.
lvs can use the scanning optimization again since it has
been changed in:
"scanning: optimize by checking text offset and checksum"
After the VG lock is taken for vg_read, reread the mda_header
and compare the metadata text offset and checksum to what was
seen during label scan. If it is unchanged, then the metadata
has not changed since the label scan, and the metadata does not
need to be reread under the lock for command processing.
For commands that do not make changes (e.g. reporting), the
mda_header is reread and checked on one mda to decide if the
full metadata rereading can be skipped. For other commands
(e.g. modifying the vg) the mda_header is reread and checked
from all PVs. (These could probably just check one mda also.)
The kernel MD runtime requires region size to be larger than stripe size
on striped raid layouts, thus the dm-raid target's constructor rejects
such request.
This causes e.g. an 'lvcreate --type raid10 -i3 -I4096 -R2048 -n lv vg' to fail.
Avoid failing late in the kernel by enforcing region size to be
larger or equal to stripe size.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1698225
When pvcreate/pvremove prompt the user, they first release
the global lock, then acquire it again after the prompt,
to avoid blocking other commands while waiting for a user
response. This release/reacquire changes the locking
order with respect to the hints flock (and potentially other
locks). So, to avoid deadlock, use a nonblocking request
when reacquiring the global lock.
The scanning optimization can produce warnings from
'lvs' when run concurrently with commands modifying LVs,
so disable the optimization until it can be improved.
Without the scanning optimization, lvs will always
read all PVs twice:
1. read metadata from all PVs, saving it in memory
2. for each VG
3. lock VG
4. reread metadata from all PVs in VG, replacing metadata
saved from step 1
5. run command on VG
6. unlock VG
The optimization would usually cause step 4 to be skipped,
and PVs would be read only once.
Running the command in step 5 using metadata that was not
read under the VG lock is usually fine, except for the
fact that lvs attempts to validate the metadata by comparing
it to current dm state. If other commands are modifying dm
state while lvs is running, lvs may see differences between
metadata from step 1 and dm state checked during step 5,
and print warnings.
(A better fix may be to detect the concurrent change and
fall back to rereading metadata in step 4 only when needed.)
Since we fixed linking of proper version of 'libdevmapper' with
linking lvm2 plugin correctly - we already have correct function
available linked with internal lvm library.
So drop unneeded include of parsing function.
Avoid mem leaking hint on every loop continue and
allocate hint only when it's going to be added into list.
Switch to use 'dm_strncpy()' and validate sizes.
For dev_in_device_list() != 0 allocated 'devl' was
actually leaking - so instead allocate 'devl' only
when !dev_in_device_list() and indent code around.
Since we check for NULL pointers earlier we need
to be consistent across function - since the NULL
would applies across whole function.
When dropping 'mda' check - we are actually
already dereferencing it before - so it can't
be NULL at that places (and it's validated
before entering _read_mda_header_and_metadata).
dev_unset_last_byte() must be called while the fd is still valid.
After a write error, dev_unset_last_byte() must be called before
closing the dev and resetting the fd.
In the write error path, dev_unset_last_byte() was being called
after label_scan_invalidate() which meant that it would not unset
the last_byte values.
After a write error, dev_unset_last_byte() is now called in
dev_write_bytes() before label_scan_invalidate(), instead of by
the caller of dev_write_bytes().
In the common case of a successful write, the sequence is still:
dev_set_last_byte(); dev_write_bytes(); dev_unset_last_byte();
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
When resizing 2 volumes like thin-pool and it's metadata and they
would be of a different type - command would be actually expecting
both LVs being of a same segtype - and would throw an error in
case they are different.
This patch fixes is by setting a new segtype from last segment of
2nd. extented device.
Also it fixes the possible 'percentage' extension setup that
might have been used for 'primary' volume - while the 'secondary'
LV always goes with direct size - as we do not support 'percentage'
setup for them
This affects maily usage of thin-pool where the extension of
thin-pool data size may also lead to extension of metadata size.
When running cluster test with clvmd, the actual 'monitoring'
happens in cluster - so the 'already monitored' message
is also logged within clvmd code and the command cannot
see such effect.
clvmd was incapable to report this information back to command
so it cannot be displayed this way.
Add 'lvs -o+seg_monitor' validation which also works in clustered mode.
Instead of checking all LVs in a VG - do just a direct copy of LVs
from the existing list ->segs_using_thin_lv.
TODO: maybe it could be better to expose seg_list to /tools...
Enhance lv_info with lv_info_with_name_check.
This 'variant' not only check existance if UUID in DM table
but also compares its DM name whether it's matching expected LV name.
Otherwise activation may 'skip' activation with rename in case the
DM UUID already exists, just device is different name.
This change make fairly easier manipulation with i.e. detached mirror
leg which ATM is using same UUID - just the LV name have been changed.
Used code was not able to run 'activation' (and do a rename) and just
skipped the call. So the code used to do a workaround and 'tried'
to deactivate such LV firts - this however work only in non-clvmd case,
as cluster was not having the lock for deactivated LV.
With this extended lv_info code will run 'activation' and will
synchronize the name to match expected LV name.
Patch extends _lv_info() with new paramter 'with_name_check',
which is later translated into 'name_check' argument for
_info_run() which in case of name mismatch evaluates the
check as if device does not exists.
Such call is only used in one place _lv_activate() which then
let activation run. All other invocation of _info() calls
are left intact.
TODO: fix mirror table manipulation (and raid)....
Avoid making more dbus calls to get information we already have. This
also avoids us getting an error where a dbus object representation is
being deleted by another process while we are trying to gather information
about it across the wire.
When a LV loses an interface it ends up getting removed and recreated.
This happens after the VGs have been processed and updated. Thus when
this happens we need to re-check the VGs.
VDO pool LVs are represented by a new dbus interface VgVdo. Currently
the interface only has additional VDO properties, but when the
ability to support additional LV creation is added we can add a method
to the interface.
The return value from bcache_invalidate_fd() was not being checked.
So I've introduced a little function, _invalidate_fd() that always
calls bcache_abort_fd() if the write fails.
The resume of 'released' 'COW' should preceed the resume of origin.
The fact we need to do the sequence differently for merge was
cause by bugs fixed in 2 previous commits - so we no longer need
to recognize 'merging' and we should always go with single
sequence.
The importance of this order is - to properly remove '-real' device
from origin LV. When COW is activated as 2nd. '-real' device is
kept in table as it cannot be removed during 1st. resume of origin,
and later activation of COW LV no longer builds tree associated
with origin LV.
When checking device id of a thin device that is just being
merged - the snapshot actually could have been already finished
which means '-real' suffix for the LV is already gone and just LV
is there - so check explicitely for this condition and use
correct UUID for this case.
When a cachevol LV is attached, have the LV keep it's lock
allocated. The lock on the cachevol won't be used while
it's attached. When the cachevol is split a new lock does
not need to be allocated. (Applies to cachevol usage by
both dm-cache and dm-writecache.)
When a user includes "--type foo" in a command, only
look at command definitions with matching type, as
opposed to using matching/mismatching --type as a
vote for/against a given command def. This means a
command with --type foo will prioritize a command def
with --type foo over other command defs that have
more matching options but an unmatching type. This
makes it more likely that a closely matching command
def will be recommended.
When LV gets cached and uses cache-pool - such cache-pool
will now get _cpool suffix automatically.
Thus 'Pool' column for cached LV will now show either _cvol
or _cpool LV.
Improve the implementation of extracting all text metadata
copies from the metadata area. Use this for the existing
metadata_all dump option.
Add a new metadata_search dump option which does not use
lvm headers to find metadata, but looks in standard
locations. This is useful if headers are damaged and
can't be used to locate metadata.
Adding '-v' to metadata_all or metadata_search will add
the description and creation_time to the printed list of
metadata instances that are found.
Before 'archive()' is called, lvm2 must not touch/modify metadata.
So move setting CACHE_VOL related flags past this point.
Also make sure reading of cache segtype always restores this
flag properly (even if compatible flag would be lost).
Since code is using -cdata and -cmeta UUID suffixes, it does not need
any new 'extra' ID to be generated and stored in metadata.
Since introduce of new 'segtype' cache+CACHE_USES_CACHEVOL we can
safely assume 'new' cache with cachevol will now be created
without extra metadata_id and data_id in metadata.
For backward compatibility, code still reads them in case older
version of metadata have them - so it still should be able
to activate such volumes.
Bonus is lowered size of lv structure used to store info about LV
(noticable with big volume groups).
In the hex dump output, grep for the vgname
followed by one space. This allows for test pids
with up to seven digits, which are used to contruct
the variable vgname used by the test. Otherwise
the long vgname wraps to the next line and fails to
match in grep.
When an LV is used as a writecache cachevol, give
it the LV name a _cvol suffix. Remove the suffix
when the cachevol is detached, restoring the
original LV name.
The first part of a cachevol LV is used for metadata,
and the rest of the space is used for data. The
division of space between metadata and data depends
on the total size of the cachevol.
The previous division gave more space than needed to
metadata, it was:
cachevol size 8M to 128M -> metadata size 16M *
cachevol size 128M to 1G -> metadata size 32M
cachevol size 1G and up -> metadata size 64M
(* if this resulted in over half the LV used as
metadata, then half the cachevol would be used
for metadata, and the other half for data.)
The division of space now gives less space to
metadata, it is:
cachevol size 8M to 16M -> metadata size 4M
cachevol size 16M to 4G -> metadata size 8M
cachevol size 4G to 16G -> metadata size 16M
cachevol size 16G to 32G -> metadata size 32M
cachevol size 32G and up -> metadata size 64M
When a VG contains some LVs with unknown segtypes, the user
should still be allowed to activate other LVs in the VG that
are understood.
$ lvs foo
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
LV VG Attr LSize
lvol0 foo -wi------- 4.00m
other foo vwi---u--- 48.00m
$ lvcreate -l1 foo
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
Cannot change VG foo with unknown segments in it!
Cannot process volume group foo
$ lvchange -ay foo/lvol0
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
$ lvchange -ay foo/other
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
Refusing activation of LV foo/other containing an unrecognised segment.
$ lvs foo
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
LV VG Attr LSize
lvol0 foo -wi-a----- 4.00m
other foo vwi---u--- 48.00m
A cachevol LV had the CACHE_VOL status flag in metadata,
and the cache LV using it had no new flag. This caused
problems if the new metadata was used by an old version
of lvm. An old version of lvm would have two problems
processing the new metadata:
. The old lvm would return an error when reading the VG
metadata when it saw the unknown CACHE_VOL status flag.
. The old lvm would return an error when reading the VG
metadata because it would not find an expected cache pool
attached to the cache LV (since the cache LV had a
cachevol attached instead.)
Change the use of flags:
. Change the CACHE_VOL flag to be a COMPATIBLE flag (instead
of a STATUS flag) so that old versions will not fail when
they see it.
. When a cache LV is using a cachevol, the cache LV gets
a new SEGTYPE flag CACHE_USES_CACHEVOL. This flag is
appended to the segtype name, so that old lvm versions
will fail to use the LV because of an unknown segtype,
as opposed to failing to read the VG.
Enhance activation of cached devices using cachevol.
Correctly instatiace cachevol -cdata & -cmeta devices with
'-' in name (as they are only layered devices).
Code is also a bit more compacted (although still not ideal,
as the usage of extra UUIDs stored in metadata is troublesome
and will be repaired later).
NOTE: this patch my brink potentially minor incompatiblity for 'runtime' upgrade
Instead of using 'noflush' option, switch cache_mode into WRITETHROUGH
which does not require flushing, when user confirmed he does not
want flushing for WRITEBACK (because of (partially) missing caching PV)
For wiping we activate and clear 'regular' devices,
since in case of whole process interuption (i.e. kill -9)
we leave metadata & DM table and workable state all the time.
Let vgck --updatemetadata repair cases where different mdas
hold indepedently valid but unmatching copies of the metadata,
i.e. different text metadata checksums or text metadata sizes.
vgck --updatemetadata would write the same correct
metadata to good mdas, and then to bad mdas, but the
sequence of vg_write/vg_commit calls betwen good and
bad mdas could cause a different description field to
be generated for good/bad mdas. (The description field
describing the command was recently included in the
ondisk copy of the metadata text.)
If a VG is forcibly changed from lock_type sanlock to
lock_type none, the internal lvmlock LV is left behind.
If that LV is not removed before vgremove is run on the
VG, then an internal check will be triggered by the
hidden lvmlock LV. So, check for and remove a left over
lvmlock LV during vgremove.
Add lots of vdo fields:
vdo_operating_mode - For vdo pools, its current operating mode.
vdo_compression_state - For vdo pools, whether compression is running.
vdo_index_state - For vdo pools, state of index for deduplication.
vdo_used_size - For vdo pools, currently used space.
vdo_saving_percent - For vdo pools, percentage of saved space.
vdo_compression - Set for compressed LV (vdopool).
vdo_deduplication - Set for deduplicated LV (vdopool).
vdo_use_metadata_hints - Use REQ_SYNC for writes (vdopool).
vdo_minimum_io_size - Minimum acceptable IO size (vdopool).
vdo_block_map_cache_size - Allocated caching size (vdopool).
vdo_block_map_era_length - Speed of cache writes (vdopool).
vdo_use_sparse_index - Sparse indexing (vdopool).
vdo_index_memory_size - Allocated indexing memory (vdopool).
vdo_slab_size - Increment size for growing (vdopool).
vdo_ack_threads - Acknowledging threads (vdopool).
vdo_bio_threads - IO submitting threads (vdopool).
vdo_bio_rotation - IO enqueue (vdopool).
vdo_cpu_threads - CPU threads for compression and hashing (vdopool).
vdo_hash_zone_threads - Threads for subdivide parts (vdopool).
vdo_logical_threads - Logical threads for subdivide parts (vdopool).
vdo_physical_threads - Physical threads for subdivide parts (vdopool).
vdo_max_discard - Maximum discard size volume can recieve (vdopool).
vdo_write_policy - Specified write policy (vdopool).
vdo_header_size - Header size at front of vdopool.
Previously only 'lvdisplay -m' was exposing them.
Since we now support activation of 'vdo' volume
without explicit activation of 'vdopool' it's now possible
to have active layer vdopool (-vpool) volume and
having vdopool itself inactive - yet still in this
case we can show available stats for this volume.
But we need to show correct activation status and other
standard info.
Adds support for the DM_GET_TARGET_VERSION to dmsetup.
It introduces a new comman "target-version" that will accept list
of targets and print their version.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Use /dev/md33 instead of /dev/md0 to reduce chances of
conflicting with an existing name.
Only call 'mdadm --stop /dev/md33' for cleanup and don't
use 'mdadm --stop --scan' to avoid stopping other md devs.
Due to a dm-raid target flaw fixed in target version 1.15.0,
extents of raid sets don't get resynchronized when new MD bitmp
pages have to be allocated due to the extension.
Introduce lvextend-raid.sh to test this flaw.
Related: rhbz1671964
When the PV device names in the VG metadata do not match the
current PV device names seen on the system, do not use the
optimized activation function (that avoids extra device scanning.)
When the device names do not match, it's a clue that there could
be duplicate PVs, in which case we want to scan all devicess to
find any duplicates and stop the activation if found.
This does not prevent autoactivating a VG from the incorrect
duplicate PV, because the incorrect duplicate may appear by itself
first. At that point its duplicate PV does not exist to be seen.
(A future enhancement could use the WWID to strengthen this
detection.)
Analogous to the case of a device with no regions, it is not an
error to attempt to list the stats groups on a device that has no
configured groups: just return success and continue.
Avoid checking 'lv_is_active()' since special LV types does this
validation anyway what calling _percent() function and call it
ONLY when none of special types is queried.
This restores support for VDO resize (as with support for
separate VDO pool activation, plain query for lv_is_active()
is not working in this case).
If the linear mapping is lost (for whatever reason, i.e.
test suite forcible 'dmsetup remove' linear LV,
lvm2 had hard times figuring out how to deactivate such DM table.
So add function which is in case inactive VDO pool LV checks if
the pool is actually still active (-vpool device present) and
it has open count == 0. In this case deactivation is allowed
to continue and cleanup DM table.
- use internal CACHE_VOL flag on cachevol LV
- add suffixes to dm uuids for internal LVs
- display appropriate letters in the LV attr field
- display writecache's cachevol in lvs output
This reverts commit ad560a286a.
The reverted patch also removed the warning which we realized we need
to keep as valuable process information (see related bugzilla below).
In a followup patch, we'll keep the message and avoid bailing out thus
always allowing lvconvert to try repairing if 'allocate' fault policy set.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1751887
. For dm-cache in writethrough, always allow splitcache,
whether the cache is missing PVs or not.
. For dm-cache in writeback, if the cache is missing PVs,
allow splitcache with force and yes.
. For dm-writecache, if the cache is missing PVs,
allow splitcache with force and yes.
Enhance 'activation' experience for VDO pool to more closely match
what happens for thin-pools where we do use a 'fake' LV to keep pool
running even when no thinLVs are active. This gives user a choice
whether he want to keep thin-pool running (wihout possibly lenghty
activation/deactivation process)
As we do plan to support multple VDO LVs to be mapped into a single VDO,
we want to give user same experience and 'use-patter' as with thin-pools.
This patch gives option to activate VDO pool only without activating
VDO LV.
Also due to 'fake' layering LV we can protect usage of VDO pool from
command like 'mkfs' which do require exlusive access to the volume,
which is no longer possible.
Note: VDO pool contains 1024 initial sectors as 'empty' header - such
header is also exposed in layered LV (as read-only LV).
For blkid we are indentified as LV with UUID suffix - thus private DM
device of lvm2 - so we do not need to store any extra info in this
header space (aka zero is good enough).
When lvm2 is activating layered pool LV (to basically keep pool opened,
the other function used to be 'locking' be in sync with DM table)
use this LV in read-only mode - this prevents 'write' access into
data volume content of thin-pool.
Note: since EMPTY/unused thin-pool is created as 'public LV' for generic
use by any user who i.e. wish to maintain thin-pool and thins himself.
At this moment, thin-pool appears as writable LV. As soon as the 1st.
thinLV is created, layer volume will appear is 'read-only' LV from this moment.
When pvscan is used to activate a VG via an
asynchronous service (i.e. lvm2-pvscan), there
is no requirement that the command wait for
udev to create device nodes before returning.
It's possible that waiting for udev is slow
enough to cause the service running the command
to time out. So, allow the --noudevsync option
to be given to pvscan to skip waiting for udev.
(This commit is not changing the lvm2-pvscan
service itself to use --noudevsync.)
Still unknown is whether there are any complex
LV activation cases in which lvm itself requires
access to a device node, in which case the udev
wait could be needed by lvm itself.
(When running an activation command directly
from the command line, it's generally expected
that the activated LVs are ready to use when
the command is finished, so lvm waits for
udev to finish creating the dev nodes.)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.