IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Since we now support disabling memory locking by setting
reserved memory or stack to 0 - it would be useful if this would
work also with cmdline --config option.
TODO: rework creation and usage of cmdtool context so we avoid
several places in the code which do try to initialized something...
Fix regression introduced with commit:
964012fdb9
that effectively disabled memory locking before suspending volumes.
From merging/testing there remained wrong condition
as we really want to check for 0 memory reservation value
for both checked settings.
Because now, we are doing the fsinfo check before extending an LV and if
that check fails, we do not proceed to the LV extension itself and the
lvextend command bails out immediatelly.
It seems we need new_size_bytes in places where struct fs_info is also
passed. Store the new_size_bytes inside the struct fs_info so we
can just pass that one to all the functions we call and hence make
the code a bit cleaner and easier to follow.
This avoids a situation where we would extend an LV and then we would
not do anything to the FS on it because the FS info check failed for some
reason, like the type was not supported (e.g. swap) or we could not resize
the FS unless being in some supported state (e.g. XFS to be mounted for
the xfs_growfs to work).
Before this patch (LV resized, FS not resized):
❯ lvextend --fs resize -L+4M vg/swap
Size of logical volume vg/swap changed from 32.00 MiB (8 extents) to 36.00 MiB (9 extents).
File system extend is not supported (swap).
File system extend error.
Logical volume vg/swap successfully resized.
With this patch (LV not resized, FS not resized):
❯ lvextend --fs resize -L+4M vg/swap
File system extend is not supported (swap).
Deactivate converted volume to pool early, so the conversion
exits early and does not leave some already created metadata
volumes that needed manual cleanup by user after command
aborted its conversion operation when the converted volume
was actually in-use (i.e. when user tried to convert
a mounted LV into a thin-pool, 2 extra volumes needed removal).
Device quirks may cause sysfs wwid file to change what it
displays, from a bogus eui... string to an nvme... string.
The old wwid may be saved in system.devices, so recognizing
the device requires finding the old value from libnvme.
After matching the old bogus value using libnvme, system.devices
is updated with the current sysfs wwid value.
All block devices have a disk sequence number assigned (an ever-increasing 64 bit
sequence number) since kernel v5.15 (February 2021). The number is exported through
/sys/block/<disk>/diskseq property and also as DISKSEQ udev event variable.
The diskseq helps with referencing a device throughout its existence in
race-free way.
By default, the /usr/lib/udev/rules.d/60-persistent-storage.rules set
/dev/disk/by-diskseq/<diskseq> symlink for each block device. However,
these rules do not apply for DM devices because we manage the symlinks
ourselves in 13-dm-disk.rules where it properly follows the
DM_UDEV_DISABLE_DISK_RULES flag, among other things.
Add a rule to 13-dm-disk.rules to create the /dev/disk/by-diskseq/<diskseq>.
Since commit d106ac04ab ("configure.ac: use LIBSYSTEMD"),
lvmlockd is not built with SD_NOTIFY by default but depending
on LIBSYSTEMD_LIBS. There are three prerequisites of
nonempty LIBSYSTEMD_LIBS:
NOTIFYDBUS_SUPPORT, SYSTEMD_JOURNAL_SUPPORT and SYSTEMD_JOURNAL_SUPPORT.
If ./configure is called with options ' --disable-systemd-journal
--disable-app-machineid --enable-lvmlockd-sanlock
--disable-notify-dbus', the lvmlockd built is without sd_notify
support which causes hang of start lvmlockd service in notify type.
This commit adds options disable-sd-notify and enable-sd-notify.
The default value is autodetected and when the lvm2 is build with
systemd then sd-notify is enabled.
If systemd/sd-daemon.h is existed, call PKG_CHECK_MODULES libsystemd.
Signed-off-by: Su Yue <glass.su@suse.com>
Modified-by: Zdenek Kabelac <zkabelac@redhat.com>
Also fix the use of --all that was mistakenly included
as an accepted option for vgdisplay and two cases of pvdisplay
in commit "tools: enhance lvdisplay vgdisplay pvdisplay"
Remove --noudevsync option - as this breaks synchronization with
udev which is necessary when trying to i.e. create _rmeta_3
and wipe it - as the symlinks must be present for wiping.
So if there was some other issue (behind the comment) - we need to
check for the problem elsewhere instead of disabling udev sync.
Automatically use --enable-notify-dbus when building lvmlockd
if not configured otherwise by a configure user -
as the lvmlockd.service is notify based.
Split description for display commands so we can better describe
it's usage and combination of individual options in man page.
Now we can separately describe:
lvdisplay, lvdisplay -c, lvdisplay -C
vgdisplay, vgdisplay -c, vgdisplay -C
pvdisplay, pvdisplay -c, pvdisplay -C
TODO: Drop validation from command code itself.
Update the _print_man_option_desc() to also handle common parts
as the initial text without any specified section and also
add support for '#\n' to be able to revert to common part.
Avoid printing lvm2 command trace, if the test finds the 'dmeventd'
was started unxpectedly during testing as the last command is hardly
ever responsible for this
Also reorder some messages when doing teardown of devices.
Do not print 'help' message from hostname command, when it does
not support option '-I'.
Since we now keep lv names valid all the time (as they are part
of radix_tree) - there is a problem with this renaming code, that
for a moment used duplicated name in vg struct.
Fix it by interating LVs backwared - which avoids breaking consitency
and also actually makes code more simple.
Add check for 'leaked' symlinks after test and trap
the case when some 'danglink' links are present.
This might be some problem with udev synchronization
or some other strange race.
All such symlinks will be also removed so they will not
influence following tests.
When command prints warning about suppressing query
for inactive table, because this is not supported
by kernel - 1 printed message is just enough, no
reason to 'spam' command output all the time, message
will remain only in debug log.
Also drop 'WARNING:' from real 'error' message.
WARNIGS are supposed to be just warning and command
then exists with 'success'.
When specifying minimum_io_size with --vdosettings,
command assumed wrong unit (sectors).
So '--vdosettings minimum_io_size=512|4096' resulted into
an error that only 512 or 4096 values are allowed, but
at the same time values 1 or 8 were accepted.
So fix by converting any number >= 512 to 'sectors' and
keep input of 1 or 8 still valid if anyone has been using
this before.
So now we take 512 or 4096 and still also 1 or 8 with the
same effect.
Also correct the 'error' message when invalid minimum_io_size
is specified.
The df -a looks at whole system and it returns an error code in case
there's an inaccessible fs which is not even part of the testing environment.
The -a for df is not actually needed here in the lvresize-xfs test, so remove it.
Fix stripe count and size parameter validation for RAID LVs and
include existing automatic setting of these parameters based
on current shape of the RAID LV in case these are not set
on command line fully.
Previously, this was done only to a certain subset given by this
condition (where the 'stripes' is the '-i|--stripes' cmd line arg
and the 'stripe_size' is actually the '-I|--stripesize' cmd line arg):
!(stripes == 1 || (stripes > 1 && stripe_size))
This condition is a bit harder to follow at first sight and there
are no comments around with explanation for why this one is used,
so let's analyze it a bit more.
First, let's convert this to an equivalent condition (De Morgan law)
so it's easier to read for humans:
stripes != 1 && !(stripes > 1 && stripe_size)
Note: Both stripe and stripesize are unsigned integers, so they can't be negative.
Now, based on that condition, we were running the code to deduce the
stripe/stripesize and do the checks ("the code") only if both of these
are true:
- stripes is different from 1
- we don't have stripes > 1 and stripe_size defined at the same time
But this is not correct in all cases, because:
A) if someone uses stripes = 0, then "the code" is executed
(correct)
B) if someone uses stripes = 1, then "the code" is not executed
(wrong: we still need to be able to check the args against
existing RAID LV stripes whether it matches)
- if someone uses stripes > 1, then "the code" is:
C) if stripe_size = 0, executed
(correct)
D) if stripe_size > 0, not executed
(wrong: we still want to check against existing RAID LV stripes)
Current issues with this condition:
The B) ends up with segfault.
❯ lvextend -i 1 -l+1 vg/lvol0
Rounding size 4.00 MiB (1 extents) up to stripe boundary size 8.00 MiB (2 extents).
Segmentation fault (core dumped)
The D) ends up with errors like:
❯ lvextend -i 3 -l+1 -I128k vg/lvol0
Rounding size 4.00 MiB (1 extents) up to stripe boundary size 8.00 MiB (2 extents).
Rounding size (4 extents) up to stripe boundary size for segment (5 extents).
Size of logical volume vg/lvol0 changed from 8.00 MiB (2 extents) to 20.00 MiB (5 extents).
LV lvol0: segment 1 with len=5 has inconsistent area_len 3
Couldn't read all logical volumes for volume group vg.
Failed to write VG vg.
Conclusion:
The condition needs to be removed so we always run "the code" to check
given striping args given on command line against existing RAID LV
striping. The reason is that we don't want to allow changing stripe
count for RAID LVs through lvextend and we need to end up with the
error:
"Unable to extend <RAID segment type> segment type with different number of stripes"
(We do support changing the striping by lvconvert's reshaping functionality only).
With commit acbeaa7a8d we started
to use symlinks to link test suite shell scripts, however
they remained within CLEAN_TARGETS.
So when running 'make clean' within non-srcdir build dir, we
were cleaning actuall shell script in this dir.
So remove list of this script from CLEAN_TARGETS in this case.
get_sizes_lockspace() may not always initilize all passed values
in case the bitfield would not trigger if() path.
So just in case keep the path initilized.
TODO: maybe add INTERNAL_ERROR to get_sizes_lockspace().
When converting a VG to locktype sanlock, a new
lease is allocated for each existing lv. Finding
a new lease location involved searching the lvmlock
LV from the start for an unused location, which
would be very slow with many LVs. Improve this by
starting each search from the last used location.
Fix regression from commit 7f29afdb06
"lvmlockd: configurable sanlock lease sizes on 4K disks"
That change failed to recognize that a running lockspace will not
exist in lvmlockd when converting a local VG to a sanlock VG, i.e.
vgchange --locktype sanlock vgname. When the vgchange attempted
to initialize new lv leases for existing LVs, lvmlockd would
return an error when it found no lockspace.
When searching for committed LV by uuid, this search can
be expensive for commands like 'vgremove' - so for
this part introduce 'lv_uuids' radix_tree that is
build with first access to lv_committed().
Since there is a group of commands that need to access 'lv_list'
while still need to search for LV by its name, make the whole
struct lv_list a member of logical_volume structure.
This makes it easy to return also 'lv_list' this list this LV
within VG.
Also the patch should not use more memory, since we were allocating
lv_list for each LV anyway when linkin LV to VG.
Since find_lv_by_name() is now using radix_tree(),
use the same 'search for /' in LV in name for both
find_lv() & find_lv_in_vg().
TODO: Possibly refactor code and use only dm_list
instead of lv_list and dereference LV with container_of()
(thus saving pointer within struct logical_volume) - but
we use 'lv_list' currently in many places...
Add lvm.conf config/validate_metadata configurable setting.
Allows to disable validation of volume_group structure before
writing to disk.
Call of vg_validate() is supposed to catch any inconsistency
of in-memory volume group structure and possibly early aborting
commnand before making any more 'damage' in case the VG struct
is found insistent after some metadata manipulation.
This is almost always useful for devel - and also for normal user
as for small metadata size this doesn't add too much overhead.
However if the volume_group size is large and operations are just
adding removing simple LVs - this validation time may add noticable
to final command running time.
So if the user seeks the highest perfomance of command and does
not do any 'complex' metadata manipulation - it's reasonably safe
to disable validation (with the use of setting "none") here.
With presence of uniq_insert, use this function also
here for extra protection and check for duplicate lv_name
when inserting a new name into radix_tree.
Avoid config 'grep' with actual 'randomly' generated path name
which may eventually contain 'cc' as part the path and
causing a mismatch of the grep test.
When 'lvresize -r' is used to resize the volume, it's valid to
resize even to the same size of an LV, as the command then runs
fs-resize utility to eventually upsize the fs to the current
volume size.
Return code of such command then reflects the return value
of this fs-resize tool.
This fixes the regression introduced when the support
for option --fs was added (2.03.17).
Use proper function names in annotation
There are no fuction named print_common_options_cmd()
and print_common_options_lvm(). So, rename them to the
real function named print_usage_common_cmd() and
print_usage_common_lvm().
Signed-off-by: YunJian Long
Tricky one - as the pipe exit codes may result into whole
test failure depending on how quick/slow command exits
are within pipeline.
So get the len without piping.
Replace usage of dm_hash with radix_tree to quickly find LV name
with a vg and also index PV names with set of available PVs.
This PV index is only needed during the import, but instead
of passing 'radix_tree *' everywhere, just keep this within
a VG struct as well and once the parsing is finished, release
this PV index radix_tree.
This also makes it easier to replace this structure
in the future if needed.
lv_set_name now uses radix_tree remove+insert to keep lv_names
tree in-sync and usable for find_lv queries.
Enhance usage with uniq_insert and also try to better
utilize CPU cache and do a smaller loop for individual
hashing of lvname and separately lvid.
Also correcting usage of 'continue' within validation of
historical names as it should report as much errors
as it can within a loop.
When using radix_tree to identify duplicate entries we may
avoid to call an extra 'lookup()' prior the insert() operation
add radix_tree_uniq_insert/_ptr() that is able to report -1 if
there was already set a value for the given key.
Avoid finding problems in vg_validate when restoring
invalid VG metadata as that would lead to internal error.
i.e. adding unsupported METADATA_FLAG to zero segtype
can trigger such thing.
Previous update needed to add handling segtype within flag.c
which somewhat breaks API separition and also had bug in hanlding
actual flags.
So instead keep segtype code in _read_segtype_and_lvflags() within
import_vsn1.c and handle purly flags in read_lvflags() from const
string.
Instead of duplicating whole segtype string with flags and
using 2 calls read_segtype_lvflags() + get_segtype_from_string(),
merge the functionality into a single read_segtype_and_lvflags().
This allows to make only a local string copy (no allocs) and eventually
to not copy segtype string at all, when there are no flags.
As the 'emit_to_buffer' uses relatively complex
vsnprintf() call inside, try to reduce number
of unnecessary calls and try replace some more
complex string build with a single call instead.
With existing code, the cache was working only to the 2nd. locking.
So i.e. when 'lvs' scans system with more then one VG, the caching
was effectively not working.
Update the code, so the label invalidate code is able to update DM
cache - so whenever we take a new lock - we will refresh the cache.
TODO: the refresh ATM does a very simple compare of old a new list
of cached DM device, and with the first spotted difference, it just
fallback to the full rebuild of DM cache - with large amount of active
devices this might not the most efficient way....
Since we detect 'debug' level after calling 'log_debug()' - all
the arguments are evaluated, so in this case display_lvname() was
preparing a string that is not used in case debugging is not enabled.
So since these string are on 'hot-path' and it's already known
which VG is being worked on, in these few cases just use lv->name.
When processing LVs for a command we stored '*object_id' & '*group_id'
as printable string that was however only used with json reporting.
Refactor code so we simply store there 'struct id*' that is just
converted into printable string when json reporting is really used.
Also check for 'sigint()' right before loop processing begins which
is primary purpose of this test.
This function call is able to setup config parser so it stops
parsing 'subsection' nodes after parsing named section node.
Only nodes at 'level' 0 will be still processed. And this nodes
are found by searching for last \n}\n sequence from the end of
buffer (instead of trying to analyze all the text in buffer).
Replace use of dm_hash with radix_tree when making PV index names.
Store just the index number itself and use pv%d for outf() string.
For lookup up a PV - use just the PV pointer itself, it's faster then
converint for it's ID to UUID format.
Split single check_lv_segments() into 2 separate
versions so they can be called independently.
This allow to 'skip' already checked segment
check after it's been imported to VG and also
avoid another repeated checking when validating
segment with complete vg.
**
check_lv_segments_incomplete_vg()
this check just basic LV segment properties and does not
validate those requiring full VG.
**
check_lv_segments_complete_vg()
Remaining check that expects complete VG is present.
ATM this rather save a lot of unncessary log entries as it grabs
the global autoextend_threshold (profile == NULL) just once instead
of revealing it every time with NULL profile.
Track whether import has even seen segment of LV with log_lv,
and call fixup mirror only in this case.
Also avoid repeated lookup of get_segtype_from_string for
SEG_TYPE_NAME_MIRROR.
Use bigger memory pool chunk size and reduces amount of
memory pool extensions when handling larger metadata, but do not
make it noticable bigger when handling small ones...
Use same large value also when allocating VG memory pool.
Instead of allocating string from a pool, for shorted strings
use buffer on stack since the string after the use in _find_or_make_node()
as no longer needed.
Eventually we may enhance code also for TOK_STRING_ESCAPED and TOK_STRING,
but they appear to be unused for _section().
For the most common part check for '#' when it's known it's not a space.
And also when we checked for '\n' we dont need to check again isspace().
Also help a bit more 'gcc' optimizer to grab buffer char just once and
simplify jump to next characted in the buffer when checking for token.
Avoid double dm_pool allocation call by copying string
for node name and config value directly after the end
of node/value structure.
It would be likely better to not copy these strings at all
and derefence it from the original string however that
needs futher changes in the code base.
This code is faster when calculating crc32 checksum for larger
block areas. There is also SIMD variant present in the code,
however ATM the influence on performance of lvm2 is not that big..
When BLKZEROOUT ioctl fails, it should not stop us from trying the direct
zeroing as a fallback action, since this is an optimization only.
We should be able to continue with new LV creation if we succeed
with that direct fallback then.
Related report: https://issues.redhat.com/browse/RHEL-58737
commit a125a3bb50 "lv_remove: reduce commits for removed LVs"
changed "lvremove <vgname>" from removing one LV at a time,
to removing all LVs in one vg write/commit. It also changed
the behavior if some of the LVs could not be removed, from
removing those LVs that could be removed, to removing nothing
if any LV could not be removed. This caused a regression in
shared VGs using sanlock, in which the on-disk lease was
removed for any LV that could be removed, even if the command
decided to remove nothing. This would leave LVs without a
valid ondisk lease, and "lock failed: error -221" would be
returned for any command attempting to lock the LV.
Fix this by not freeing the on-disk leases until after the
command has decided to go ahead and remove everything, and
has written the VG metadata.
Before the fix:
node1: lvchange -ay vg/lv1
node2: lvchange -ay vg/lv2
node1: lvs
lv1 test -wi-a----- 4.00m
lv2 test -wi------- 4.00m
node2: lvs
lv1 test -wi------- 4.00m
lv2 test -wi-a----- 4.00m
node1: lvremove -y vg/lv1 vg/lv2
LV locked by other host: vg/lv2
(lvremove removed neither of the LVs, but it freed
the lock for lv1, which could have been removed
except for the proper locking failure on lv2.)
node1: lvs
lv1 test -wi------- 4.00m
lv2 test -wi------- 4.00m
node1: lvremove -y vg/lv1
LV vg/lv1 lock failed: error -221
(The lock for lv1 is gone, so nothing can be done with it.)
Detect when we have mixed dos partition with gpt's PMBR partition.
This is not a sane configuration, but detect it anyway, just in case
someone configures such partition layout manually and forcefully and
incorrectly defines one of the partition types to be the GPT's PMBR.
For example:
❯ fdisk -l /dev/sdc
Device Boot Start End Sectors Size Id Type
/dev/sdc1 2048 67583 65536 32M 83 Linux
/dev/sdc2 67584 262143 194560 95M ee GPT
Before:
(The partition filter passes even though there's real existing dos
partition - the empty GPT PMBR overrides it.)
❯ pvcreate /dev/sdc
WARNING: PMBR signature detected on /dev/sdc at offset 510. Wipe it? [y/n]:
Wiping PMBR signature on /dev/sdc.
Physical volume "/dev/sdc" successfully created.
With this patch applied:
(The GPT PMBR does not override the existence of the dos partition.)
❯ pvcreate /dev/sdc
Cannot use /dev/sdc: device is partitioned
This provides better hints when trying to resize the fs on top of an LV.
Also needs a3f6d2f593 for proper operation.
❯ lvs -o name,size vg/swap
lv_name lv_size
swap 60.00m
Before:
❯ lvextend -L72m vg/swap
Size of logical volume vg/swap changed from 60.00 MiB (15 extents) to 72.00 MiB (18 extents).
Logical volume vg/swap successfully resized.
❯ lvreduce -L60m vg/swap
File system swap found on vg/swap.
File system device usage is not available from libblkid.
❯ lvreduce -L50m vg/swap
Rounding size to boundary between physical extents: 52.00 MiB.
File system swap found on vg/swap.
File system device usage is not available from libblkid.
After:
❯ lvextend -L72m vg/swap
Size of logical volume vg/swap changed from 60.00 MiB (15 extents) to 72.00 MiB (18 extents).
Logical volume vg/swap successfully resized.
❯ lvreduce -L60m vg/swap
File system swap found on vg/swap.
File system size (60.00 MiB) is equal to the requested size (60.00 MiB).
File system reduce is not needed, skipping.
Size of logical volume vg/swap changed from 72.00 MiB (18 extents) to 60.00 MiB (15 extents).
Logical volume vg/swap successfully resized.
❯ lvreduce -L50m vg/swap
Rounding size to boundary between physical extents: 52.00 MiB.
File system swap found on vg/swap.
File system size (60.00 MiB) is larger than the requested size (52.00 MiB).
File system reduce is required and not supported (swap).
blkid does not report FSLASTBLOCK for a swap device. However, blkid
does report FSSIZE for swap devices, so use this field (and including
the header size which is of FSBLOCKSIZE for the swap) instead to
set the "filesystem last block" which is used subsequently for
further calculations and conditions.
We already detect msdos partition table. If it is empty, that is, there
is just the partition header and no actual partitions defined, then the
filter-partitioned passes, otherwise not.
Do the same for GPT partition table.
New config setting sanlock_align_size can be used to configure
the sanlock lease size that lvmlockd will use on 4K disks.
By default, lvmlockd and sanlock use 8MiB align_size (lease size)
on 4K disks, which supports up to 2000 hosts (and max host_id.)
This can be reduced to 1, 2 or 4 (in MiB), to reduce lease i/o.
The reduced sizes correspond to smaller max hosts/host_id:
1 MiB = 250 hosts
2 MiB = 500 hosts
4 MiB = 1000 hosts
8 MiB = 2000 hosts (default)
(Disks with 512 byte sectors always use 1MiB leases and support
2000 hosts/host_id, and are not affected by this.)
In cases user is sure he is not using his 'rootfs' or 'swap' on LVs
managed with his command - it possible to completely bypass pinning
process to RAM which may eventually slightly speedup command execution,
(however at the risk the process can be eventually delayed by swapping).
Basicaly use this only at your risk...
TODO: add some dmeventd support for this.
Previously, lvmlockd detected the end of the lvmlock LV
by doing i/o to it until an i/o error was returned.
This triggered sanlock warning messages, so use the LV
size to avoid accessing beyond the end of the device.
Previously, every lvcreate would refresh the lvmlock LV
in case another machine had extended it. This involves
a lot of unnecessary work in most cases, so now compare
the LV size and device size to detect when a refresh is
needed.
lvremove of a thin lv while the pool is inactive would
leave the pool locked but inactive.
lvcreate of a thin snapshot while the pool is inactive
would leave the pool locked but inactive.
lvcreate of a thin lv could activate the pool to check
a threshold before the pool lock was acquired in lvmlockd.
The lv_hash wasn't being passed to the seg-specific text import
functions, so they were doing many find_lv() calls which consumes
a lot of time when there are many LVs in the metadata.
While performing udev sync semaphore's inc/dec operation, we use the
result from GETVAL semctl just to print a debug message with current
value of that sempahore, nothing else.
If the GETVAL fails for whetever reason while the actual inc/dec
completes successfully, just log a warning message about the GETVAL
(and print the debug messages without the actual semaphore value)
and return success for the inc/dec operation as a whole.
Clean up udev sync semaphore on fail path during its creation, otherwise
the caller will have no handle returned to clean it up itself and the
semaphore will keep staying in the system. The only way to clean it up
would be to call `dmsetup udevcomplete_all` which would destroy all
udev sync semaphores, not just the failed one, which we don't want.
The same message is printed while performing create/inc/dec operation and
the GETVAL semctl fails. Add a prefix so we know exactly in which of
these functions the issue actually happened.
LV with pvmove_ prefix is not allowed to be created by user
so bigger chance our selected name will never exist.
TODO: probably add code to get generic unused LV name...
Older gcc doesn't really like complex types (buffer, struct) to be
initialized without extra {} around such type.
So pick any other 'single type' var from a struct and set it to 0,
rest will do the compiler without emitting a warning.
Revert 373372c8ab and instead update
our validation code to handle LVs with empty segment - currently
we should need this only for pvmove operation, thus such LV should
have name 'pvmove%u'.
This fixes a problem where user tried i.e. pvmove on a VG with single
PV - as reported: https://github.com/lvmteam/lvm2/issues/148
Reported-by: bob@redhat.com
The .cache and compile_commands.json is used by popular source crawling and
indexing clang tools which in turn may be integrated with source code editors.
We may reuse the .cache directory for for other caches and temporary
files.
The /doc/.ikiwiki and /public are related to the ikiwiki.
Avoid possible udev race - since dmsetup create is
not using the same cookie logic as lvm2 commands,
try to avoid racing on some systems with udev scanning.
The option can be used in multiple ways (like --cachesettings):
--integritysettings key=val
--integritysettings 'key1=val1 key2=val2'
--integritysettings key1=val1 --integritysettings key2=val2
Use with lvcreate or lvconvert when integrity is first enabled
to configure:
journal_sectors
journal_watermark
commit_time
bitmap_flush_interval
allow_discards
Use with lvchange to configure (only while inactive):
journal_watermark
commit_time
bitmap_flush_interval
allow_discards
lvchange --integritysettings "" clears any previously configured
settings, so dm-integrity will use its own defaults.
lvs -a -o integritysettings displays configured settings.
log/command_log_report config setting defaults to 1 now if json or json_std
output format is used (either by setting report/output_format config
setting or using --reportformat cmd line arg).
This means that if we use json/json_std output format, the command log
messages are then part of the json output too, not interleaved as
unstructured text mixed with the json output.
If log/command_log_report is set explicitly in the config, then we still
respect that, no matter what output format is used currently. In this
case, users can still separate and redirect the output by using
LVM_OUT_FD, LVM_ERR_FD and LVM_REPORT_FD so that the different types
do not interleave with the json/json_std output.
In case of different PV sizes in a VG, the lvm2 allocator falls short
to define extended segments resiliently asked for 100%FREE RaidLV extension
and a RAID distinct allocation check fails. Fix is to release a memory pool
on the resulting error path.
Until the lvm2 allocator gets enhanced (WIP) to do such complex (and other)
allocations proper, a workaround is to extend a RaidLV to any free space on
its already allocated PVs by defining those PVs on the lvextend command line
then iteratively run further such lvextend commands to extend it to its
final intended size. Mind, this may be a non-trivial extension interation.
The cmd struct is now required in many more functions, and
it's added as a function arg for most direct dev-cache function
calls. The cmd struct is added to struct device (dev->cmd) so
that it can be accessed in many other cases where dev-cache
functions are being called from places where getting the cmd
struct is too difficult.
The dm devs cache is separate from the ordinary dev cache,
so give the function names distinct prefixes, using
"dm_devs_cache" to prefix dm devs cache functions.
When a PV is stacked on an LV, the PV needs to be
dropped from bcache before the LV is processed.
The LV can be found in dev-cache using its name
rather than the devno.
The list of dm devs was in the cmd struct and had a
different lifetime than the radix trees referencing
those dm devs. Now the list and radix trees are
created and destroyed together.
In the context of dm, 'device' refers to a dm device, but
in the context of lvm, 'device' refers to struct device.
Change some lvm function names to make that difference clearer.
dev_manager_get_device_list() -> dev_manager_get_dm_active_devices()
get_device_list() -> get_dm_active_devices()
device_get_uuid() -> dev_dm_uuid(), devno_dm_uuid()
The comment explained that the ex global lock was just
used to trigger global cache invalidation, which is no
longer needed. This extra locking can cause problems
with LVM-activate when local and shared VGs are mixed
(and the incorrect exit code for errors was causing
problems.)
vgchange -an vg is permitted when the vg lockspace
is not available, because LVs could still be active
for some reason, and they should be inactive when not
properly locked. In case lvmlockd was not running, or
the lockspace was not started, the command was
unnecessarily trying and failing to unlock every LV,
printing errors for every LV. We can skip this when
the lockspace is known to not be available.
vgchange --lockstop will fail with EBUSY if orphan locks in the
lock manager prevent stopping the lockspace. The orphan locks
can then be adopted and released, and the lockspace then stopped
cleanly.
Lock adoption is not part of standard command behavior, but can
be used for manual recovery or cleanup from unexpected failure
cases. Like other lockopt values, they are hidden options for
--lockopt. Different lock managers will behave differently.
Adopting locks with lvmlockd -A1 is more accurate and automatic.
--lockopt adoptls
. for vgchange --lockstart
. adopt existing ls, or fail if no existing lockspace is found
--lockopt adoptgl | adoptvg | adoptlv
. for commands using lvmlockd locks
. adopt orphan gl/vg/lv lock, or fail the lock request if
no orphan lock is found
. will fail if orphan lock exists with a different lock mode
. command may still continue with a failed shared lock request
--lockopt adopt
. for lockstart or any command using lvmlockd locks
. adopt existing lockspace, or start lockspace if none exists
. adopt orphan gl/vg/lv lock, or acquire new lock if no orphan found
. will fail if orphan lock exists with a different lock mode
. command may still continue with a failed shared lock request
. with dlm this option only works for ls
Stop printing "Skipping global lock: lockspace not found or started"
for vgchange --lockstart, since it's generally an inherent limitation
that the global lock isn't available until after locking is started.
Update the start delay warning to "a few seconds".
The lvb is used to hold lock versions, but lock verions are
no longer used (since the removal of lvmetad), so the lvb
is not actually useful. Disable their use for sanlock to
avoid the extra i/o required to maintain the lvb.
vgremove with --lockopt force should skip lvmlockd-related
steps and allow a forced vg cleanup, in addition to using
--nolocking to skip normal locking calls.
Previously, a command would call lockd_vg() for a local VG,
which would go to lvmlockd, which would send back ENOLS,
and the command would not care when it saw the VG was local.
The pointless back-and-forth to lvmlockd for local VGs can
be avoided by checking the VG lock_type in lvmcache (which
label_scan now saves there; this wasn't the case back when
the original lockd_vg logic was added.) If the lock_type
saved during label_scan indicates a local VG, then the
lockd_vg step is skipped.
The idea in the patch 6e6d4c62b for handling -suffix as
indication of private device needs to be disabled.
Some problematic cases are currently not resolvable and some
more thinking is needed.
Once fixed, we can revert this patch.
For large device sets our dm_hash can produce larger amounf of mapping
collision and we would need to further increase our has size.
So instead use the radix_tree code which is immune agains growing size
of devices and uses memory more effiecently to store all the paths.
Instead of less efficient 'btree' switch dev_cache to use
radix_tree, that is generating more efficient tree mapping.
Some direct use of btree iteration replace with our dev_iter code.
Convert the persisten filter to use more memory compact radix_tree as
dm_hash is bound to preallocated number of slots and stores whole
key together with value.
When DM uuid cache is available, we can possibly avoid unnecessary
status ioctl() when we check the device for 'usable' uuid.
If this test passes the existing code will got through the full check.
Move the code around caching active dm device devno, name and uuid
from device_mapper/libdm-iface to dev_cache file - as libdm layer
cares about 'decoding' ioctl data from kernel and caching for use by
lvm stays within lvm.
Introduce:
dev_cache_update_dm_devs
dev_cache_get_dm_dev_by_devno
dev_cache_get_dm_dev_by_uuid
Use radix_tree for searching.
Do not attach layer suffix to the UUID when activating component LV.
In this case we want to see allow this LV to be public, thus
such LV should not be using -layer suffix in its UUID.
This also requires that our 'cached' access will check for
both UUID (with & without suffix) which was unnoticed issue before.
This change is now necesssary since our udev rule automatically expects
any LV with -layer suffix is private and will prevent generaring
any systemd unit even when there are no 'DM' flags bits passed via
cookie mechanism while creating such LV.
In order to free SubLVs after a stripe removing reshape, lvconvert has
to be run without layout changes. Prevent a layout changing request
in case any such freed SubLVs exist and inform the user about the fact
requesting to release them first.
Calculates expected size before/after reshapes adding/removing stripes
to/from RaidLVs with levels 4/5/6/10 and compares it with the actual
one the block layer shows. Stripes reshaped to are listed in the
tst_stripes variable. mkfs/fsck/resize2fs the respective RaidLVs
to confirm ext4 can be resized accordingly without issues.
We automatically ignore these devs, when lvm2 create devs,
whoever when lvm2 database is dropped or someone just
created these devs with such formated UUID, there is no
other informantion then to check DM UUID.
SUSE kernels distribution enables CONFIG_PRINTK_CALLER by default.
One line of cat /dev/kmsg is like:
6,904,9506214456,-,caller=T24012;loop8: detected capacity change from 0
to 354304
If CONFIG_PRINTK_CALLER is off:
6,721,53563833,-;loop0: detected capacity change from 0 to 354304
',caller=T24012' is the redundant part needed to be handled.
Otherwise pos will be lager than buf size causing sz underflowed.
Then constructor of std::string will throw a exception to break
tests:
$ make check_local T=shell/000-basic.sh
VERBOSE=0 ./lib/runner \
--testdir . --outdir results \
--flavours ndev-vanilla --only shell/000-basic.sh --skip @
running 1 tests
running: [ndev-vanilla] shell/000-basic.sh
0:00.000Exception: basic_string::_M_create
make[1]: *** [Makefile:148: check_local] Error 1
make[1]: Leaving directory '/root/lvm2/test'
make: *** [Makefile:89: check_local] Error 2
Fix it with strchr for ';' as delimiter to get pos.
Reported-by: Su Yue <glass.su@suse.com>
To more easily adopt radix_tree over existing code base, add
abstraction over 'radix_tree_iterate' which basically builds
an array of all traversed values, and then it's just easy to
go over all array elements.
TODO: code should be converted to use radix_tree_iterate()
directly as it's more efficient.
Note: it can be possibly to rewrite recursive _iterate() usage
to linear travesal, not sure whether it's worth the effort
as conversion to 'radix_itree_iterator' is relatively simple.
Add simple 'wrapper' inline functions to insert or return ptr lookup value.
(So the user doesn't need to deal with 'union radix_value' locally and
also it makes easier to translage some lvm2 functions to use radix_tree).
Note: If the stored 'value' would NULL, it cannot be recognized
from a case of 'not found'. So usable only when 'values' stored with
tree are not NULL.
Some updates to _dump() debugging function so the printed result
can be more easily examined by human.
Also print 'prefix' as string when all chars are printable.
Add 'simple' variant of _dump also to 'simple' version of radix tree
(which is however normally not compiled).
Instead of using 'key state & key end' uint8_t* switch to use
void* key, & size_t keylen. This allows easier adaptation with
lvm code base with way too much casting with every use of function.
Also correctly mark const buffers to avoid compiled warnings and
casting.
Adapt the only bcache user ATM for API change.
Adapt unit test to match changed API.
Fix commit b65a2c3f3a "vgimportdevices: skip lvmlockd locking"
which intended to disable lvmlockd locking, but the lockd_gl_disable
flag was mistakenly set after lock_global() so it wasn't effective.
This caused vgimportdevices to fail unless locking was started.
Enums are single 'values' so not a proper type for bitfields.
(Probably better to use such values as defines).
Although here 'daemon_talk()' is part of library API, it's hidden
non-public API call - and moreover 'enum' and 'unsigned' are
using the same size, so linker shouldn't have any issue with
this symbol usage.
For this reason there are no 'versioning' tricks applied.
Size of these hashes was quite small, so raise the size of
hashed entries to reduce amount of hash collistion.
Select some unique/unused number for hash_create below 8192.
Replace call to get_dm_uuid_from_sysfs() with use of
device_get_uuid() which gets the same information,
but instead of several syscalls it need either 1 or even 0
when the information is cached with newer kernels.
We've got cached DM list before grabbing lock, so there
is some chance, that DM table has changed and we would
need to refresh this info.
TODO: benchmark, whether it would even make sense to refresh cache
and keep it content instead of using individual ioctl() for tree build.
Function that is working with DM target is located within
lib/activate directory.
This function is able to use cached dm_device_list when possible
to quickly resolve checks for device's UUID.
Function can fully replace get_dm_uuid_from_sysfs() and instead
of syscalls for open/read/close get the UUID with single ioctl.
When there is cached dm devs list, we can get many UUID from
a single syscall.
For better use of cached data located within cmd_context,
pass this structure from the top level function.
Also add missing '_' for static _dev_cache_index_devs.
No other change here.
Introduce function to find device's name and uuid for
a given major:minor.
This information is cached with dm_device_list which reads all the
info from single ioctl(DM_DEVICE_LIST).
Lvm keeps major:minor name & uuid for active devices in the system.
Skip scan and stat() for dirs and nodes within known /dev/ paths,
where no block devices are located.
Also strlen(_cache.dev_dir) just once.
TODO: add more dirs to _no_scan (configurable via lvm.conf ?)
Just like with _VAL strings, also _ARG strings do not need to
be present - as we can easily check for LONG opt version just
by adding attribute.
With attribute ARG_LONG_OPT string arg name[] becomes unused
and can be safely removed.
Also within _find_command_id_function() we do not need to handle
'command_enum == CMD_NONE' as separate case and just use single loop.
Previous commit 82617852a4
introduce bug in complession - as the rl_completion_matches()
needs to always advance to next element where the index
is held in static variable.
Add comment about this usage.
When PVs are created on LVs, remove the devices file entries
for the PVs when the LVs are removed. In general, the devices
file entries should be removed with lvmdevices --deldev when
the LVs are removed (lvremove is the equivalent of detaching
a device from the system when layering PVs on LVs.)
This change is effectively an automatic lvmdevices --deldev
command that is built into lvremove when the LV has a PV on it.
An OS installer can create system.devices for the system and
disks, but an OS image cannot create the system-specific
system.devices. The OS image can instead configure the
image so that lvm will create system.devices on first boot.
Image preparation steps to enable auto creation of system.devices:
- create empty file /etc/lvm/devices/auto-import-rootvg
- remove any existing /etc/lvm/devices/system.devices
- enable lvm-devices-import.path
- enable lvm-devices-import.service
On first boot of the prepared image:
- udev triggers vgchange -aay --autoactivation event <rootvg>
- vgchange activates LVs in the root VG
- vgchange finds the file /etc/lvm/devices/auto-import-rootvg,
and no /etc/lvm/devices/system.devices, so it creates
/run/lvm/lvm-devices-import
- lvm-devices-import.path is run when /run/lvm/lvm-devices-import
appears, and triggers lvm-devices-import.service
- lvm-devices-import.service runs vgimportdevices --rootvg --auto
- vgimportdevices finds /etc/lvm/devices/auto-import-rootvg,
and no system.devices, so it creates system.devices containing
PVs in the root VG, and removes /etc/lvm/devices/auto-import-rootvg
and /run/lvm/lvm-devices-import
Run directly, vgimportdevices --rootvg (without --auto), will create
a new system.devices for the root VG, or will add devices for the
root VG to an existing system.devices.
lvm2 for a while already optimizes 'vgremove' operation to
a single commit when possible if all LVs can be
easily deactivated.
So the number of LVs doesn't matter much - but the tested
case 'test_delete_non_complete_job' seems to be taking
some time anyway to capture the exception.
So just reducing the running time of the test significatanly
as we don't need to create 64LVs for 4 'execution mode' runs.
Order is used for man page generation (although not completely).
So place 'zero & error' to the end of list.
Keep linear,striped,snapshot in front.
For the rest use alphabetic order.
Move part of the 'inner' loop which is would be otherwise
always production same results for all 'opt_enum' values
out of the loop, so it can be evaluated just once.
Instead of resolving and storing 'command_fn'
withing 'struct command' use just funtion enum
and resolve function pointer just in place,
where it is really needed - first try to resolve
'new style' and fallback to 'old style' named.
Instead of storing command_id as string, direcly
translate string to enum index and use 'command_enum()'
to get string when needed for printing.
This way we can easily detect error in the structure
while parsing it - and we can later avoid separate
'translation' loop.
We can more efficiently use command_name struct to
lookup for lvm_command_enum and avoid many repeated
command name searches since we already know
the enum index that is now stored in 'struct command'.
Constity members in cmdline_context, would be nice, to replace
this static struct with couple function calls.
Also replace some 'while' loops with for loops, so code
is more readable.
Split struct command_name to the constant part (keep the name)
and new 'struct command_name_args' which holds runtime computed
info. To get to the _args part - we can easily use
lvm_command_enum as equivalent index.
Constified part 'struct command_name' is now fully stored
in .data.rel.ro segment, while command_name_args part goes
to .bss segment.
Code will be further reduced with next refactoring.
Refactor code to not allocate memory for rule->opts,
instead use uint16_t array of MAX_RULE_OPTS within cmd_rule.
Also print more info if array would not be enough (>= 8).
Add slight delay for waiting until 'started' dmeventd is
responsing to other 'dmeventd -i' command.
TODO: race observed here is somewhat unclear, might need some more
details
There is no reason to wait for 10sec when removing 'vg' at test
exit - we just need to kill 'sleep 10' process forked from flock.
(Not using 'flock -F' as this flag is not across all machines used
for testing)
This reverts commit d16a8f80e9.
So the correction was OK, however we missed to fix also
cut&paste bug here - as the second check should be
actually checking field->type.
Update definitions to add support for some more VDO options
when converting volumes to be used as thin-pool with vdo data volume.
Split some option in existing OO_LVCONVERT_VDO to OO_LVCONVERT_VDO_POOL
and reused then with OO_LVCONVERT_THINPOOL.
When convert already existing vdopool to be used as
thin-pool backend and user is passinng option for VDO configuration
process them - as we know converted LV is offline, we can do such
change easily instead of telling user to run separate lvchange later.
When converting volumes to be used for thin-pool with VDO, allow
users to control wipesingature behaviour.
By default volumes should be checked against signature, and if
they are present, we promt user whether he wants to process with
conversion and lose i.e. filesystem present on such volume.
Users that want to bypass prompt in script can use either --yes
or they can disable wipe signature -Wn.
Resolve reported leak by renaming existing pckk() to pvck_mf()
and wrapping pvck() over this. This function can easily
free allocated buffer within the subfunction.
When we switch supported_reserved_types_with_range to const
gcc repots this problem:
warning: ‘and’ of mutually exclusive equal-tests is always 0
!(iter->type & supported_reserved_types_with_range))) {
It's not clear from the history what was the actual intention of this
internal error test, let's assume the check wanted to make sure
that when DM_REPORT_FIELD_RESERVED_VALUE_RANGE is set,
some other fields from supported_reserved_types_with_range set
are also selected.
This was rather API mistake - the internal of libdm
do handle suffixes list as const string, just the API
was only using 'const char **'.
So the user may pass safely casted 'const char * const`.
man-generate --check actually noticed 2 same definitions
for snapshot create with 'lvreate -T [--snapshot]'
and 'lvcreate --snapshot [-T]'.
So drop the '-T' from second alternative variant as
thin type is already implied here.
Refactor code so the definitions may become 'static const'
and with configure_command_option_values() we update options
val_enum for actually running command option when used.
Also update _update_relative_opt() which is used for
generating man pages and command help.
Introduce enumeration for lvm2 commands - so we may
use enum cmd_COMMAND instead of string checking.
So running command now does not modified opt_names.
Function to _allocate_memory() was not compiled-in when lvm2 was
build with support for better tracking memory pool with valgrind.
Instead now correctly avoid this function only when running
withing valgrind environment.
Do not modify flags field from 'strcut command_name' and
instead control this via cmd_context get_vgname_from_options.
Flag GET_VGNAME_FROM_OPTIONS is currently used only by lvconvert.
Create leading dirs for lvmdbusd lock file, otherwise it fails to start:
| systemd[1]: Starting LVM2 D-Bus service...
| lvmdbusd[1602]: [1602]: Error during creation of lock file(/var/lock/lvm/lvmdbusd): errno(2), exiting!
Signed-off-by: Kai Kang <kai.kang@windriver.com>
In the past it was common for a single command to run
multiple lvmcache_label_scan, and this code was a way
to make each call select the same duplicate pvs. Now
commands run a single lvmcache_label_scan, so this is
not needed.
When reseting stream buffer - check for being run within valgrind
and only in this case skip this code.
Define VALGRIND_POOL was incorrectly used for this logic.
We can run and build this code just fine with VALGRIND_POOL
and just rely on runtime detection - so we do not close
descriptors also within running valgrind session.
Old system do not work well with -l% in findstring,
so use a different strategy to recognize whether -lreadline needs
another library (-ltinfo, -lncourses...)
(So we don't need to solve this via 'configure')
Also for now comment out -Wl,-z,pack-relative-relocs and leave it
for the package maintainer whether they want to use.
Possibly add some 'configure' autodetection for usable switch,
as it's relatively new feature..
Add some -Wl flags separatly and avoid their duplication.
Also add --as-needed only when system is using 'newer' readline
library - on these older system the usage of '--as-needed'
seems to be causing some hard to solve problem - so avoid it.
Attach -Wl,-z,relro,-z,now,-z,pack-relative-relocs,--as-needed
to LDFLAGS, but only if LDFLAGS already doesn't contain 'relro'
(so it's not given repeatedly).
Also start to use -z,now linkage also when building libraries
with default compilation - this avoid calling symbol resolver
while library function are using function needing resolving.
Note: Fedora or RHEL rpm building is using:
CFLAGS=$(rpm --eval %{build_cflags})
LDFLAGS=$(rpm --eval %{build_ldflags})
Also split -DUSE_SD_NOTIFY into DEFS from CFLAGS.
Use LDFLAGS separately with every use of CLDFLAGS and leave
this flag only for handling versioning.
This will reflect any LDFLAGS setting use during make.
Look not only for whole 64byte sequence,
but seek also 32byte, 16byte and 8byte parts of the
key.
Currently to pass memcpy ZMM problems add possible
workaround in the form of GLIBC_TUNABLES setting.
Avoid using 'lvs' from 'get' shell - as that would wait until
whole group of processes is finished.
TODO: rethink what would be the point of starting 'dmeventd' with lvs.
It seems to break some rules.
Refactor _start_daemon and add synchronization delay waiting
untill new forked dmeventd actually open fifos and is ready
to process messages.
This closes some race window in testing.
When no lock manager for the global lock had been set yet,
and the first global lock request was received, and both
dlm and sanlock were running, lvmlockd would assume it
should use the dlm for the global lock, and would start
the "lvm_global" dlm lockspace automatically. That's
not always correct, so don't assume the dlm should be used,
fail the lock request, and wait until a VG with a specific
lock type is started to determine the lock manager to use.
Detect also presence of 'vdoformat' tool.
Fallback to 'kvdo' modprobe only when dm-vdo fails
(removed ugly error message from log)
Also add extra check for scsi model being present
so the test can wait a bit if 'scsi_debug' takes longer time.
No need for 'aux' within aux.
Refactor existing code from tools/lvmcmdline.c to
libdaemon/server/daemon-stray.h daemon_close_stray_fds()
used to close stray descriptors above some specified Fd.
This is code parses content of /proc dir to minimize 'blind' closing
of all possible descriptors within rlimits range.
As we have the same code in few other places in it's more 'trivial'
version - these were actually sensitive to high amount of descriptors,
which might be configured on some system.
With this patch we effectively resolve this reported gitlab issue:
https://gitlab.com/lvmteam/lvm2/-/issues/5
TODO: Current placement might not be ideal - however considering
existing code base constrains it's not so simple.
ATM it uses lib/misc/lvm-file.h for custom_fds declaration
and rest of functinality is included in daemon header file.
Let's move proc into include/configure.h so this define can
be easily used across the source base.
Configure defines it - but ATM we do not provide any configure
option to change it - there should be no need to ever change it.
Handle the case, where we 'kill -9' running dmeventd,
and restarting 'dmeventd -R' manages to still contact this
'zombie' damone and manages to get list of monitored devices
out of this and eventually 'run' new and in this case
unexpected instance of dmeventd.
Some test do expect event_activation to be set.
So add explicit configuring of this setting in tests,
but also add new default which kind of does it globally
as it's expected default (yet our testing rpms might
be create with disabled event_activation)
By adding this to each test individually - it's now easy
to locate such tests...
Move static to main() local vars.
Initilize libdm logging also before starting _restart_dmeventd()
so function there can be also logged.
Move call of _info_dmevent() out of option processing - so some
options like -V are processed with higher priority.
Use _restart_dmeventd() with return values 0,1,2.
Also let's use already created fifos struct.
Make sure the _systemd_activation variable is properly initialized
from _systemd_handover before calling _restart_dmeventd() as
this variable is used there to decide where 1 or 2 should
be returned - aka either letting systemd to initilize
or restart dmeventd itself.
This is initial test how to disable event_activation on
machines, where lvm2-testsuite packages are installed
with its lvm.conf file.
TODO: find more elegant mechanism
Occasionally this test fails as soemtimes UUID actually
may constain LV[d] string causing it to grep mismatch
UUID and LV name and eventually fail test for wrong reason.
As a simple workaround print the LV name first and
check the name is followed by a space character.
This code is somewhat complex and involves recursion and pointer
shuffling which confuses coverity here.
Let's add masking comment for this warning as there is no double
free in this code.
Let's stop Coverity thinking here we are using freed FILE*
for anything else then comparing numbers.
For this use the original source of old_stream pointer.
Add new configurable option for building lvm2 with enable/disable
default autoactivation setting.
Might be useful for building i.e. rpms for systems where
this event_activation is not desired.
Parse timestamps included in kernel kmsg line and add current system
time to the messsage as well - this makes it easier to look over,
when chasing some messages in journal after some time...
Before initiation fifo communication, check whether there is
running dmeventd - in case there is no running instance, this
would be just blocked for 5 seconds trying to open fifo.
Also awk got annoyed by this \# char sequence which is reported
as incorrect, however older rpm builder were failing without this.
So let's just try other variant.
Thin-pool and cache-pool targets got already quite stable so let's try
to remove checking of pools when using lvremove or vgremove commands.
This skips checking pools when they are going to be removed, but it
also when removing thin volume that was the only user of a thin-pool.
In this case thin-pool will be still there and could be activated
again with another thin volume and thin_check will be executed
in this moment. In this case it can delay discovery of metadata damage.
Shortens processing of 'lvcreate -L -V -T' command and
avoid deactivation and its activation with thin_check of the empty
created thin-pool that will be used for the new thin volume
made with a single lvcreate command.
Shuffle code to parsing VDO message also for lvs segment status
so it can report correctly data usage for VDO LVs.
For this change move code and also change its API to use just mempool.
Fixes usage with upstream 6.9 vdo target driver.
When using message API for parsing VDO stats info, 0 was wrongly
used for fallback for trying the old sysfs API.
Switch to use ULLONG_MAX for values that could not have been obtained
through the message call.
Fixes lvdisplay info for freshly created VDO volume with 0 used data
blocks.
Make a seperate function to decode which ID should be user
for cvol meta or data volume - also avoids duplication of code.
As a result it's now also easier to see how the lvid is build.
Fix cutting away 1 character via incorrect usage of dm_strncpy
introduced in last batch of commits and use sizeof(buffer) to
get proper sizes.
In case of use strcpy_name_len() the replacement was invalid,
so we need to restore this case since sanlock uses buffer without
nul ending, so we would strip one more character from the buffer.
Also start to use dm_strncpy() without (void) for unchecked usage.
Add internal inline function wrapper for dm_strncpy().
Use it for calls where we test the result.
Avoids emitting warnings in Coverity for unchecked usage.
This sanity check actually confused in some way Coverity
giving it some assumption about array access.
Since these two checks basically validated compiler's capability
to add and then substract the number from char pointer we likely
don't really need them - as if this base math would not work,
compiler would be having far more troubles...
So drop them - and get rid of report:
Event overrun-call: Overrunning callee's array of size 513 by...
Reduce #lv and intervals.
We are trying to ensure that the daemon stops while it's busy processing
its internal queues. Decrease the number of LVs and intervals to allow
this test to complete in less time.
Add a global timeout value to be used for the threads to end waiting for
whatever it is they are blocked on. The values varied from 2-5 seconds,
which is way longer than needed. Value of 0.5 shows no CPU load when
service is running and is idle.
Shuffle code to avoid using static variable to pass parsed option.
Code is now easier to follow and also number of coverity reports
will go away.
There should be no functional change.
Add debug tracing for syscall failures.
Also switch some log_error to log_warn when command does not exit
with 'error' result and only warns user.
Easier error path handling.
Initialize some vars at declaration time.
Here we actually need to slowdown only $dev2 - since repair operation
is only reading data from this device and compares it with origin $dev1,
and if they match there is no write...
Move memset() to the initialization function define_commands().
There is also not much point in clearing memory on command's exit
so drop zeroing of ~2M of RAM.
Use \0 as EOL in compiled-in syntax description to avoid
unnecessary line copy that just replaced \n with \0.
Also use already splitted lines when possible.
Handle mismatch of reported 'dm raid' status, where the active
raid LV can be actually showing higher numebr of raid leg devices,
that the number of shown status characters.
This can happen if the raid leg is dropped during initial
resynchronization.
Setting db_persist is required for dm devices so that their properties
are carried over on switch-root from the initrd to the rootfs. This
logic has always lived in dracut
(https://github.com/dracutdevs/dracut/blob/master/modules.d/90dm/11-dm.rules).
However, this means that other initramfs generators each have to
implement and maintain the same rule which leads to unnecessary
duplication.
Instead, let's make the rule part of the upstream lvm rules, which
will ensure that generated initramfses will just work if they make
sure the lvm udev rules are installed, without having to figure out
that they have to add an extra rule themselves on top.
Identical rule in Arch Linux's lvm2 package: https://gitlab.archlinux.org/archlinux/packaging/packages/lvm2/-/blob/main/11-dm-initramfs.rules?ref_type=heads
For DM devices, the add/change/remove can appear as action for genuine
udev events.
However, there are more action types (bind, unbind, move, online, offline)
which never appear as actions for genuine DM udev events, but they can
still be synthesized (e.g. by writing "<action>" to "/sys/.../uevent" file
or by calling "udevadm trigger --action=<action>").
Let's also process these extra action types so that the udev-related content
is not lost completely, keeping all the symlinks and udev db entries just like
this was a synthetic udev event with "change" action.
Related to https://gitlab.com/lvmteam/lvm2/-/issues/4.
Bump the rules version in order to indicate that upper level rules
should consume DM_UDEV_DISABLE_OTHER_RULES_FLAG rather than DM_NOSCAN
and DM_SUSPENDED.
Also update the comments at the top of the file that describe the
exported properties, and add a note about internal device-mapper
properties.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Peter Rajnoha <prajnoha@redhat.com>
DM_NOSCAN is not an official API any more and doesn't have to be
restored from the udev db. Rename it to .DM_NOSCAN.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Peter Rajnoha <prajnoha@redhat.com>
DM_SUSPENDED is a device-mapper internal flag, which is not intended to be
used by other rules, and which is determined by 10-dm.rules from sysfs for
every uevent. Rename it to ".DM_SUSPENDED", so that it won't be saved in the
udev database.
Known consumers of DM_SUSPENDED are 66-kpartx.rules (from multipath-tools) and
99-systemd.rules (from systemd). These will have to be adapted.
11-dm-mpath.rules will be changed to use .DM_SUSPENDED.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Peter Rajnoha <prajnoha@redhat.com>
DM_UDEV_DISABLE_OTHER_RULES_FLAG is used as the "output" flag of the
device-mapper rules, to be consumed by non-dm rules. It is a logical OR of
several conditions that might make dm devices inaccessible. 10-dm.rules
calculates it for every uevent, whether it's genuine or spurious.
DM_SUBSYSTEM_UDEV_FLAG0 is just another flag that needs to be or'd in. We
don't need to restore the previous state of DM_UDEV_DISABLE_OTHER_RULES_FLAG.
Actually, doing so is wrong if the flag has previously been set because the
device was suspended, and the device isn't suspended anymore.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Peter Rajnoha <prajnoha@redhat.com>
We use DM_UDEV_DISABLE_OTHER_RULES_FLAG to tell upper non-DM layers
to keep their hands off the device in question, for any reason.
One possible reason is that the device is supended; another is that
the cookie carries the flag of the same name.
DM_SUSPENDED is not restored from the db, but evaluated anew for every
uevent. Therefore DM_UDEV_DISABLE_OTHER_RULES_FLAG shouldn't be
restored, either. Use a new variable DM_COOKIE_DISABLE_OTHER_RULES_FLAG
to save and restore the original value from the cookie.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Peter Rajnoha <prajnoha@redhat.com>
DISK_RO is set in the environment of a block-device uevent if and only if
the read-only (ro) attribute of the device just changed (the kernel
function set_disk_ro() was called). It is not synoymous with the "ro" sysfs
attribute; the device could very well be write-protected if DISK_RO is not
set. Device mapper-level probing is possible for DISK_RO events, but it makes
little sense, because the device propreties haven't changed as far as dm is
concerned. But we should import possible previously set device properties
to avoid confusing follow-up rules. We should do this for both DISK_RO=1
and DISK_RO=0 events.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Peter Rajnoha <prajnoha@redhat.com>
ID_FS_TYPE is the most important udev property for most follow-up
rules. It must be imported from the udev db if blkid can't be run.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Peter Rajnoha <prajnoha@redhat.com>
When using cached LV with cachevols (so not with cachepool),
the loaded table could have been using more then one mapping line
for sub devices - resulting into data corruption in some cases
when i.e. taking snapshot of such cached LV with and instead of
single line - 2 lines were generated into DM table as the code
skipped protection again repeated addition.
vg-fast_cvol-cdata: 0 16384 linear 253:2 16384
vg-fast_cvol-cdata: 16384 16384 linear 253:2 16384
New code is also refactoring to use _add_new_cvol_subdev_to_dtree
(similar _add_cvol_subdev.. ) and also the addition of subdev has
been moved after check for already processed node.
Also the cachevol sub devices are now added with the insertion
of cachevol with cached LV.
Document usage of chaining external origins as it is now possibly to
create thinA LV in poolA which use thinB LV in poolB as it external
origin and user could create chain of such LV.
When using swap_lv_identifiers() we were basiclly exchanging 'names'
and only according to the caller some more data were 'transfered'.
However in most cases we should swap properties like 'hostname' as
the creation information should be preserved.
So let's do the function more universal.
Improve support for building DM tree when there is a chain
of external origins used for LV.
For this we cannot use track_external_lv_deps as this works
only for LV with just one external origin in its device tree.
Instead add directly 'dev' to the instead of add whole LV.
This avoid possibly recurive endless loop, however we may eventally
have some problems with undiscovered/missing devices in DM tree.
When we have some existing LV and this LV is being converted to
external origin - during the DM table manipulation there is a short
moment when the LV is being 'resumed' as 'read-only' volume
while still being live as 'rw' volume i.e. we could have had
a single thin LV active twice.
To avoid such weird scenarios of dual access to a same volume, we
just postpone a resume until a moment, where the existing volume
is already suspended thus no I/O can be in flight to such device.
Note: however there is slight catch - that we now have basically
a different 'risk' case where a resume of such i.e. new external
origin LV might fail and we are already in suspend tree state -
resolving error path in this situation is untrivial as well...
Fix/support creation and usage of the external origin
across thin-pools - so thin LV can use thin LV from
some other thin-pool as external origin (read-only).
When creating external origin via 'lvcreate --type thin'
add the validation for LV being usable as external origin
since certain LVs cannot be really used this way.
Also call this function early during lvcreate cmdline arg
validation se we do not need to do unecesary operation.
Over the time the code for preloading detached LVs got unnecessarily
complicate. But actually we need to preload only LVs that
were previously non-toplevel (invisible) LVs and became visible
toplevel LVs in the precommitted metadata.
If there would be needed some other rule, it would likely be a bug in
conversion code forgetting to set visibility flag on detached LV.
This reduces number of unnecessary repeated DM tree preloading.
When building in other dir, ensure srcdir is used to find helper
script (via vpath) through $< usage.
Add also missing [INSTALL] prints for some installed files.
External origins for thin volumes can be also used at the same time
as old(thick) snapshot origins. However in this case it's possible
the LV is only active as being 'external' origin, but old snapshot LVs
are not active. For this case before handling these
LVs for un/monitoring check the active state of origin LV.
This should prevent warnings of monitoring failures.
Make recursive directory path creation reusable via
dir_create_recursive.
While we already have dm_create_dir() - it's not taking mode arg,
so let's make lvm's internal file helper function.
Instead of parsing the whole /proc/kallsyms use faster variant
of using modprobe tool logic.
lvm2 here wants to know whether the particular DM cache policy is
present in the kernel - however since the cache policy does not have
any kernel module parameters and it can be built-in to a kernel
there is no /sys/modules directory in such case and we would need to call
modprobe everytime we want detect such case.
The old solution tried to look for particular kernel symbol
(and like not the right way, as smq_exit might be actually ommitted).
New version checks MODULES_PATH/`uname -r`/modules.builtin for
whether is present cache policy module instead of CPU expensive parsing
of kallsyms.
If lvm.conf has use_devicesfile=0 and /etc/lvm/device/system.devices
exists, then rename it to system.devices-unused.YYYYMMDD.HHMMSS.
This prevents an old, incorrect system.devices from being used in
the future if lvm.conf is changed to use_devicesfile=1.
Create backup copies of system.devices in /etc/lvm/devices/backup
named system.devices-YYYYMMDD.HHMMSS.NNNN. NNNN is the version
counter from the file.
Each time that an lvm command writes a new system.devices file,
it also writes the same file in the backup directory.
A new comment line is added to system.devices with HASH=<num>
where <num> is a crc calculated from the uncommented lines in
system.devices. This lets lvm detect if the file has been
modified outside of lvm itself.
If system.devices is edited directly, the next time a command
reads the file, the crc will not match the HASH value. The
command will then rewrite system.devices with the correct HASH
value, and create a backup reflecting the edits.
A default limit of 50 backup files is kept, configurable by
lvm.conf devicesfile_backup_limit (set to 0 to disable backups.)
If LVM LVs happen to contain PVs, they are passed to the lvm udev
rule for processing, where they should be ignored. PVs on LVs
most likely belong to VM images, and don't belong to the host
which sees the LV. It's unsafe for the host to use these PVs.
Without this change, the LV would be processed by pvscan which
would generally ignore it, either because of the devices file,
or because of the default lvm policy to not consider LVs as
potential PVs. This change makes the udev rule consistent
with that policy and avoids the unnecessary system messages
produced when pvscan ignores the LV.
Since understanding the reason for choosing the appmachineid over the
direct use of machineid is not easily found, I extended to help text to
clarify this a bit.
Function to recalc chunk_size according to dev hints needs to be
used after chunk_size is being set to thin pool segment - correct
this ordering mistake introduced in previous refactoring commit.
Previous patch that introduced support for thinpool with vdo
not correctly handled header size - as this part is not fully usable
yet. We are going to try to use the 0, but current state of code is not
yet compliant to this logic so keep vdo_header_size during conversion
and alos correctly pass through virtual_extents to vdo formating.
Add code to handle creation of thin-pool with VDO data backend
which can be seen as compressed deduplicated thin-pool.
To avoid need of changing to many internal APIs, pass the conversion
parameters for create thin-pool data volume via cmd_context.
Introduce struct vdo_convert_params {} to pass-in all the parameters
needed for the conversion of an LV to a vdopool + vdo LV.
Function convert_vdo_lv() is also able to create a new LV and swap
segments, so the passed in LV can be later on use for futher
conversion so this refactoring makes it ready for more enhanced
usage.
Introduce vdo_convert_params and use vdo_params from this structure
also with lvcreate_params.
Later we will use this for convertion of thin-pool data volume to VDO.
If a RaidLV mapping is required to be refreshed as a result of temporarily failed
and recurred RAID leg device (pairs) caused by writes to the LV during failure,
the requirement is reported by volume health character r' in position 9 of the
LV's attribute field (see 'man lvs' about additional volume health characters).
As this character can be overlooked, this patch adds messages to the top
of the lvs command output informing the user explicitely about the fact.
Fix typos in previous commit 3589e515d.
Both commands default [raid_](min|max)recoveryrate to 0 but ensure
min_recovery_rate is not larger than max_recoveryrate. This results
in command failure without requesting the user to also define
max_recovery_rate >= min_recovery_rate.
Fix both commands by defining max_recovery_rate = min_recoveryrate
in case "lvcreate/lvchange --minrecoveryrate Size ..." requests a
larger value than current maxrecoveryrate without also giving option
Both commands default [raid_](min|max)recoveryrate to 0 but ensure
min_recovery_rate is not larger than max_recoveryrate. This results
in command failure without requestinng the user to also define
max_recovery_rate >= min_recovery_rate.
Fix both commands by defining max_recovery_rate = min_recoveryrate
in case "lvcreate/lvchange --minrecoveryrate Size ..." requests a
larger value than current maxrecoveryrate without also giving option
"--maxrecoveryrate Size ..." with a size greater or equal than min.
The first lv_attr flag is 'i' or 'I' for a raid image.
(i: raid image, I: out of sync raid image)
For integrity raid images (_iorig), the flag was not being set.
pvs -A|--allpvs
Show PVs that would otherwise be excluded by the devices file.
pvscan -A|--allpvs
Show PVs that would otherwise be excluded by the devices file.
For those devices that are included by the devices file,
their device ID is displayed in place of the usual "lvm2"
format and size.
(pvs -a|--all is unchanged, and shows devices not formatted as PVs.)
A pvid string read from system.devices could be less
then ID_LEN since system.devices fields can be edited.
Ensure the pvid buffer is ID_LEN+1 even if the string
read from the file is shorter.
Include info in the temp file to confirm that it should be used.
The temp file is meant to suppress repeated, identical searches
for the same PVIDs on the same set of devices. Write to the file
a count and hash of the missing PVIDs and a count and hash of the
devices to search. A subsequent command will ignore and remove
the temp file if any of these values differ. We don't want to
suppress a search if a change has occured, and a missing PV could
be found by scanning devices.
Problematic scenario:
. the device for a PV has no wwid, so it's identified in system.devices
with IDTYPE=devname IDNAME=/dev/foo
. user adds/enables a wwid for the device
. on reboot, the device name changes, e.g. now /dev/bar
. the code that searches for the new device name includes an
optimization to skip looking on devs that have a wwid, on
the basis that a device with a wwid won't have IDTYPE=devname
. this optimization causes lvm to not look for the PV on /dev/bar
since that device now has a wwid, so the PV is not found
. the optimization is enabled by search_for_devnames="auto"
. change the default to search_for_devnames="all" which does not
use the problematic optimization
- add new comparison between old and new entries, and use this
as the basis for new dedicated output for check and update
- add new --refresh option to search for missing PVIDs on all
devices, and possibly update the device ID
- internally, only use the term "refresh" for cases where a
new device ID may be found and assigned for a missing PVID
During vdo testing with smaller block devices the test needs to determine
which PVs in a VG are unused. This was leveraging LvCommon.Devices, but
that isn't populated when a LV is resides on a LV pool instead of a PV.
This is a known limitation of the code at this time. For now we will walk
all the PVs in the VG looking for ones that don't have any associated LVs
and use them instead.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
When getting raid status from some older kernels, we may get an 'a'
for 'A' leg when doing initial synchronization.
This may prohibit removal of newly synchronized leg until synchronization
is finished.
So in this case change the status to look like being reported
from a newer kernel version.
Incorrectly matching a dev to a devname id (due to changing devnames)
before matching the dev to a proper device id, can result in the
dev not being matched to the real id.
When reporting in JSON format, we need to be able to find the 'last
displayed row', not just 'last row' as we did before. This is used
to decide whether to put the JSON_SEPARATOR (the ',' character)
between the lines when reporting in JSON format.
This is mainly important in case we use a combination of JSON format
and a report marked with DM_REPORT_OUTPUT_MULTIPLE_TIMES flag.
Such report may be reused several times with different selection
criteria each time. In that case, the report always contains all lines
in memory, even though some of them do not need to pass the selection
criteria that are currently used.
Without DM_REPORT_OUTPUT_MULTIPLE_TIMES flag, the report only contains
the lines that have passed the selection criteria, so the this wasn't
an issue in this case.
Fix suggested by Lars Ellenberg and reported here: https://github.com/lvmteam/lvm2/issues/130
It looks like there is some kernel bug/limitation
that may cause invalid table load processing:
dmsetup load LVMTEST-LV1
device-mapper: reload ioctl on LVMTEST-LV1 failed: Invalid argument
md/raid:mdX: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...
device-mapper: table: 253:38: raid: Failed to run raid array (-EINVAL)
device-mapper: ioctl: error adding target to table
However ATM there is not much we can do then make delays bigger.
TODO: fixing md core...
With commit d7e922480e
lvconvert -m may fail if we try to remove 1st. leg that
is out-of-sync while other leg is in-sync.
Hot fix allows to proceed with such down conversion.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Fixes commit 63b469c160
"device_id: fix hints with device ids"
It's not correct for internal filtering to be a factor
in validating the set of devs that are the basis for hints.
Instead, look for inconsistencies between a hint entry and
the corresponding devices file entry.
Extend the test a bit futher so we can keep logic of resize
working similarly well for older and newer systems.
Test uses new 'aux have_fsinfo'function to regnize compiled
version of lvm.
Apply the same logic for 'lvreduce' which exists for newer
systems (compiled with HAVE_BLKID_SUBLKS_FSINFO)
also for older systems for one very common practical case where
the active LV does not have any blkid known signature/filesystem.
New variant recognized this situation and allowed to proceed
without requesting a prompt, while the older variant always
requested confirmation prompt.
With this patch command now works equily for both variants
for 'active LV' without signature and allows to reduce LV
without prompting.
Simplify configuration with enabling options:
- For --enable-dmeventd automatically --enable-cmdlib.
- For --enable-dbus-service auto --enable-notify-dbus.
Fix check linux/fiemap.h for dmfilemapd.
Before checking seg_type of the first area, check there is
some existing area.
Since we now support error and zero segtypes, these do not have
any PV area present.
When new app links with current dm_task_run() that propagates ioctl()
errno - ensure it will not be usable with older version without this
supported feature.
Introduce couple configure options:
--without-blkid
--without-systemd
--without-udev
These can be useful for build with --disable-shared
where those libraries are often distibuted without
their static libraries.
However such compiled static binaries then have limitted usage.
Since LVM 2.02 it appears that compiling LVM on a platform which
lacks a shared library linker/loader will fail to produce any
binaries (dmsetup, lvm2, etc).
Adds support for the standard --disable-shared flag, and
use it to disable attempts at building shared libraries.
Modified-by: zkabelac
Fix a bug in _stats_set_aux() that causes bogus data to appear
in the 'userdata' field of stats reports when previously grouped
regions are ungrouped:
/var/tmp/File With Spaces: Created new group with 1 region(s) as group ID 0.
Removed group ID 0 on fedora-root
Name GrpID RgID ObjType RgStart RgSize #Areas ArSize ProgID UserData
fedora-root - 0 region 6.39g 100.00m 1 100.00m dmstats #-
^^
This is the aux_data separator character followed by empty user data.
The _stats_set_aux() function should only emit the separator if
there is a valid group descriptor for the region.
Creating a group with a name that contains whitespace causes an error in
the @stats_set_aux message:
device-mapper: message ioctl on (253:0) failed: Invalid argument
Could not create regions from file /var/tmp/File With Spaces.
Fix this by quoting the group alias and backslash escaping any
embedded space characters.
Refactor the function that parses regions from @stats_list data to
separate the parsing of variable string data from the fixed parts
of the @stats_list response.
Fix a double free in the error path from _stats_create_group() by
clearing the group struct embedded in the dm_stats handle before
returning:
device-mapper: message ioctl on (253:0) failed: Invalid argument
Could not create regions from file /var/tmp/File With Spaces.
free(): double free detected in tcache 2
Aborted (core dumped)
When setting up dm-verity devices with signed root hashes it is very
useful to have a recognizable error code when a key is not present in
the kernel keyring. Turns out the kernel actually returns ENOKEY in that
case, but this gets lost in libdevmapper.
This fixes this: in _create_and_load_v4() it copies the error code from
the ioctl from the sub-tasks back to the main task field on failure.
This is not enough to make libcryptsetup actually propagate the ENOKEY
correctly, that also needs a patch to libcryptsetup, but this is part of
the puzzle.
Fix some interactions between device IDs and hints. Hints
may limit the scanned devices which should not always trigger
a search for the PVs that were intentionally not scanned.
Hints should also be invalidated if they contain a device
that's become excluded by an internal filter such as the
device_id filter.
Search for a PV on other devices if it's a devname entry
and the name doesn't exist on the system. This restores
code that should not have been removed in commit 1901a47df
"device_id: fix conditions for device_ids_refresh"
throtling mirror device is becoming useless with faster CPUS,
as way to many data can be transferred before throttling steps-in.
So prefer using dm-delay for test and keep throttling as fallback.
Fix commit 847f1dd99c
"device_id: rewrite validation of devname entries"
which began calling device_ids_refresh() in cases where it
was unnecessary, leading to extra PV searches and warnings.
Specifically, a command like "lvs <vg>" would use the hints
file to scan only devices for the named VG. This means that
scanning other PVs would be skipped, and device IDs of those
PVs could not be validated because there are no PVID values
to verify. This missing info would cause messages about
the missing info, and would cause device_ids_refresh to
search for the PVs that had been intentionally skipped.
Detection of how the command is supposed to behave actually depends on
the configure.h compilation and whether binary is compiled with
HAVE_BLKID_SUBLKS_FSINFO.
This makes it somewhat complicated in a way how to recognize which
behavior is expected.
Currently we can eventually recognize by checking error output
of some 'random' lvresize command and see if the --fs checksize is
actually recognized and rejected. If this changes - test needs
to be updated.
After umout we may race with system udevd rule - so
just retry once again after 1s sleep - that should be
enough - otherwise we would need some loop here...
Multi-line echo command are problemat across variety of bash version
and may have produce shorter results.
Convert to stable heredoc string with 'tab' skipping <<- for better
formating.
Since we now push more data into journal, parser reading this file
for --continue mode need to be adapted.
Also properly align batch mode with '.' for max test length name.
This is mainly useful in internal testing - but keep sysfs dir also
passed to filter.
Also drop use of static variable within sysfs filter and base whole
config at creation time.
Restore fsync() call For more accurate tracking by buildbot.
Try different rather tricky way of static_cast to use
already opened FD instead of seperate open(),fsync(),close().
It's pretty strange there is no way to enforce fsync() for
C++ iostreams. Flush() is actully not equal.
Add Timespec class to increase time resolution to miliseconds
(can switch to microseconds if ever needed).
Use more const and const_interators and pass by reference.
Output rusage also to list result file.
Reduce inlining of C++ constructors.
If the system changes, locate PVs that appear on different devices,
and update the device IDs in the devices file. A system change is
detected by saving the DMI product_uuid or hostname in the devices
file, and comparing it to the current system value. If a root PV
is restored or copied to a new system with different devices, then
the product_uuid or hostname should change, and trigger lvm to
locate PVIDs from system.devices on new devices.
When exit on file is present in a system and term/break signal is
catched, them dmeventd is no longger refusing to exit.
For the correct shutdown, there should be ideally unmonitoring call,
however in some case it's very hard to implement this correct procedure.
With this 'exit on' file dmeventd at least avoid 'blocking' shutdown,
before systemd kills use with -9 anyway possibly even in some unwanted
stated of internal dmeventd processing (i.e. in the middle of some lvm
command processing).
To quickly get info about some internal dmeventd status,
implment 'dmeventd -i' support.
Reported messages are some 'raw' internal informations mainly
useful to developers.
Instead of just exiting in the middle of monitoring,
unregisted all monitored devices first and then exit.
To speedup this path, all send internal SIGINT when thread
unregiters itself, to wakup-up main sleeping loop.
Make a local copy of the 'idx' string to avoid
overlapping during the rebuild of name.
This fixes cases where users specified raid
component LVs for moving.
Reported-by: kotarou3@github.com
No write outside of $LVM_TEST_DIR (removed /test access).
Use 'aux prepare_scsi_debug_dev' for automated scsi_debug handling
Properly use "" around shell vars.
Smarter read of PVS values.
Relax requirement to only work with real /dev dir.
Handle the case of device teardown where the first pass
could have only a single, but opened device, for removal.
In such case we want to at least once go through
the udev_wait and retry removal again.
TODO: maybe a sleep .1 might be usable as well with udev_wait
Instead of relying on 'pvs' output - check directly system
configuation and use lvmdevice accrording to use_devicesfile setting.
Also drop use of --fs ignore for filesystem extension for better
backward compatibility with older lvm version.
Shuffle code a bit so the '--no-snapshot' path does not execute
a few unnecessary commands.
Work also with devices that may have ':' inside their generated
/dev/disk/by-id
Ensure there is no race with systems' auto activation while using
the snapshot for conversion.
Update system's vdoconf.yml after the use of snapshot for conversion.
Skip unnecesary prompt for 'convert' while using snapshot and query only
for final snaphot merge.
Prohibit conversion for a device with the PV header.
Enhance 'trap' protection for more signals.
Improve clean() recovery path.
Replace bash 'test' command with [].
Correct some output message to print $TOOL.
Support also options without '-' in the middle i.e. --nosnapshot.
For shellcheck predefine all variables extracted from vdoconf.yml.
Since lvmdbusd testing tends to do its own logging,
try for now to disable very generic logging mechnanism
of the test suite and see the result.
Some lvmdbusd test seems to rely on some log/file logic
which is modified with the use of these shell vars.
Require VDO version 6.2.3.
Skip the part of the test that needs vdo wrapper and 2 different
versions of vdoprepareforlvm to prepare shifted VDO header
at the 2MiB offset.
Hunt also for devices with LVMTEST prefix in UUID.
Call teardown_devs_prefixed - so if they hold RAM or SCSI
they are closed before trying to remove kernel modules.
When creating thin pool or check pool there is allocated LV
for metadata and for such allocation user should be able to
specify list of PVs on cmdline.
Also fix unused passed list of PV for thick to thin conversion,
where the code was using whole PV set from a VG (but since it's
been not enabled on cmdline, user could not hit this issue).
Also remove unneeded initialization of use_pvh.
When the import is used on a system, that uses devices file,
the final activation was impossible for the case the converted
volume was not present in devices file.
Currently add volume automatically in such case.
Also add some more debugging output from the script.
TODO: Consider enhnacing lvconvert with extending devices file.
VDO is using specific path for some device paths.
i.e. for /dev/sda it could be /dev/disk/by-id/scsi-xxxxx.
This used to be not a problem before lvm2 started to use snapshot,
but now it needs to replace matching device path.
So switch to the path naming used in vdoconf.yml file.
Add '--headings none|abbrev|full|0|1|2' command line option to select
the heading type.
none|0 - no headings
abbrev|1 - column name abbreviations
full|2 - full column names (column names are equal to exact names
that -o|--options also accepts to set report output)
Reuse existing report/headings config setting to make it possible to
change the type of headings to display:
0 - no headings
1 - column name abbreviations (default and original functionality)
2 - full column names (column names are equal to exact names that
-o|--options also accepts to set report output)
Also, add '--headings none|abbrev|full|0|1|2' command line option
so we are able to select the heading type for each LVM reporting
command directly.
Add new DM_REPORT_OUTPUT_FIELD_IDS_IN_HEADINGS report output flag.
If enabled, column IDs are reported instead of column names in report
headings.
The 'column IDs' are IDs as found in 'const struct dm_report_field_type *fields'
array that is passed during report initialization (that is, a call to
dm_report_init/dm_report_init_with_selection). In this case, the 'id'
dm_report_field_type member is used instead of the 'heading' member.
While create new LV for pool volume, use name from 'pool_metadata%d' naming
sequence. This LV is later on renamed to pool_t/cmeta, but if there
is any error in the middle, we may evenutally leave some 'volume',
With this name it can be slightly more obvious how it got there,
but also when we handle _pmspare name - we get slightly more predictible
name used there for it.
However for a standard usage this commit shall no visible impact as the
name is used temporarily just for cleaning LV.
Set the lock_args string in addition to doing initialization.
lvconvert calls lockd_init_lv_args() directly, skipping
the normal lockd_init_lv() which usually sets lock_args.
lvmlockd locking for swapmetadata adjusted for commit
ac36153e99 lvconvert: preserve UUID for swapped metadata
Now that the LV uuid is swapped between LVs, the lvmlockd lock can
simply be moved between them, and the same lock can continue to be
used for the LV outside of the pool.
Once lvm2 repairs pool's metadata LV and preserves the original metadata LV
with unmodified metadata, for such LV in VG use newly created UUID for new
_pmspare and actually preserve UUID for this hidden _pmspare (if it
exists).
The lockd lock needs to be freed for the LV that is becoming
the new metadata LV, and a new lockd lock needs to be created
for the old metadata LV that is becoming an independent LV.
Fixes b3e45219c2
We incorrectly marked pv_major and pv_minor fields as being of string
type, even though the values were already correctly handled as integers
internally. This confused -S|--select that tried to compare string
values instead of integers.
Reported here: https://github.com/lvmteam/lvm2/issues/122
Previous fix was invalid (after some in-place shuffling)
'dd' copied goes to 'stderr' so we need to catch all output.
Grep needs to check output of tee tool.
Ensure 'C' locales are in use with 'dd'.
Use cachepool name for create name for metadata backup LV.
(so we do not generate 2 'sequences' of metadata filenames.)
Move path preparion before handling _pmspare.
Also drop extra call to sync_local_dev_names() as it's
already got in sync with call of exec_cmd().
When repair thinpool or cachepool, lvm2 leaves original metadata
volume backup. To avoid potential damage of those data, mark such
volume as 'read-only' and also allow user to use --setactivationskip
option for this volume.
TODO: likely better default would be to automatically skip, but
that might need some more thinking about recovery reporting doc.
Replace the use of internal /dev/mapper names with the use of
public LV names /dev/vg/lv for use with repair tools.
For this make the activation of _pmspare LV to be handled as
a component activation with public name.
Metadata is already atomatically activated this way (as readonly).
So if there is any 'error' happening, we leave public LVs in
system.
Pass more args with some 'aux' commands:
wipefs_a, enable_dev, disable_dev
(so it's a bit more efficient using single udev_wait call).
Use prepare_vg instead of prepare_pvs.
Keep using test directory for created files.
Trap errors and remove brd in this case.
Use some shell builtins to reduce fork count.
Use "$VAR".
Run 'pvs' with devlist (so not acceing other system devices).
New dmpd tools return version string in different format,
so update code to understand both variant.
Also hide some shell var setting to local functions.
lvresize --usepolicy requires resized LVs to be active.
(So it's not only required for shared VG).
The test for active pool needs to use lv_info to query 'layer'
otherwise the pool is considered inactive if it was not activated
explicitely - thun 'implicit' activation with VDO or ThinLV was
not managed by --usepolicy option.
Some linkers do need libdevmapper-event-lvm2.so.2.03,
so add also this symlink to the tests /lib dir.
This fixes the need to use previous LD_LIBRARY_PATH complex
setting and now works with much shorter list.
Last commit c38b668fc3 was pushed
with type 'scanf' instead of 'sscanf' needed for buffer reading.
Interestingly this caused scanning from 'stdin' descriptor and
thus failures in lvm shell usage.
The code in init_log_file relies on the process name (COMM) to not
contain whitespaces. This change fixes it by looking up the right-most
parenthesis to safely jump past COMM.
For more context see:
https://www.openwall.com/lists/oss-security/2022/12/21/6
Code is only used with testing, so it should have no impact on regular
users.
Reported-by: Hugues Evrard <hevrard@google.com>
mm
With 3596558861 it's been introduced
a more fine grained description.
However 'disabled' might be actually more confusing then empty field,
so keep only the info about 'not enabled'aka dmevend is not allowed
to monitor LV which otherwise could be monitored.
Update pool conversion function to handle also conversion of
thick LV to thin LV by moving thick LV into thin pool data LV
and creating fully provissioned thin LV on top of this volume.
Reworking existing conversion to use insert_layer_for_lv co
the uuid is now kept with thin-pool - this should however not
really matter as we are doing full deactivation & activation cycle.
With conversion to thin LV user can use same set of arguments
to set chunk-size.
TODO: add some smart code to decide best values for chunks sizes.
For proper functionality of insert_layer_for_lv we need to
move more bits to layerd LV.
Add some missing new types and correct usage of caller,
so the new LV type is set after the movement.
Validate cache origin in front of the prompt.
Also add some rules to command description file.
TODO:
more validation needed also for lvcreate,
more complex rules with "OR" seems to be needed.
Enable support to cache thin/error/zero virtual targets.
Use can now select whether he want to cache whole thin-pool,
or an individual thin volume out of whole thin-pool.
Support using zero and error LVs as data volume for
cachepool is possibly useful for benchmarking, not much
can be expected from such setup.
Avoid activation when going to skip zeroing of 'error' segtype
(so it's not erroring out).
Also skip zeroing for 'zero' segtype LV (already being zero).
When lvm2 calculates the maximal usable COW size and crops the user
requested size to this value, don't return the error result from
the 'lvextend' operation.
We already apply the same logic when resizing thin-pool beyond
the supported maximal size.
FIXME: The return code error logic here is somewhat fuzzy.
Output from vdo manager may actually indent output with spaces,
so trim leading and ending space.
Also add support for verbosity flag for vdo conversion tool.
This vdo parameter existed in the early stage of integration of vdo into lvm2,
but later it's been removed from vdoformat tool - so actually if
there would be any non-zero value it would cause error on lvcreate.
Option was not stored on disk in lvm2 metadata.
Remove this vdo parameter from lvm2 sources.
(Although this vdo parameter will be still accepted on cmdline through
--vdosettings option, but it will be ignored.)
Fix the testing logic.
With raid5 device the layout of files on a filesystem does not define
which leg will actually contain the block we try to damage.
So test will now figure out which device has damaged block.
Use 'check' functionality and also drop unneeded random write as we
now can identify easily in other way.
Compilation may configure it's own /etc path so ensure the test
has a defined location for access to this dir during testing.
Also prepare machine_id filei (with the use of uuidgen tool)
for the test.
Ensure the volume after conversion has the properly aligned size to the
volume group extent size. This would be visible when using virtual size,
that cannot be divided by extent size.
Before the user had to manually adjust the size after conversion to get
access to all data stored on VDO volume.
The command "lvcreate --type thin --snapshot ..." to create a thin
snapshot would fail.
commit d651b340e6 removed the optional
"--type thin" from the command definition "lvcreate --snapshot LV_thin",
and added --type thin as AUTOTYPE. This was correct and should not have
changed anything if all the command defs were correct, but it broke
the "lvcreate --type thin --snapshot" case. It reveals a problem in a
different command definintion: "lvcreate --type thin LV_thin" that was
missing --snapshot in its OO list.
If we want to have const structure - use a const pointer to avoid,
however making individual members const make it only assingle
in construct time and we already violate this logic since
we memcpy into the structure - so drop these unnecessary consts...
Add support for usage of 'dm-snapshot' for doing whole device conversion
in a snapshot which could be merged once the whole conversion has been
made.
This helps with cases where there would be any unexpected failure in the
middle of conversion process and user can continue using original
device until problem in conversion is fixed.
Import tool now uses 'truncate' tool to create a small backend file (20M) for loop device
to hold COW exception store.
Option --vdo-config has been added to allow specifing location of vdo
configuration file.
Option --no-snapshot allows to use import tool without creation of
snapshot, however recovery after finished VDO conversion and unfinished
lvm2 conversion is then very difficult.
Option --uuid-prefix allow to specify DM UUID prefix for snapshot
device.
Use read with -r.
Improve metadata parser to handle volume_geometry bio_offset,
which needs to be substracted from 'region' start_block when present.
This bio_offset block is non-zero i.e. with converted VDO volumes.
Also fix some converted structure value (but they are not in use).
Fix in the code that matches devices to system.devices entries when
the devices have the same serial number. A non-PV device in
system.devices has no pvid value, and the code was segfaulting
when checking the null pvid value.
If we are executing 'report_for_selection' to do an internal report
just for the selection itself (not for display), we can call it more
than once. In that case, we are reusing the same selection handle
(e.g. inside 'process_each_lv_in_vg').
The selection handle has 'report_type' field which is a union of all
report types needed for the report based on selection fields we actually
use.
The 'report_type' is further clarified based on checks and rules inside
'_get_final_report_type' which 'report_for_selection' calls. Then, this
final report type unambiguously identifies proper branch to take in
'report_all_in_{pv,vg,lv}' that is called next.
If the 'report_for_selection' is called more than once with the same
selection handle, we need to make sure that we always restore the report
type to its initial value, so all the rules inside 'report_for_selection'
are applied correctly next time.
This patch fixes the missing restoration of the 'report_type' value in
'selection_handle' that is reused for recurring 'report_for_selection'
calls.
An example scenario where this failed was with selecting an LV for
removal with "lvremove --select" while using a field in the selection
that required extra DM info or DM status call for the LV (that is,
"Logical Volume Device Info Fields" and "Logical Volume Device Status Fields"
as visible in 'lvs -S help').
In previous lvm versions, trailing spaces at the end of a t10 wwid would
be replaced with underscores, so the IDNAME string in system.devices
would look something like "t10.123_". Current versions of lvm ignore
trailing spaces in a t10 wwid, so the IDNAME string used would be
"t10.123". The different values would cause lvm to not recognize a
device in system.devices with the trailing _. Fix this by ignoring
trailing underscores in the IDNAME string from system.devices.
The recent fix 05c2b10c5d ensures that raid LV images are not
using the same devices. This was happening in the lvextend commands
used by this test, so fix the test to use more devices to ensue
redundancy.
vgrename does not support -S|--select, so do not provide a hint about
using it. Instead, provide a hint about using VG uuid directly.
❯ vgs
WARNING: VG name vg1 is used by VGs DXjcSK-gWfu-5gLh-9Kbg-sG49-dtRr-GqXzGL and MVMfyM-sjOa-M2xV-AT4Y-JddR-h4SP-UO5Ttk.
Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
VG #PV #LV #SN Attr VSize VFree
vg1 1 0 0 wz--n- 124.00m 124.00m
vg1 1 0 0 wz--n- 124.00m 124.00m
(vgrename does not support -S|--select)
❯ vgrename vg1 vg2
WARNING: VG name vg1 is used by VGs DXjcSK-gWfu-5gLh-9Kbg-sG49-dtRr-GqXzGL and MVMfyM-sjOa-M2xV-AT4Y-JddR-h4SP-UO5Ttk.
Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
Multiple VGs found with the same name: skipping vg1
Use VG uuid in place of the VG name.
(vgchange does support -S|--select)
❯ vgchange --addtag a vg1
WARNING: VG name vg1 is used by VGs DXjcSK-gWfu-5gLh-9Kbg-sG49-dtRr-GqXzGL and MVMfyM-sjOa-M2xV-AT4Y-JddR-h4SP-UO5Ttk.
Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
Multiple VGs found with the same name: skipping vg1
Use --select vg_uuid=<uuid> in place of the VG name.
In case of e.g. 3 PVs, creating or extending a RaidLV causes SubLV
collocation thus putting segments of diffent rimage (and potentially
larger rmeta) SubLVs onto the same PV. For redundant RaidLVs this'll
compromise redundancy. Fix by detecting such bogus allocation on
lvcreate/lvextend and reject the request.
Check for running (possibly leftover) lvmdbusd running in the
system - as this daemon may interfere with this test as in this
case both be operating on same 'live' data in /run/lvm.
Make test faster by agregating sets of operation to work on a single
created filesystem yet checking all the variants of extension and reduction.
Split 'xfs' part into separate test and convert it for use of the
minimal size 300M nowdays required by mkfs.xfs.
lvreduce uses _lvseg_get_stripes() which was unable to get raid stripe
info with an integrity layer present. This caused lvreduce on a
raid+integrity LV to fail prematurely when checking stripe parameters.
An unhelpful error message about stripe size would be printed.
When lvmcache info is dropped because it's an md component,
then the lvmcache vginfo can also be dropped, but the list
iterator was still using the list head in vginfo, so break
from the loop earlier to avoid it.
Previous commit cause the pvmove could actually be started in unexpected
order - so make sure, we are not starting new pvmove in same VG until
the previous one is started.
Keep backups within test_dir instead of dev_dir (so it doesn't
leak large files there if the tests are run over real /dev dir).
Move restoring of dm_mirror throttling before test_dir is removed.
Not quite sure if this helps anything, some of testing
machines can't reliably remove scsi_debug, reporting
they are in use - but it's not easily reproducible...
aux wait_pvmove_lv_ready() now handles multiple pvmove LVs
at one go - which allows a bit fast checking - although
at some point we may need to switch to use delayed devs
since mirror throttling seems to be no longer working well,
as CPU are getting so fast, that most of data are already
pvmoved before throttling has any chance to do something...
When 'brd' device can be removed (is unused AKA not opened),
remove such device and use again for testing.
Let's assume user has no unused brd device left in the system.
When the 'tests' sometimes fail to cleanup devices, with this
change futher cleanup from some next test may evenually release
brd device and make it available for testing.
There is no easy way to detect, whether device supports zeroing,
and kernel also zeroes device when it's not directly supported,
but with extra message:
operation not supported error, dev X, sector Y op 0x9:(WRITE_ZEROES)...
So to avoid generating such message with every 'lvcreate', use for
zeroing of upto 8K just standard write of zeroed page.
(maybe we can go with even larger sizes).
Convert test to use only ext4 instead of 300M demanding XFS.
Shorten 'B' files to 4K and use 4K strip size with >raid1 arrays
so we do not risk spreading of the file across stripe.
Also use easier 'aux corrupt_dev()' method to introduce a bit
corruption into a block device with integrity.
TODO: shorten _wait_recalc (should't be needed).
Add function to corrupt some bytes in give file path presenting
a device. 1st. patern in just once replaced with 2nd. pattern.
Usable to simulate some bit corruption for integrity devices.
Convert test to use a single skeleton and only different pieces
keep in separate tests.
Lower raid disk usage to smaller size and switch to ext4
as way less demanding fileystem.
Instead of using size of 'empty header' in vdopool use fixed size 4K
for a 'wrappeing' vdo-pool device.
This fixes the issue when user tried to activate vdo-pool after
a conversion from vdo managed device with 'vgchange -ay' - where
this command activated all LVs with 'vdo-pool' wrapping device as well,
but this converted pool uses 0-length header.
This 4k size should usually prevent other tools like 'blkid' recognize
such device as anything - so it shouldn't cause any problems with
duplicate indentification of devices.
Correcting output results from configure script when
printing locking dir.
Shuffle dmfilemapd close to dmeventd option and tracing.
Couple indent changes.
syscall 186 is specific to x86 64bit. As this is different from arch
to arch and between same arch different arch size we will only grab
thread ID using built-in python support if it is supported.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2166931
When the daemon is starting we do an initial fetch of lvm state. If we
happened to get some type of failure with lvm during this time we would
exit. During error injection testing this happened enough that
the unit tests were unable to finish. Add retries to ensure we can get
started during error injection testing.
Previously we were injecting a missing key in the lv, vg, and pv.
Given the order of processing in lvmdbusd, this prevented us from
exercising all the error paths. Change to returning just 1 instead.
When we sort the LVs, we can stumble on a missing key, protect against
this as well.
Seen in error injection testing:
Traceback (most recent call last):
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/fetch.py", line 198, in update_thread
num_changes = load(*_load_args(queued_requests))
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/fetch.py", line 83, in load
rc = MThreadRunner(_main_thread_load, refresh, emit_signal).done()
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/utils.py", line 726, in done
raise self.exception
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/utils.py", line 732, in _run
self.rc = self.f(*self.args)
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/fetch.py", line 40, in _main_thread_load
(lv_changes, remove) = load_lvs(
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/lv.py", line 148, in load_lvs
return common(
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/loader.py", line 37, in common
objects = retrieve(search_keys, cache_refresh=False)
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/lv.py", line 72, in lvs_state_retrieve
lvs = sorted(cfg.db.fetch_lvs(selection), key=get_key)
File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/lv.py", line 35, in get_key
pool = i['pool_lv']
KeyError: 'pool_lv'
Remove old code that became incorrect at some point.
It's probably a fragment of an old condition that was left
behind because it wasn't understood. We don't want to drop
the MISSING_PV flag just because the PV has no mda in use.
The device that was missing may have stale data, so the user
needs to decide if the device should be removed or restored.
Different sequences of steps that could be used to handle raid LVs
after VG takeover (what would happen in cluster failover) combined
with the loss of a disk.
When we are using -S|--select for non-reporting tools while using command log
reporting (log/report_command_log=1 setting), we need to create an internal
processing handle to handle the selection itself. In this case, the internal
processing handle to execute the selection (to process the -S|--select) has
a parent handle (that is processing the actual non-reporting command).
When this parent handle exists, we can't destroy the command log report
in destroy_processing_handle as there's still the parent processing to
finish. The parent processing may still generate logs which need to be
reported in the command log report. If the command log report was
destroyed prematurely together with destroying the internal processing
handle for -S|--select, then any subsequent log request from processing
the actual command (and hence an attermpt to access the command log report)
ended up with a segfault.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=2175220
There is a window of time where the following can occur.
1. An API request is in process to the lvm shell, we have written some
command to the lvm shell and we are blocked on that thread waiting
2. A signal arrives to the daemon which causes us to exit. The signal
handling code path goes directly to the lvm shell and writes
"exit\n". This causes the lvm shell to simply exit.
3. The thread that was waiting for a response gets an EIO as the child
process has exited. This bubbles up a failure.
This is addressed by placing a lock in the lvm shell to prevent
concurrent access to the shell. We also gather additional debug data
when we get an error in the lvm shell read path. This should help if
the lvm shell exits/crashes on its own.
Replace spaces with \040 in directory paths from getmntent (mtab).
The recent commit 5374a44c57 compares mount point directory paths
from /etc/mtab and /proc/mounts, in order to detect when a mounted
LV has been renamed. The directory path comparison does not work
correctly when the path contains spaces because getmntent uses
ascii space chars and proc replaces spaces with \040.
Add some Gentoo based patches for better support of static linking.
This are not tested nor supported by upstream developers.
Usage requires presence of several libraries in their static form
which is however not commonly available.
Selinux modified by zkabelac to still work on older sofrware which
did not provided libselinux.pc at a time - see keep the old check
present and use pkg-config only when possible.
When user enables lockd libraris, code needs to fail, when then are
missing. Also when notify-dbus support if enabled, and libsystemd is
missing, abort configuration.
Check for pkg-config --libs libdlm_lt and test if the returned value
contains word 'pthread' - if so, it's likely a buggy result from
incorrect config file and use directly -ldlm_lt for this case.
There may be some unconventional methods of sharing disks with vgimport
where hints could cause confusion. Hints should be disabled when
sharing disks, or pvscan --cache used to regenerate hints as needed,
but invalidating hints from vgimport might help reduce confusion.
Convert lvmlockd to use configure _LIBS and _CFLAGS for
discovered libraries.
TODO: ATM we ignore discovered libdlm and use libdlm_lt instead.
Also libseagate_ilm is hard to find unicorn for testing.
Convert naming SYSTEMD_CFLAGS/LIB -> LIBSYSTEMD_CFLAGS/LIBS
to better fit library check for libsystemd.
Build lvmlockd with SD_NOTIFY when we have defined LIBSYSTEMD_LIBS.
Coverity is complaining about unchecked strcpy here, which is
irelevant as we preallocate buffer to fit in copied string,
however we could actually reuse these size and use just memcpy().
So lets make some simple conversions.
This was trying to warn about a misconfig where the root PV
was excluded by the devices file, but it wasn't covering all
cases. Also, it's not very useful because the root LV is
usually activated directly in the initrd.
With the recent use of DEVLINKS, there is no longer any real
point in checking the filter for symlink names. Removing
this check should not change behavior with or without symlinks
in the filter.
Commit 7a8b7b4add introducing lockidm
support missed to use AC_SUBST() for a variable and provide it only
in prebuilt configure, so with the next autoreconf this variable
was lost and IDM support was no longer compiled.
Fixes: https://github.com/lvmteam/lvm2/issues/98
Reported-by: Tom Prohofsky
Correct setting of migration_threshold and add clarify
how the user can restore default cache settings for cache policies.
Also document mq aliasing with smq for newer kernels.
When two parallel commands would have tried to create our
/dev/mapper/control node at the same time, one of them could
actually fail even if the 2nd. command actually mknod()
this control node correctly.
So for EEXIST case add detection if the control node is ok.
This may possibly help with some race case in early boot.
We still need to support build without any blkid present,
so use PKG_CHECK_EXISTS() instead of direct failure
from PKG_CHECK_MODULES for too old version.
Allow users to specify their path to systemd-run binary:
configure --with-systemd-run=/my/path/system-run
By defaults it autodetected in $PATH and fallbacks to:
/usr/bin/systemd-run.
Previous commit e67ebc5c40 used incorrect check
for blkid version.
Fix it by always checking for at least version 2.24,
as we can't use older version anyway.
Added BLKID_SUBLKS_FSINFO is present in newer version.
Reduce shellcheck warnings about missing {} for possible array
dereference.
Make sure we are not loosing error code when assigning local vars
and explicitely ignore 'errors' from standalone lines when needed.
Add some missing quotes.
Use $() instead of ancient ``
Avoid writing some temporary data into /tmp - test need to store
files within its own 'testdir' - so it can be properly discarded.
Addressing problem for user of system without default bash shell.
As reported "set -o pipefail" is a bashism that causes the build to
fail when /bin/sh does not point to bash.
For example, this has been observed causing build failures
on Gentoo when /bin/sh is symlinked to /bin/dash.
Rule has been reworked and we started to use .DELETE_ON_ERROR to
ensure that with any errors during file generation, such invalid
file is automatically removed.
Rules were reworked to avoid using pipe and instead use temporary
files when necessary.
Printing lines with echo was replaced with POSIX recomended 'printf'
as proposed by reporter since handling of escape characters and
the "-n" flag for "echo" aren't portable across shells.
Also some build deps has been added.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1822280
Gentoo-bug: https://bugs.gentoo.org/682404
Gentoo-bug: https://bugs.gentoo.org/716496
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1822280
Reported-by: Michael Orlitzky <michael@orlitzky.com>
TODO: investage if the temporary files could be handled via some
intermediate target solution - ATM I couldn't make it work equally
well as current solution use shell 'trap' to remove temp file.
Commit dropping lvmetad support 117160b27e
also removed cleaning of its generated files. However we need to keep
this functionality, otherwise we can leak them during various bisect.
Easier is to keep such rules forever.
Also commit c1ab9fb37f moved cmds.h to
include, so again keep it removed so it's not left there and in any
case could not misslead anyone.
Use configure detection instead of trying to directly include
blkid.h inside lvresize.c - where we do not pass in BLKID_CFLAGS
and since we actually do not need to use blkid there, introeduce
test variable HAVE_BLKID_SUBLKS_FSINFO and avoid trying to
use blkid.h in this place - this also fixes builing problem for
systems without blkid.h.
`AS_IF([...])` is more portable, as it respects macro expansions of
`AC_REQUIRE()`.
This is recommended Autoconf best practice, since in nested
conditionals, it is generally unknowable whether some macro invokes
`AC_REQUIRE()` deep down:
https://www.gnu.org/software/autoconf/manual/autoconf-2.71/html_node/Common-Shell-Constructs.html#index-AS_005fIF-1
As a result, the hacky `pkg_config_init` function is not needed
anymore, since any `PKG_*` invocation will ensure that
`PKG_PROG_PKG_CONFIG` will have been called, due to the fact that
`AC_REQUIRE()` will trickle up.
`[[ ... ]]` only works in bash, and not in POSIX sh, specifically
dash fails on this, which is a popular alternative to bash for running
configure scripts.
test's `-a` and `-o` options are considered obsolescent by POSIX,
because they interact badly with expressions and can become ambiguous
depending on string arguments:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html
`==` only works in bash.
Fix hang of vgimportclone command when:
the PV(s) being imported are not actually clones/duplicates, and
the -n vgname arg is the same as the current vgname.
(Not the intended use of the command, but it should still work.)
In this case, the old and new vgnames ended up being the same, when
the code expected they would be different. A file lock on both the
old and new vgnames is used, so when both flocks are on the same
file, the second blocks indefinitely.
Fix the new vgname to be the old name plus a numeric suffix, which
is the expected result.
Follow-up for e10f67e917.
The commit e10f67e917 tries to keep device
node symlinks even if the device is in the suspended state. However,
necessary properties that may previously obtained by the blkid command
were not imported at least in the .rules file. So, unless ID_FS_xyz
properties are imported by another earlier .rules file, the device node
symlinks are still lost when event is processed in the suspended state.
Let's explicitly import the necessary properties.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2158628
GHPR: https://github.com/lvmteam/lvm2/pull/105
We used KERNEL=="device-mapper", NAME="/dev/mapper/control" udev rule to
create the /dev/mapper/control file. The "NAME" rule should be only used
to rename network devices, otherwise udev issues a warning message. The
device-mapper driver has proper DEVNAME=/dev/mapper/control propagated
in the uevent environment when it is loaded so we don't need further
instruction on where to create the node - udev knows already.
Also, these days, it is created directly by kernel inside devtmpfs.
This makes the NAME="/dev/mapper/control" rule completely obsolete.
Since 67722b3123, we have a new mechanism
to run the autoactivation from udev. With this change, we also replaced
the way the LVM autoactivation service is instantiatiated - instead of
setting the SYSTEM_WANTS udev variable (which systemd read and then
instantiated the service), we're now directly instantiating the
transient 'lvm-activate-<vgname>' service by calling systemd-run.
As such, we don't need to bother with setting the SYSTEMD_READY variable
for foreign devices anymore (in this case, MD and loop devices on top of
which there's a PV).
Before, we set the SYSTEMD_READY variable to make sure that the SYSTEMD_WANTS
is applied correctly - the service instantiation was edge-triggered by
flipping the SYSTEMD_READY from 0 to 1 and at the same time having the
SYSTEMD_WANTS variable set to the service name to instantiate. We're
using systemd-run now so this condition does not apply anymore.
Also, it was not completely correct to set SYSTEMD_READY for foreign
devices because there might be cases where this could cause issues,
see also https://github.com/lvmteam/lvm2/issues/94.
When refreshing all LVs in a VG, skip processing of thick snapshots,
since they will be refreshed through its origin LV.
Should reduce some unnecessary ioctl().
Keep the conversion 64bit as on x32 arch time_t is 64bit value
and we may loose precision (y2038).
TODO: like use universal string for time printing as in log/log.c
_set_time_prefix()
When activating origin and its thick snapshots, ensure the
origin's LV udev processing is finished first and after this
reactivate its snapshot so the udev can scan them afterwards.
This should fix the problems for users using UUID of such device
in their fstab and occasionaly mounted snapshot instead of origin LV.
"vgchange -aay --autoactivation event" is called by our udev rule.
When the udev rule runs, symlinks for devices may not all be created
yet. If the regex filter contains symlinks, it won't work correctly.
This command uses devices that already passed through pvscan. Since
pvscan applies the regex filter correctly, this command inherits the
filtering from pvscan and can skip the regex filter itself.
See the previous commit
"pvscan: use alternate device names from DEVLINKS to check filter"
pvscan --cache <dev> is called by our udev rule at a time when all
the symlinks for <dev> may not be created yet (by other udev rules.)
The regex filter in lvm.conf may refer to <dev> using a symlink name
that hasn't yet been created, which would cause <dev> to not match
the filter regex. The DEVLINKS env var, set by udev, contains all
the symlink names for <dev> that have been or will be created.
So, we add all these symlink names to dev->aliases, as if we had
found them in /dev. This allows <dev> to be recognized by a regex
filter containing a symlink for <dev>.
It looks like force was not being used to enable crypt resize,
but rather to force an inconsistency between LV and crypt
sizes, so this is either not needed or force in this case
should have some other meaning.
This reverts commit ed808a9b54.
Update previous commit
"lvresize: only resize crypt when fs resize is enabled"
to enable crypt resizing when --force is set and --resizefs
is not set. This is because it's been allowed in the past
and people have used it, but it's not a good idea.
There were a couple of cases where lvresize, without --fs resize,
was resizing the crypt layer above the LV. Resizing the crypt
layer should only be done when fs resizing is enabled (even if the
fs is already small enough due to being independently reduced.)
Also, check the size of the crypt device to see if it's already
been reduced independently, and skip the cryptsetup resize if
it's not needed.
While the output of building looks more polished, text editors fail to
find source file from compile errors - so until we start to print
all file with full paths - comment out this make build parameter.
Enhance checking vdo constains so it also handles changes of active VDO LVs
where only added difference is considered now.
For this also the reported informational message about used memory
was improved to only list consuming RAM blocks.
Introduce struct vdo_pool_size_config usable to calculate necessary
memory size for active VDO volume.
Function lv_vdo_pool_size_config() is able to read out this
configuration out of runtime DM table line.
Cover a case missed by the recent commit e0ea0706d
"report: query lvmlockd for lv_active_exclusively"
Fix the lv_active_exclusively value reported for thin LVs.
It's the thin pool that is locked in lvmlockd, and the thin
LV state was mistakenly being queried and not found.
Certain LV types like thin can only be activated exclusively, so
always report lv_active_exclusively true for these when active.
If one of the PVs in the VG does not hold metadata, then the
command would fail, thinking that PV was from a different VG.
Also add missing free on that error path.
typo used "cryptresize" as command name
this affects cases where the file system is resized
independently, and then the lvresize command is used
which only needs to resize the crypt device and the LV.
18722dfdf4 lvresize: restructure code
mistakenly changed the overprovisioning check from applying
to all lv_is_thin_type lvs to only lv_is_thin_pool lvs, so
it no longer applied when extending thin lvs. The result
was missing warning messages when extending thin lvs.
The recent change that verifies sys_serial system.devices entries
using the PVID did not exclude non-PV devices from being checked.
The verification code would attempt to use du->pvid which was null
for the non-PVs causing a segfault.
With newer kernels (>5.13) DM_CREATE no longer generates
uevent for DM devices without table.
There are even no sysfs block device entries in such case,
although device has asigned major:minor and it is being listed
by 'dmsetup info'.
So this patch calculates amount of 'table' lines and in case
no table line comes from cmdline or stdin - waiting on cookie
is avoided generically instead of disabling just case with
option --notable - which then also skipped handling of an
option --addnodeoncreate (which is however historical and
should be avoided)
As a result there should be no leaking udev cookies and endlessly
waiting commands like this:
dmsetup create mytestdev </dev/null
We can't assume that strerror_r returns char* just because _GNU_SOURCE is
defined. We already call the appropriate autoconf test, so let's use its
result (STRERROR_R_CHAR_P).
Note that in configure, _GNU_SOURCE is always set, but we add a defined
guard just in case for futureproofing.
Bug: https://bugs.gentoo.org/869404
Query LV lock state in lvmlockd to report lv_active_exclusively
for active LVs in a shared VGs. As with all lvmlockd state,
it is from the perspective of the local node.
Signed-off-by: corubba <corubba@gmx.de>
Add a note to the manpage that lvmlockd is unable to determine
accurately and without side-effects whether a LV is remotely active.
Also change the value of the lv_active_remotely option from false to
undefined for shared VGs to distinctly communicate that inability to
users. Only for local VGs it can be definitely stated that they are not
remotely active.
Signed-off-by: corubba <corubba@gmx.de>
Since now we change deduplication with V4 table line change,
the modification tends to be faster and we can capture for a few ms
also 'status' about opening or closing deduplication index.
Use 'grep -E' to handle both words.
Improve detection of VDO virtual size - so it's not reading VDO
metadata when VDO device is already active and instead we reuse
existing table line for knowing existing metadata size.
NOTE: if there is ever going to be added support for reduction
of VDO virtual size - this method will need to be reworked to
allow size difference only within 'extent_size' alignment.
As we actully use reading of VDO metadata only as extra 'information' source,
and not error command - switch to 'log_debug()' severity with messages
out of parser code.
Handle multiple devices using the same serial number as
their device id. After matching devices to devices file
entries, if there is a discrepency between the ondisk PVID
and the devices file PVID, then rematch devices to
devices file entries using PVID, looking at all disks
on the system with the same serial number.
Only /sys/dev/block/major:minor/device/serial was read to find
a disk serial number, but a serial number seems to be reported
more often in other locations, so check these also:
/sys/dev/block/major:minor/device/vpd_pg80
/sys/class/block/vda/serial (for virtio disks only)
Add very simplistic parser of vdo metadata to be able to obtain
logical_blocks stored within vdo metadata - as lvm2 may
submit smaller value due to internal aligment rules.
To avoid creation of mismatching table line - use this number
instead the one provided by lvm2.
Previously we utilized udev until we got a dbus notification from lvm
command line tools. This however misses the case where something outside
of lvm clears the signatures on a block device and we fail to refresh the
state of the daemon. Change the behavior so we always monitor udev events,
but ignore those udev events that pertain to lvm members.
Note: --udev command line option no longer does anything and simply
outputs a message that it's no longer used.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1967171
The lvm dbus daemon will auto activate on dbus API calls. To
prevent the dbus daemon starting when lvm command line tools are
being used we will check to see if the daemon is running first.
If the daemon is not running, we will not notify the daemon.
For this check to work it requires the changes done previously
with commit: 3fdf449348
Reviewed-by: David Teigland <teigland@redhat.com>
The number of extents for the sanlock lvmlock lv is calculated using
integer division, which rounds towards zero. With a physical extent size
of 129M, instead of the requested 256M the lv is only 129M (1 extent).
With any physical extent size greater than 256M the lv creation fails
because the number of extents is zero.
This is fixed by replacing the integer division with a division macro
that rounds up and thus guarantees that the size of the lv will always
be equal or greater than the requested size. Using the examples above, a
pes of 129M will result in a 258M lv (2 extents), pes of 300M in a 300M
lv (1 extent).
The re-calculation of the lv size in bytes and megabytes is only so the
debug output shows the correct values. The size in mb there is still
not byte-perfect-accurate, but good enough for a human-readable estimate;
and the exact size in bytes and extents is right next to it.
Signed-off-by: corubba <corubba@gmx.de>
When executing process_each_lv_in_vg, we process live LVs first and
after that, we process any historical LVs. In case we have just removed
an LV, which also means we have just made it "historical" and so it
appears as fresh item in vg->historical_lvs list, we have to skip it
when we get to processing historical LVs inside the same process_each_lv_in_vg
call.
The simplest approach here, without introducing another LV list, is to
simply mark such historical LVs as "fresh" directly in struct
historical_logical_volume when we have just removed the original LV
and created the historical LV for it. Then, we just need to check the
flag when processing historical LVs and skip it if it is "fresh".
When we read historical LVs out of metadata, they are marked as
"not fresh" and so they can be processed as usual.
This was mainly an issue in conjuction with -S|--select use:
# lvmconfig --type diff
metadata {
record_lvs_history=1
}
(In this example, a thin pool with lvol1 thin LV and lvol2 and lvol3 snapshots.)
# lvs -H vg -o name,pool_lv,full_ancestors,full_descendants
LV Pool FAncestors FDescendants
lvol1 pool lvol2,lvol3
lvol2 pool lvol1 lvol3
lvol3 pool lvol2,lvol1
pool
# lvremove -S 'name=lvol2'
Logical volume "lvol2" successfully removed.
Historical logical volume "lvol2" successfully removed.
...here, the historical LV lvol2 should not have been removed because
we have just removed its original non-historical lvol2 and the fresh
historical lvol2 must not be included in the same processing spree.
The new device_id types are: wwid_naa, wwid_eui, wwid_t10.
The new types use the specific wwid type in their name.
lvm currently gets the values for these types by reading
the device's vpd_pg83 sysfs file (this could change in the
future if better methods become available for reading the
values.)
If a device is added to the devices file using one of these
types, prior versions of lvm will not recognize the types
and will be unable to use the devices.
When adding a new device, lvm continues to first use sys_wwid
from the sysfs wwid file. If the device has no sysfs wwid file,
lvm now attempts to use one of the new types from vpd_pg83.
If a devices file entry with type sys_wwid does not match a
given device's sysfs wwid file, the sys_wwid value will also
be compared to that device's other wwids from its vpd_pg83 file.
If the kernel changes the wwid type reported from the sysfs
wwid file, e.g. from a device's t10 id to its naa id, then lvm
should still be able to match it correctly using the vpd_pg83
data which will include both ids.
t10 wwids are now edited in the same way that multipath does,
which is replacing a series of spaces with one _. Previously
lvm replaced every space with one _. Devices file entries
with the old form will be converted to the new shorter form.
Move the functions handling dev wwids.
Add dev flags indicating that wwids have been read from
sysfs wwid file or sysfs vpd_pg83 file. This can be
used to avoid rereading these.
Improve filter-mpath search for a device's wwid in
/etc/multipath/wwids, to avoid unnecessary rereading
of wwids from sysfs files.
Type 8 wwids from vpd_pg83 with naa or eui names should be
saved as those types.
If lvmlockd in cluster is killed accidently or any other reason, the
lock resources will become orphaned in the VG lockspace. When the
cluster manager tries to restart this daemon, the LVs will probably
become inactive because of resource schedule policy and thus the lock
resouce will be omited during the adoption process. This patch will
try to purge the lock resources left in previous lockspace, so the
following actions can work again.
Exclude the new fs resizing capabilities at build time
(rather than run time) if the necessary libblkid features
are not available. When excluded, all fs resizing options
are translated to resize_fsadm. Accessing the new
features now requires rebuilding lvm if libblkid is
upgraded.
Place the addCleanup at the end as we don't want to go through clean up
if we don't make it through setUp. If we don't do this we can remove VGs
that we didn't create in the unit test.
Previously when the __del__ method ran on LVMShellProxy we would blindly
call terminate(). This was a race condition as the underlying process
may/maynot be present. When the process is still present the SIGTERM will
end up being seen by lvmdbusd too. Re-work the code so that we
first try to wait for the child process to exit and only then if it hasn't
exited will we send it a SIGTERM. We also ensure that when this is
executed we will briefly ignore a SIGTERM that arrives for the daemon.
If we plan to use dm throttling for mirror targets - we actually
have to check whether kernel runs with CONFIG_HZ_1000 - if it does
not the whole idea of throttling is actually not working in the
testsuite as within a single 'tick' with HZ 100 way too much date
is being moved on any modern hardware - and since there is no plan
to change this in kernel - we simply avoid using throttling on such
kernel and test needs to work differently - either ignore results
or use much larger mirror sizes...
When checking to see if the PV is missing we incorrectly checked that the
path_create was equal to PV creation. However, there are cases where we
are doing a lookup where the path_create == None. In this case, we would
fail to set lvm_id == None which caused a problem as we had more than 1
PV that was missing. When this occurred, the second lookup matched the
first missing PV that was added to the object manager. This resulted in
the following:
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/lvmdbusd/utils.py", line 667, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.9/site-packages/lvmdbusd/fetch.py", line 25, in _main_thread_load
(changes, remove) = load_pvs(
File "/usr/lib/python3.9/site-packages/lvmdbusd/pv.py", line 46, in load_pvs
return common(
File "/usr/lib/python3.9/site-packages/lvmdbusd/loader.py", line 55, in common
del existing_paths[dbus_object.dbus_object_path()]
Because we expect to find the object in existing_paths if we found it in
the lookup.
resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2085078
Latest upstream build of lvm results in the following error when
trying to use lvmshell.
"Argument --reportformat cannot be used in interactive mode.,
Error during parsing of command line."
When lvm is compiled with editline, if the file descriptors don't look like
a tty, then no "lvm> " prompt is done. Having lvm output the shell prompt
when consuming JSON on a report file descriptor is very useful in
determining if lvm command is complete.
Historically we have seen a few different errors which occur when we call
fullreport. Failing exit code and JSON which is missing one or more keys.
Instruct lvm to dump the debug to a file during fullreport calls when we
fork & exec lvm. If we encounter an error, ouput the debug data.
The reason this isn't being done when lvmshell is used is because we
don't have an easy way to test the error paths.
This change is complicated by the following:
1. We don't know if fullreport was good until we evaluate all the JSON.
This is done a bit after we have called into lvm and returned.
2. We don't want to orphan the debug file used by lvm if the daemon is
killed. Thus we try to minimize the window where the debug file hasn't
already been unlinked. A RFE to pass an open FD to lvm for this
purpose is outstanding.
The temp. file is:
-rw------. 1 root root /tmp/lvmdbusd.lvm.debug.XXXXXXXX.log
Introduce an exception which is used for known existing issues with lvm.
This is used to distinguish between errors between lvm itself and lvmdbusd.
In the case of lvm bugs, when we simply retry the operation we will log
very little. Otherwise, we will dump a full traceback for investigation
when we do the retry.
Instead of lumping all the exceptions, break them out to handle the dbus
exceptions separately, to reduce the amount of debug information that ends
up in the journal that has questionable value.
Lvm occasionally fails to return all the request JSON keys in the output of
"fullreport". This happens very rarely. When it does the daemon was reporting
the resulting informational exception:
MThreadRunner: exception
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/lvmdbusd/utils.py", line 667, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.9/site-packages/lvmdbusd/fetch.py", line 40, in _main_thread_load
(lv_changes, remove) = load_lvs(
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 143, in load_lvs
return common(
File "/usr/lib/python3.9/site-packages/lvmdbusd/loader.py", line 37, in common
objects = retrieve(search_keys, cache_refresh=False)
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 95, in lvs_state_retrieve
l['vdo_operating_mode'],
KeyError: 'vdo_operating_mode'
The daemon retries the operation, which usually works and the daemon continues.
However, simply reporting this informational stack trace is causing CI and other
automated tests to fail as they expect no tracebacks in the log output.
Remove the reporting of this code path unless it persists and causes the daemon
to give up and exit.
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2120267
When the daemon isn't started with --debug we will keep a circular
buffer of the past N number of debug messages which we will output
when we encounter an issue.
Rather than trying to bubble up return codes that get us to exit cleanly
it's better to just raise an exception to bail. In some cases functions
don't have return codes, so they cannot be checked.
We were incorrectly only setting this if --udev wasn't present on the
command line. In all cases when we see a manager.ExternalEvent we want
to set this.
Depending on when an occurs, it maynot have any information available for
lastlog. In this case try to grab an error message from the original
response.
Introduce a new lock for the flight recorder, so that we can dump it when
a command is block waiting for lvm to complete. Also in all paths we will
addthe metadata to the flight recorder before it's done, so we will have
it when a command hangs and we dump the flight recorder. Add the missing
bits after the command has finished.
Cleaned up the output too.
We put this in to test one of the paths in the damon, but unfortunately
if we hit the race condition where the job actually is done we will try
to call j.Wait(1) after the remove. This fails with:
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.UnknownMethod:
Method "Wait" with signature "i" on interface "com.redhat.lvmdbus1.Job"
doesn't exist
Which is caused by the dbus object no longer existing. We could handle
this, but the issue is we no longer have the ability to get the result to
return, they have been lost.
A better solution would be to write a specific unit test to force this code
path and handle all the possible outcomes.
These files are automatically cleared on reboot given
that /run is tmpfs, and that remains the primary way
of clearing online files.
But, if there's extreme use of vgcreate+pvscan+vgremove
between reboots, then removing online files in vgremove
will limit the number of unused online files using space
in /run.
The new option "--fs String" for lvresize/lvreduce/lvextend
controls the handling of file systems before/after resizing
the LV. --resizefs is the same as --fs resize.
The new option "--fsmode String" can be used to control
mounting and unmounting of the fs during resizing.
Possible --fs values:
checksize
Only applies to reducing size; does nothing for extend.
Check the fs size and reduce the LV if the fs is not using
the affected space, i.e. the fs does not need to be shrunk.
Fail the command without reducing the fs or LV if the fs is
using the affected space.
resize
Resize the fs using the fs-specific resize command.
This may include mounting, unmounting, or running fsck.
See --fsmode to control mounting behavior, and --nofsck to
disable fsck.
resize_fsadm
Use the old method of calling fsadm to handle the fs
(deprecated.) Warning: this option does not prevent lvreduce
from destroying file systems that are unmounted (or mounted
if prompts are skipped.)
ignore
Resize the LV without checking for or handling a file system.
Warning: using ignore when reducing the LV size may destroy the
file system.
Possible --fsmode values:
manage
Mount or unmount the fs as needed to resize the fs,
and attempt to restore the original mount state at the end.
nochange
Do not mount or unmount the fs. If mounting or unmounting
is required to resize the fs, then do not resize the fs or
the LV and fail the command.
offline
Unmount the fs if it is mounted, and resize the fs while it
is unmounted. If mounting is required to resize the fs,
then do not resize the fs or the LV and fail the command.
Notes on lvreduce:
When no --fs or --resizefs option is specified:
. lvextend default behavior is fs ignore.
. lvreduce default behavior is fs checksize
(includes activating the LV.)
With the exception of --fs resize_fsadm|ignore, lvreduce requires
the recent libblkid fields FSLASTBLOCK and FSBLOCKSIZE.
FSLASTBLOCK*FSBLOCKSIZE is the last byte used by the fs on the LV,
which determines if reducing the fs is necessary.
This test could only be run when user passes LVM_TEST_DEVDIR=/dev
as it requires and expects actions to be going in this dir, skip
otherwise.
Also 'extend_filter' manages multiple args in on lvm.conf update.
mdraid has some breakage - so 5.19 is crashing
(possibly even some more older version - that can be added as well)
Test seems to pass with 6.0-rc kernel.
Add new define 'newline' for use in 'foreach()'
Add new $(SHOW) for makefile printing output
Add 'make print-VAR' for easier debugging of Makefiles' variables.
Fix lv_active to be of BIN type instead of STR. This allows lv_active to
follow the report/binary_values_as_numeric setting as well as --binary
cmd line switch. Also, it makes it possible to use -S|--select with
either textual or numeric representation of the value, like 'lvs -S
active=active' but also 'lvs -S active=1'.
Take the devices file lock before creating a new devices file.
(Was missed by the change to preemptively create the devices
file prior to setup_devices(), which was done to improve the
error path.)
When using --all, if one VG is skipped, continue adding
other VGs, and do not return an error from the command
if some VGs are added. (A VG is skipped if it's missing PVs.)
If the command fails during devices file setup or device
scanning, then remove the devices file if it has been
newly created by the command, and exit with an error.
If devices from a named VG are not imported (e.g. the
VG is missing devices), then remove the devices file if
it has been newly created by the command, and exit with
an error.
If --all VGs are being imported, and no devices are found
to include in the devices file, then remove the devices
file if it has been newly created by the command, and
exit with an error.
Names matching internal code layout.
Functionc in thin_manip.c uses thin_pool in its name.
Keep 'pool' only for function working for both cache and thin pools.
No change of functionality.
Certain args can't be used in lvm shell ("interactive mode") because
they are not supported there. Add ARG_NONINTERACTIVE flag to mark
such args and error out if we're in interactive mode and at the same
time we detect use of such argument.
Currently, this is the case for --reportformat arg - we don't support
changing the format per command in lvm shell. The whole shell is running
under a reportformat chosen at shell's start.
Commit 73ec3c954b added a way to print
only a part of the report string (repstr) to support decoding individual
string list items out of repstr.
The repstr is normally printed through _safe_repstr_output so that any
JSON_QUOTE character ('"') found within the repstr is escaped to not
interfere with value quoting in JSON format.
However, the commit 73ec3c954b missed
checking the 'len' argument passed to _safe_repstr_output function when
adding the rest of the repstr after all previous JSON_QUOTE characters
were escaped (when calling the last dm_pool_grow_object). When 'len'
is 0, we need to calculate the 'len' ourselves in the function by
simply calling strlen. This is because 'len' is passed to the function
only if we're taking a part of repstr, not as a whole.
If we failed or logged anything before we actually execute given command
in lvm shell, we couldn't report the log using lastlog command after.
This patch adds specific 'pre-cmd' log report object type to identify
such log messages and enables lastlog to report even this log.
pvcreate with --uuid would segfault if a devices file entry matched
the specified pvid, but the devices file entry had no device_id, which
could happen if the entry has a devname idtype.
When the volume size is extended, there is no need to flush
IO operations (nothing can be targeting new space yet).
VDO target is supported as target that can safely work with
this condition.
Such support is also needed, when extending VDOPOOL size
while the pool is reaching its capacity - since this allows
to continue working without reaching 'out-of-space' condition
due to flushing of all in flight IO.
When we wanted to insert '#' before a config line (to comment it out),
we used dm_pool_strndup to temporarily copy the space prefix first so
we can assemble the final line with:
"<space_prefix># <key>=<value>":
out of original:
"<space_prefix><key>=<value>".
The space_prefix copy is not necessary, we can just use fprintf's
precision modifier "%.*s" to print the exact part if we alrady
know space_prefix length.
The new --valuesonly option causes the lvmconfig output to contain only
values without keys for each config node. This is practical mainly in
case where we use lvmconfig in scripts and we want to assign the value
to a different custom key or simply output the value itself without the
key.
For example:
# lvmconfig --type full activation/raid_fault_policy
raid_fault_policy="warn"
# lvmconfig --type full activation/raid_fault_policy --valuesonly
"warn"
# my_var=$(lvmconfig --type full activation/raid_fault_policy --valuesonly)
# echo $my_var
"warn"
Internally, NUM and BIN fields are marked as DM_REPORT_FIELD_TYPE_NUM_NUMBER
through libdevmapper API. The new 'json_std' format mandates that the report
string representing such a value must be a number, not an arbitrary string.
This is because numeric values in 'json_std' format do not have double quotes
around them. This practically means, we can't use string synonyms
("named reserved values") for such values and the report string must always
represent a proper number.
With 'json' and 'basic' formats, this is not an issue because 'basic' format
doesn't have any structure or typing at all and 'json' format puts all values
in quotes, including numeric ones.
When we tested lvm2, the kernel injected various random faults.
(gdb) bt
...
(gdb) p vg
$1 = (struct volume_group *) 0x0
(gdb) p use_previous_vg
$2 = (unsigned int *) 0x0
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Update to more recent version of configure script to support more
new architecture types like RISCV64. Tools in use ATM:
autoconf-2.71-3.fc37.noarch
autoconf-archive-2022.02.11-3.fc37.noarch
automake-1.16.5-9.fc37.noarch
Resolves https://bugzilla.redhat.com/show_bug.cgi?id=2118243
When lvcreate is makeing VDO pool and user has not specified -V size,
ATM we actually run 'vdoformat' twice to get properly 'extent' aligned
size matching lvm2 properties - so the 2nd. run of vdoformat actually
can stay with 'log_verbose()' so the standard printed result
is not showing confusing info (which is now also correctly using
print_unless_silent)
The 'pe_start' column was incorrectly marked as being of type NUM.
This was not correct as pe_start is actually of type SIZ, which means
it can have a size suffix and hence it's not a pure numeric value.
Proper column type is important for selection to work correctly, so we
can also do comparisons while using suffixes.
This is also important for new "json_std" output format which does not
put double quotes around pure numeric values. With pe_start incorrectly
marked as NUM instead of SIZ, this produced invalid JSON output
like '"pe_start" = 1.00m' because it contained the 'm' (or other)
size suffix. If properly marked as SIZ, this is then put in double
quotes like '"pe_start" = "1.00m"'.
In JSON format, we print string list this way:
"key" = "item1,item2,...,itemN"
while in JSON_STD format, we print string list this way:
"key" = ["item1","item2",...,"itemN"]
Before, we stored only the report string itself for a string list
in field->report_string. The field->report_string has either
sorted items or not, depending on what we need for a field -
some report fields have sorted output, some don't...
The field->sort_value.value then contains pointer to the exact
field->report_string. The field->sort_value.items ALWAYS keeps
sorted array of individual items, represented as '[position,length]'
pairs pointing to the field->sort_value.value string.
This approach was fine as far as we didn't need to apply further
formatting to field->report_string. However, if we need to apply
further formatting to field->report_string content, taking into
account individual items, we also need to know where each item
starts and what is its length. Before, we only knew this when
items in report string were sorted, but not in the unsorted version.
We can't rely on the delimiter (default ",") only to separate items
back out of report string, because that delimiter can be contained
in the item value itself.
So this patch enhances the field->report_string for a string list so
it also contains '[position,length]' pairs for each individual item
inside field->report_string. We store this array right beyond the
string itself and we encode it in the same manner we already did for
field->sort_value.items before.
If field->report_string has sorted items, the field->sort_value.items
just points to the array of items we store beyond the report string.
If field->report_string has unsorted items, we store separate array
of items for both field->report_string and field->sort_value.
This patch also cleans up the _report_field_string_list function a bit
so it's easier and more straightforward to follow than the original
version.
Example. If we have "abc", "xy", "defgh" as list on input with ","
as delimiter, then:
- field->report_string will have:
- if we need field->report_string unsorted:
abc,xy,defgh\0{[3,12],[0,3],[4,2],[7,5]}
|____________||________________________|
string array of [pos,len] pairs
|____||________________|
#items items
- if we need field->report_string sorted:
repstr_extra
|
V
abc,defgh,xy\0{[3,12],[0,3],[4,5],[10,2]}
|____________||________________________|
string array of [pos,len] pairs
|____||________________|
#items items
- field->sort_value will have:
- if field->report_string is unsorted:
field->sort_value.value = field->report_string
field->sort_value.items = {[0,3],[0,3],[7,5],[4,2]}
(that is 'abc,defgh,xy')
- if field->report_string is sorted already:
field->sort_value.value = field->report_string
field->sort_value.items = repstr_extra
(that is also 'abc,defgh,xy')
For JSON_STD format, use 'null' if a field has no value at all.
In JSON format, we print undefined numeric values this way:
"key" = ""
while in JSON_STD format, we print undefined numeric values this way:
"key" = null
(Keep in mind that 'null' is different from 0 (zero value) which is
a defined value.)
In JSON format, we print numeric values this way:
"key" = "N"
while in JSON_STD format, we print numeric value this way:
"key" = N
(Where N is a numeric value.)
The original JSON formatting will be still available using the original
DM_REPORT_GROUP_JSON identifier. Subsequent patches will add enhancements
to JSON formatting code so that it adheres more to JSON standard - this
will be identified by new DM_REPORT_GROUP_JSON_STD identifier.
If using JSON format for lvm shell's output, the error message about
exceeding the maximum number of arguments was not reported on output if
this condition was ever hit.
This is because the JSON format (as well as any other future format)
requires extra formatting compared to "basic" format and so it also
requires extra calls when it comes to reporting. The report needs to
be added to a report group and then popped and put on output with
specialized "dm_report_group_output_and_pop_all".
This "output and pop" is normally executed after we execute the command
in the lvm shell. When we didn't get to the command exection at all because
some precondition was not met (like hitting the limit for the number of
arguments for the command here), we skipped this important call and
so there was no log report output.
Right now, it's only this exact error message for which we need to call
"output and pop" directly, all the other error messages are about
initializing and setting the log report itself which we can't report
obviously.
Before this patch:
lvm> pvs 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
lvm>
With this patch applied:
lvm> pvs 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
{
"log": [
{"log_seq_num":"1", "log_type":"error", "log_context":"shell", "log_object_type":"cmd", "log_object_name":"", "log_object_id":"", "log_object_group":"", "log_object_group_id":"", "log_message":"Too many arguments, sorry.", "log_errno":"-1", "log_ret_code":"0"}
]
}
If there's any other error message in the future before we execute the
command itself, we also need to call the "output and pop" directly.
multipath_component_detection=0 has always applied to the filter-based
component detection. Also apply this setting to the duplicate-PV
handling which also eliminates multipath components (based on duplicate
PVs having the same wwid.)
When creating VDO pool based of % values, lvm2 is now more clever
and avoids to create 'unsupportable' sizes of physical backend
volumes as 16TiB is maximum size supported by VDO target
(and also limited by maximum supportable slabs (8192) based on slab
size.
If the requested virtual size is approaching max supported size 4PiB,
switch header size to 0.
Newer VDO kernel target require to have matching virtual size - this
however cause incompatiblity when lvcreate is let to format VDO data
device and read the usable size from vdoformat.
Altough this is a kernel regression and will likely get fixed,
lvm2 can actually reformat VDO device to use properly aligned VDO LV
size to make this problem disappear.
Add function to check for avaialble memory for particular VDO
configuration - to avoid unnecessary machine swapping for configs
that will not fit into memory (possibly in locked section).
Formula tries to estimate RAM size machine can use also with
swapping for kernel target - but still leaving some amount of
usable RAM.
Estimation is based on documented RAM usage of VDO target.
If the /proc/meminfo would be theoretically unavailable, try to use
'sysinfo()' function, however this is giving only free RAM without
the knowledge about how much RAM could be eventually swapped.
TODO: move _get_memory_info() into generic lvm2 API function used
by other targets with non-trivial memory requirements.
Keep single source for most of values printed in lvm.conf
(still needs some conversion)
Correct max for logical threads to 60
(we may refuse some older configuration which might eventually
user higher numbers - but so far let's assume no user have ever set this
as it's been non-trivial and if would complicate code unnecessarily.)
Accept maximum of 4PiB for virtual size of VDO LV
(lvm2 will drop 'header borders to 0 for this case').
Patch 1b070f366b should have
been already fixing this issue but since it the incorrect
patch rebasing the change to vdo_slabSize got lost.
So again now with explicit one-line patch.
Fixes commit b8f4ec846 "display: ignore --reportformat"
by restoring the --reportformat option to pvdisplay.
Adding -C to pvdisplay turns the command into a reporting
command (like pvs, vgs, lvs) in which --reportformat can
be useful.
to compare with wwids in /etc/multipath/wwids when
excluding multipath components. The wwid printed
from the sysfs wwid file may not be the wwid used
in multipath wwids. Save the wwids found for each
device on dev->wwids to avoid repeating reading
and parsing the sysfs files.
Update calling vdo manager since our vdo wrapper has a simple shell
arg parser so it needs args without '='
Also correct using DM_DEV_DIR for 'pvcreate'
Introduce a replacement vdo manager wrapper for testing.
When using test suite on a system without vdo manager (which has got
deprecated) - we still need its functionality to prepare 'vdo volume'
for testing lvm_import_vdo.
Wrapper currently need 2 binaries from older 'vdo 6.2' package -
to be named:
oldvdoformat - format VDO metadata with older format
oldvdoprepareforlvm - shift vdo metadata by 1MiB
When converting VDO volume, the parameter vdo_slabSize was
incorrectly copied as vdo_blockMapCacheSize, however this parameter
is then no longer used for any table line creation so the wrong
value was only stored in metadata.
Also use just single get_kb_size_with_unit_ and remove it's duplicate
functionality with get_mb_size_with_unit_.
Use $VERB for vdo remove call.
Fixes commit 494372b4ee
"filter-mpath: use multipath blacklist"
to handle wwids with initial type digits 1 and 2 used
for t10 and eui ids. Originally recognized type 3 naa.
A typo of the filename after --devicesfile should result in a
command error rather than the command falling back to using no
devices file at all. Exception is vgcreate|pvcreate which
create a new devices file if the file name doesn't exist.
When processing historical LVs inside process_each_lv_in_vg for
selection, we need to use dummy "_historical_lv" for select_match_lv.
This is because a historical LV is not an actual LV, but only a tiny
representation with subset of original properties that we recorded
(name, uuid...).
To use the same processing functions we use for full-fledged non-historical
LVs, we need to use the prefilled "_historical_lv" structure which has all
the other missing properties hard-coded.
Allow to use --vdosettings with lvcreate,lvconvert,lvchange.
Support settings currenly only configurable via lvm.conf.
With lvchange we require inactivate LV for changes to be applied.
Settings block_map_era_length has supported alias block_map_period.
Explicit wwid's from these sections control whether the
same wwid in /etc/multipath/wwids is recognized as a
multipath component. Other non-wwid keywords are not
used, and may require disabling the use of the multipath
wwids file in lvm.conf.
in vgcreate for shared sanlock vg, if sanlock_write_resource
returns an unexpected error, then make init_vg_sanlock fail
which will cause the vgcreate to fail.
When a VG has PVs with different device id types,
it would try to use the idtype of the previous PV
in the loop. This would produce an unncessary warning,
or could lead to using the devname idtype when a better
idtype is available.
In most installations, /dev/sda* or /dev/vda* should be included
in system.devices because the root, home, etc LVs are usually on
sda or vda.
Add a special case warning when a pvscan autoactivation command
sees that /dev/sda* or /dev/vda* are excluded by system.devices,
either not listed or having a different device id.
Change messages that refer to devices being "excluded by filters"
to say just "excluded". This will avoid mistaking the word
"filters" with the lvm.conf filter setting.
Warn if a scsi device is listed in the devices file that
is used by a multipath device that is not listed. This
will happen if a scsi device is listed in the devices
file and then an mpath device is set up to use it.
The way to correct this would be to remove the devices
file entry for the component device and add a new entry
for the multipath device.
When thin-pool had queued some delete message on extension operation
such message has been 'lost' and thin-pool kernel metadata has been
left with a thin volume that no longer existed for lvm2 metadata.
It's more logical to warn about --nolocking in the man page
before it's used rather than after it's used and too late.
Also, warnings are usually for things the user may not know.
dev_name(dev) returns "[unknown]" if there are no names
on dev->aliases. It's meant mainly for log messages.
Many places assume a valid path name is returned, and
use it directly. A caller that wants to use the path
from dev_name() must first check if the dev has any
paths with dm_list_empty(&dev->aliases).
Use dev_cache_get_existing() in a few common, high level
locations where it's obvious that only existing dev-cache
entries are wanted. This can be expanded and used in more
locations (or dev_cache_get can stop creating new entries.)
along with some basic checks for cases when a device
has no aliases.
lvm itself creates many situations where a struct device
has no valid paths, when it activates and opens an LV,
does something with it, e.g. zeroing, and then closes
and deactivates it. (dev-cache is intended for PVs, and
the use of LVs should be moved out of dev-cache in a
future patch.)
Different target type for LV it's not an internal error.
i.e. when target type is replaced with 'error' type - it should be
reported as regular warning and not cause interruption of command with
internall error.
With very old version of DM target driver we have to avoid
trying to use newuuid setting - otherwise we get error
during ioctl preparation phase.
Patch is fixing regression from commit:
988ea0e94c
In a certain disconnected state, a block device is present on
the system, can be opened, reports a valid size, reports the
correct device id (wwid), and matches a devices file entry.
But, reading the device can still fail. In this case,
device_ids_validate() was misinterpreting the read error as
the device having no data/label on it (and no PVID).
The validate function would then clear the PVID from the
devices file entry for the device, thinking that it was
fixing the devices file (making it consistent with the on disk
state.) Fix this by not attempting to check and correct a
devices file entry that cannot be read. Also make this case
explicit in the hints validation code (which was doing the
right thing but indirectly.)
When compiled and used with:
CFLAGS="-fsanitize=address -g -O0"
ASAN_OPTIONS=strict_string_checks=1:detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1
we have few reported issue - they where not normally spotted, since
we were still accessing our own memory - but ouf of buffer-range.
TODO: there is still something to enhance with handling of #orphan vgids
If a dm device is suspended, we can't run blkid on it. But earlier
rules (e.g. 11-dm-parts.rules) might have imported previously scanned
properties from the udev db, in particular if the device had been correctly
set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for existing
ID_FS_xyz properties must be preserved in this case. Otherwise lower-priority
devices (such as multipath components) might take over the symlink
temporarily.
Likewise, we should't stop watching a temporarily suspended, but previously
correctly configured dm device.
Signed-off-by: Martin Wilck <mwilck@suse.com>
event based autoactivation is now the only method that lvm
provides for autoactivation.
Setting lvm.conf event_activation=0 can still be used to disable
event based autoactivation commands, but doing so will no longer
enable static autoactivation.
Removes some incorrect and unnecessary checks for other entries
when adding a new devices. The removed checks and corrections were
mostly redundant with what is already done by device id matching.
Other checking is reworked so the warnings are a bit different.
When a device has a wwid (from sysfs), but lvm ignores the wwid,
e.g. because it contains an unreliable "QEMU" value, then lvm
falls back to using IDTYPE=devname (the device name) as the
device id. If the device name changes after reboot, then lvm
automatically searches for the PV on other devices to find the
new device name and correct system.devices. When searching for
the PV, lvm skips looking at devices that would use other id types,
e.g. if a device would use a wwid and not a devname, then it
skips checking it. However, it failed to account for the fact
that a device may have wwid that was ignored, in which case it
should be checked.
. error exit means that lvmdevices --update would make a change.
. remove check of PART field from --check because it isn't used.
. unlink searched_devnames file to ensure check|update will search
The approach to duplicate VGIDs has been that it is not possible
or not allowed, so the behavior has been undefined. The actual
result was unpredictable and/or broken, and generally unhelpful.
Improve this by recognizing the problem, displaying the VGs,
and printing a warning to fix the problem. Beyond this,
using VGs with duplicate VGIDs remains undefined, but should
work well enough to correct the problem with vgchange -u.
It's possible to create this condition without too much difficulty
by cloning PVs, followed by an incomplete attempt at making the two
VGs unique (vgrename and pvchange -u, but missing vgchange -u.)
After a vg_write, this function was used to attempt to
make lvmcache data match the new state written to disk.
It was not updated correctly in a many or most cases,
and the resulting lvmcache is not actually used after
vg_write, making the update unnecessary.
The destination vg is first written with the EXPORTED flag,
then the source vg is written, then the destination vg is
written again without the EXPORTED flag. Remove an unnecessary
vg_read of the destination vg just before the second write.
This reverts commit bd2baeaaa6.
This commit broke vgrename because vgrename relies on old bugs
in lvmcache_update_vg_from_write and lvmcache_update_vgname
which need to be fixed first.
The approach to duplicate VGIDs has been that it is not possible
or not allowed, so the behavior has been undefined. The actual
result was unpredictable and/or broken, and generally unhelpful.
Improve this by recognizing the problem, displaying the VGs,
and printing a warning to fix the problem. Beyond this,
using VGs with duplicate VGIDs remains undefined, but should
work well enough to correct the problem with vgchange -u.
It's possible to create this condition without too much difficulty
by cloning PVs, followed by an incomplete attempt at making the two
VGs unique (vgrename and pvchange -u, but missing vgchange -u.)
When storage is lost under a sanlock VG, and kill_vg/drop_vg
are used, sanlock_rem_lockspace() may return an error, but
the cleanup steps should still be performed. Without the
cleanup, gl_lsname_sanlock was not cleared. This caused
future lock requests to fail with ENOLS, but the NO_GL_LS
flag was not set due to gl_lsname_sanlock being set.
This caused lockd_global(sh) in lvm commands to fail when
they could succeed.
Fix from guozhonghua216
Since we check for present DM devices - cache result for
futher use of checking presence of such device.
lvm2 uses cache result for label scan, but also when
it tries to activate or deactivate LV - however only simple
target 'striped' is reasonably supported.
Use disable_dm_devs to be able to control when lv_info()
get cache or uncached results.
TODO: support more type, however this is getting very complicated.
New API extension (internal ATM) for getting a list
of active DM device with extra features like i.e. uuid.
To easily lookout for existing UUID in device list,
there is: dm_device_list_find_by_uuid()
Once the returned structure is no longer usable call:
dm_device_list_destroy()
Struct dm_active_device {} holds all the info,
but is always allocated and destroyed within library.
TODO: once it's stable, copy to libdm
We used to reset 'settings' to their defaults after command is finished.
This however has a drawback we lose all the logging after this point.
So this patch disables this 'reset' to observe for side-effects.
lvm shell should be getting reset when next command is run -
so this might or might not have some 'hidden' effects.
ATM it looks like nothing really bad should happen - we just should be
able to get more logs - at least from normal commands.
If the transient service remains after it's done, then
it prevents the same transient service from being run
again later if the PVs are detached and reattached
(although the behavior of a second autoactivation is not
well defined and may only work in limited cases.)
Description stolen from linux d/b/rbd.c L3:
rbd.c -- Export ceph rados objects as a Linux block device
16 partitions seem to make sense according to L90:
#define RBD_SINGLE_MAJOR_PART_SHIFT 4
Running *scan -vvvvvvdddddd yields
#filters/filter-type.c:28 /dev/rbd1p5: Skipping: Unrecognised LVM device type 252
#filters/filter-persistent.c:131 filter caching bad /dev/rbd1p5
right now, and adding
types = ["rbd", 252]
to /e/l/lvm.conf (with the matching "252 rbd" in /p/devices) works as a
per-machine fix:
rbd1 252:16 0 1T 1 disk
|-rbd1p1 252:17 0 243M 1 part
|-rbd1p2 252:18 0 1K 1 part
`-rbd1p5 252:21 0 1023.8G 1 part
`-dev01--vg-root 253:0 0 1023.8G 0 lvm
but rbd is supported by upstream so it'd be nice to have it work OOB
Fixes commit "pvscan: match device arg to filter symlink"
which failed to account for the fact that filter entries
are not just path names but include "a" or "r", etc.
The device_hint name in the metadata was meant to prevent
autoactivation from md components, but the name checks were
more general and would catch unnecessary cases.
This fixes an issue related to the optimization in
"pvscan: only add device args to dev cache"
If the devices file is not used, and the lvm.conf filter
accepts devices via symlink names, then those devices won't
be accepted by pvscan for autoactivation. To resolve this,
recognize when the filter contains symlinks and disable the
optimization. When the optimization is disabled, a full
dev_cache_scan is performed, and symlinks are associated
with the device names passed to pvscan. filter-regex
will accept a device if symlinks to that device are accepted.
Improve handling of md components that get through the
filter, like the previous improvement for multipath.
If md components get through the filter and trigger
duplicate PV code, then eliminate any devs entirely
that are not an md device.
If multipath component devices get through the filter and
cause lvm to see duplicate PVs, then check the wwid of the
devs and drop the component devices as if they had been
filtered. If a dm mpath device was found among the duplicates
then use that as the PV, otherwise do not use any of the
components as the PV.
"duplicate PVs" associated with multipath configs will no
longer stop commands from working.
Remove the searched_devnames file in a couple more places:
. When hints need refreshing it's possible that a missing
devices file entry could be found by searching devices
again.
. When a devices file entry devname is first found to be
incorrect, a new search for missing entries may be
useful.
When devnames are used as device ids and devnames change,
then new devices need to be located for the PVs. If the old
devname is now used by a filtered device, this was preventing
the code from searching for the new device, so the PV was
reported as missing.
If the optimized label scan fails (using online files),
then clear the device state prior to falling back to the
standard label_scan. This avoids printing output about
unexpected state.
Consistently create the pvs_lookup file for VGs with
more than one PV. Previously the file create would be
skipped if all the PVs happened to already be online.
That led to unpredicatable results in an uncommon case
(when the last PV to come online is the only PV with
metadata.)
Copy another optimization from pvscan -aay to vgchange -aay.
When using the optimized label scan for only one VG, acquire the
VG lock prior to the scan. This allows vg_read to then skip the
repeated label scan that normally happens after locking the vg.
Include the device name in the /run/lvm/pvs_online/pvid files.
Commands using the pvid file can use the devname to more quickly
find the correct device, vs finding the device using the
major:minor number. If the devname in the pvid file is missing
or incorrect, fall back to using the devno.
For completeness and consistency, adjust the behavior
for some variations of:
vgchange -aay --autoactivation event [vgname]
The current standard use is with a VG name arg, and the
command is only called when all pvs_online files exist.
This is the optimal case, in which only pvs_online devs
are read. This remains the same.
Clean up behaviors for some other unexpected uses of the
command:
. With no VG name arg, the command activates any VGs
that are complete according to pvs_online. If no
pvs_online files exist, it does nothing.
. If a VG name is used but no PVs online files exist for
the VG, or the PVs online files are incomplete, then
consider there could be a problem with the pvs_online
files, and fall back to a full label scan prior to
attempting the activation.
Part of the optimization to avoid a full dev_cache_scan requires
translating major:minor numbers to a device name. If this devno
translation fails, then fall back to doing a full dev_cache_scan
which is slower but certain to provide the info. This preserves
the most important part of the label scanning optimization in the
vgchange aay (avoiding dev_cache_scan is a relatively small part
of the optimized activation compared to label scanning.)
Port another optimization from pvscan -aay to vgchange -aay:
"pvscan: only add device args to dev cache"
This optimization avoids doing a full dev_cache_scan, and
instead populates dev-cache with only the devices in the
VG being activated.
This involves shifting the use of pvs_online files from
the hints interface up to the higher level label_scan
interface. This specialized label_scan is structured
around creating a list of devices from the pvs_online
files. Previously, a list of all devices was created
first, and then reduced based on the pvs_online files.
The initial step of listing all devices was slow when
thousands of devices are present on the system.
This optimization extends the previous optimization that
used pvs_online files to limit the devices that were
actually scanned (i.e. reading to identify the device):
"vgchange -aay: optimize device scan using pvs_online files"
Port the old pvscan -aay scanning optimization to vgchange -aay.
The optimization uses pvs_online files created by pvscan --cache
to derive a list of devices to use when activating a VG. This
allows autoactivation of a VG to avoid scanning all devices, and
only scan the devices used by the VG itself. The optimization is
applied internally using the device hints interface.
The new option "--autoactivation event" is given to pvscan and
vgchange commands that are called by event activation. This
informs the command that it is being used for event activation,
so that it can apply checks and optimizations that are specific
to event activation. Those include:
- skipping the command if lvm.conf event_activation=0
- checking that a VG is complete before activating it
- using pvs_online files to limit device scanning
The information in /run/lvm/pvs_online/<pvid> files can
be used to build a list of devices for a given VG.
The pvscan -aay command has long used this information to
activate a VG while scanning only devices in that VG, which
is an important optimization for autoactivation.
This patch implements the same thing through the existing
device hints interface, so that the optimization can be
applied elsewhere. A future patch will take advantage of
this optimization in vgchange -aay, which is now used in
place of pvscan -aay for event activation.
When a device id is set for a device, using an idtype other
than devname, it means that sysfs has been used with the device
to match the device id. So, checking for a sysfs entry for the
device in filter-sysfs is redundant. For any other cases not
covered by this (e.g. devname ids), have filter-sysfs simply
stat /sys/dev/block/major:minor to test if the device exists
in sysfs.
The extensive processing done by filter-sysfs init is removed.
It was taking an immense amount of time with many devices, e.g.
. 1024 PVs in 520 VGs
. 520 concurrent vgchange -ay <vgname> commands
. vgchange scans only PVs in the named VG (based on pvs_online
files from a pending patch)
A large number of the vgchange commands were taking over 1 min,
and nearly half of that time was used by filter-sysfs init.
With this patch, the vgchange commands take about half the time.
Help bootstrapping existing shared vgs into the devices file.
Reading the vg in vgimportdevices would require locking to be
started, but vgchange lockstart won't see the vg if it's not
in the devices file. The lvmlockd locks are not protecting
vg modifications so skipping them here won't be a problem.
As we are not using 'enable-compat' for anything, remove this section.
Also remove duplicated check for blkid.
Move setting of some AC_ARG_ENABLE defaults into the macro so it's always
defined.
When scanning configured /dev dir, avoid entring
directories with different filesystem.
This minimizes risk we will block on i.e. entring
directory with mount point.
* TODO: Add banner for Important News: Critical Bugs, Important Announcements,...
-->
<!--
* TODO: Add a feed for latest articles/release-notes on the right
-->
<!--
## About LVM2
-->
LVM aka LVM2 refers to the userspace toolset that provide logical volume
management facilities on linux.
<!--
It is reasonably backwards-compatible with the
original LVM1 toolset.
* TODO: Add information about LVM1 metadata format conversion!
-->
LVM offers more flexibility than using partitions, allowing one to
* grow and, where supported by filesystem, shrink volumes,
* create snapshots of existing volumes,
* mirror data on multiple disks including RAID levels 5 or 6,
* striping data on multiple disks,
* create a read or write cache.
To use LVM2 you need 3 things:
* [device-mapper](https://sourceware.org/dm/) in your kernel (upstream since long ago)
* the userspace device-mapper support library (*libdevmapper*) (part of lvm2)
* and the userspace LVM2 tools.
## Getting LVM
Most of linux distribution offer packaged LVM tools.
Depending on your distribution use
# RPM based distributions (Fedora):
yum install lvm2
# DEB based distributions (Debian, Ubuntu):
apt-get install lvm2
Tarballs of the userspace LVM2 source code releases are available from [sourceware.org](https://sourceware.org/pub/lvm2/) [ftp](ftp://sourceware.org/pub/lvm2/).
List of official [mirror sites](https://sourceware.org/mirrors.html) (including http and rsync protocols).
### LVM Releases
[[!inline pages="release-notes/2.03.* and !*/template and !*/Discussion and !tagged(draft) and !tagged(pending)" limit="2" show="2" rootpage="release-notes"]]
[[More releases|release-notes/index]]
## Getting Started
<!--
TODO: We are missing a lvm(7) man page explaining this, I think it would be a nic addition!
And perhaps so would be a lvmtroubleshooting(7) guide.
-->
Word of warning first! Even though LVM errs on the side of data safety it is a
tool with low level access and one may seriously harm their data when used
incorrectly!
* Physical Volume (PV) is underlying disk, local or remote, encrypted or even
a mdadm RAID volume. PV is divided into so called Physical Extents (PE) which
are a basic allocation unit.
List PVs using [pvs(8)](https://man7.org/linux/man-pages/man8/pvs.8.html) or
Make one by running `lvcreate [-n LVNAME] -L SIZE VGNAME`, and you are done!
See [vgcreate(8)](https://man7.org/linux/man-pages/man8/vgcreate.8.html).
## Avoiding Problems
Good start is to avoid using `{--force|-f}` and `{--yes|-y}` options which are
often seen on internet discussions.
there is a possibility of data loss, LVM tools usually ask, so read the prompts
carefully! Using `--yes` removes these safety.
Also in some cases where it is too dangerous to proceed, e.g. device is used,
LVM refuses to do so, which can be overridden by `--force`.
Second, when resizing and especially when shrinking LVs it is always a good
idea to use `--resizefs` option which ensures the devices are resized in
correct order.
Third, if you still make a mess, never ever run fsck on damaged LV/FS, this is
usually the final blow to your data. It is always better to ask first!
## Documentation
## Resolving Problems
* Backup if possible!
* Search the problem first, check the list of [[common problems|Problems]]
* Never run `fsck` on damaged LV, LV must be recovered first!
* When asking for help describe exactly how the system got corrupted. It really
does not help trying to cover one's mistakes in such situation, it takes
longer to get help and also you are likely to get wrong answer making repair
impossible.
## Reporting Bugs
* When you find a problem there is often something specific about your system.
If the problem is reproducible run the failing command(s) with verbose flag
`-vvvv` which gives developers clue where the problem might be.
There is a [lvmdump(8)](https://man7.org/linux/man-pages/man8/lvmdump.8.html)
tool to help collect data about your system, block devices and LVM setup.
* Please report upstream bugs or request features in [Red Hat Bugzilla](https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper)
<!--
TODO:
* Add links to other documentation
* Add links to git
* Add links to mailing lists
* Resolving problems
* Backup if possible!
* Newer run fsck! Do the research first!
* List of Common issues
* Resizing in wrong order
* Thin pool running out of space
* Configuration - duplicates
* Mailing list
* IRC?
* Reporting Bugs
* sosreport/lvmdump
* BZ
* Contributing
* gitlab MR
* Add latest articles
-->
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.