The process of using persistent reservations for recovery:
host A owns a lock
host A fails
host B requests the lock
host B request fails because A owns the lock
host A enters the FAIL state in sanlock
host B retries the lock, and sees owner A is failed
host B runs lvmpersist to remove the PR key of host A
host B tells sanlock that host A is dead
host B retries the lock, which is now granted by sanlock
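The recovery sequence above can be sketched as a toy model. This is only an illustration: the types and the functions request_lock() and recover_and_retry() are hypothetical and are not sanlock or lvmlockd APIs. The PR key removal and the "dead" notification are reduced to two assignments.

```c
#include <stdbool.h>

#define MAX_HOSTS 8

enum host_state { HOST_LIVE, HOST_FAIL, HOST_DEAD };

/* Hypothetical in-memory model of one lock shared by several hosts. */
struct cluster {
	enum host_state state[MAX_HOSTS + 1]; /* indexed by host id, 1..MAX_HOSTS */
	bool pr_key[MAX_HOSTS + 1];           /* true while the host's PR key is registered */
	int lock_owner;                       /* host id holding the lock, 0 = free */
};

/* Grant the lock if it is free, already ours, or held by a dead host. */
static bool request_lock(struct cluster *c, int host)
{
	int owner = c->lock_owner;

	if (owner && owner != host && c->state[owner] != HOST_DEAD)
		return false;
	c->lock_owner = host;
	return true;
}

/* Retry path: fence a failed owner, then ask for the lock again. */
static bool recover_and_retry(struct cluster *c, int host)
{
	int owner = c->lock_owner;

	if (request_lock(c, host))
		return true;
	if (c->state[owner] != HOST_FAIL)
		return false;            /* owner is still live: nothing to recover */
	c->pr_key[owner] = false;        /* stands in for lvmpersist removing the key */
	c->state[owner] = HOST_DEAD;     /* stands in for telling sanlock the host is dead */
	return request_lock(c, host);
}
```

In this model the retry only succeeds once the owner has entered the FAIL state; a live owner keeps the lock.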
The new option: --setlockargs persist,notimeout
adds "persist" and "notimeout" components to the
lock_args string in the VG metadata, viewable with
vgs -o+lockargs. The option is available in vgcreate
or vgchange.
. "persist" tells lvmlockd to remove the PR key of
a failed host, by running lvmpersist, and set the
sanlock host state to "dead".
. "notimeout" tells lvmlockd to configure sanlock
leases to not time out. sanlock does not use the
watchdog to protect leases that do not time out.
With this combination, PR removal replaces the watchdog
for fencing hosts with expired leases that are blocking
other hosts.
The lock_args version component is "2.0.0" when
these new settings are used, otherwise remains
"1.0.0". Previous lvm versions will not start a
VG with lockargs version 2.
Requires sanlock version 4.2.0 or later.
Use only hostid-based PR keys for sanlock VGs.
This will be required for PR-based recovery with sanlock.
Changes how persist_start() is used to update an existing key.
Previously, the updated key was passed to persist_start as if
it was the local pr_key setting. It's now passed as a separate
parameter.
This also fixes an incorrect warning in vgchange --persist check
when checking a hostid-based key while the lockspace is stopped.
The test doesn't need to stay there for 100 seconds - shorten the timeout.
Normally this is not an issue if the background process does not
have an output, but if the output is enabled, it may block exit.
When we disable 'delay' for pvmove as not useful, the whole propagation
becomes useless - so drop it for now until we have a better plan.
Previous patch was not working well for some snapshot construction.
Change delay_resume_if_new propagation from "> 1" to "!= 1" to ensure
devices are resumed when required by parent devices. This specifically
fixes issues where RAID devices using legs that are being pvmoved need
their preloaded legs resumed before the RAID device itself is preloaded.
The fix ensures pvmove operations start when needed by parent devices,
even though this means pvmove begins before metadata commit. This is
acceptable since pvmove mirrors to previously unallocated extents.
Add synchronization after deactivate_lv() to prevent races between
deactivation and subsequent activation using the same udev cookie.
This ensures proper ordering when wiping prepared integrity metadata
devices during RAID integrity setup, affecting both active and inactive
code paths.
Fix commit ab1405034d
"persist: vgremove should check for other keys"
which counted keys without eliminating the duplicates
that are reported for mpath devices, so vgremove would
report multiple registered keys for a single host using
an mpath device.
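A minimal sketch of counting registered keys without the duplicates, assuming the keys were already read into an array of 64-bit values. The helper name count_unique_keys() is hypothetical and is not taken from the lvm2 sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Order two 64-bit keys for qsort(). */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Count distinct keys. The array is sorted in place, so equal keys
 * (e.g. one key reported once per path of an mpath device) become
 * adjacent and are counted only once.
 */
static size_t count_unique_keys(uint64_t *keys, size_t n)
{
	size_t unique = 0;

	qsort(keys, n, sizeof(*keys), cmp_u64);
	for (size_t i = 0; i < n; i++)
		if (i == 0 || keys[i] != keys[i - 1])
			unique++;
	return unique;
}
```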
Address Coverity static analysis warnings by initializing variables
that could theoretically be used uninitialized in edge cases:
While these uninitialized uses are unlikely to occur in practice due to
the surrounding logic, explicit initialization eliminates potential static
analysis warnings.
Add private -real suffix for active _iorig and _imeta volumes.
TODO: ATM we keep using the "-real" suffix also for a raid leg with integrity.
A better fit would be e.g. an "-integ" suffix for such an LV, but this
requires more complicated table manipulation as we would need
to also change the raid leg name, so leave this for later...
Using the "-real" suffix to keep the raid leg private is already better
than the previous state.
The integrity segment does not use 'origin' anywhere, and all code
now uses seg_lv(seg, 0) to reach the origin LV from an integrity segment.
So do not set this pointer to any value for an integrity segment.
Also for lv_origin_lv() the presence of the origin is already ensured by
validation - integrity must have an origin defined.
Reorder integrity cleanup operations to ensure proper LV state before
device suspension. Move layer removal and visibility changes to occur
before the first metadata write, so that _iorig and _imeta LVs are
already marked as public and visible during suspend/resume operations.
This prevents issues where the device mapper operations occur on LVs
that are still marked as internal components rather than proper public
LVs ready for removal.
Note: this code had not set ->origin used in other places,
but it's going to be replaced with use of seg_lv().
Extract common integrity origin detection logic into new helper function
lv_integrity_from_origin() to eliminate code duplication:
- Add lv_integrity_from_origin() that returns the integrity LV for a given origin
- Refactor lv_is_integrity_origin() to use the new helper function
- Replace duplicate logic in lv_raid_integrity_image_in_sync() with helper call
- Consolidate multiple condition checks into single if statement
- Replace direct seg->origin access with seg_lv(seg, 0) for consistency
The seg_lv(seg, 0) approach is preferred throughout the codebase as it
provides a consistent interface to access the origin LV from segments.
With the recent change to use -real suffixes for raid/mirror legs,
we need to take special care of devices that already use the -real
suffix normally - the patchset missed a check for external origins.
Also avoid even creating a UUID for these LVs in _lv_info_real().
Extend "-real" suffix handling to mirror operations, similar to RAID
functionality for managing internal-to-public LV transitions.
- Add reactivation logic in _split_mirror_images() to update UUIDs
- Enable "real" suffix for mirror images and logs in build_dm_uuid()
- Detect temporary mirror sync layers using MIRROR_SYNC_LAYER pattern
- Force deactivate/activate cycle to transition from private to public UUID
This ensures mirror split operations properly transition split LVs from
internal components with private UUIDs to public LVs with correct UUIDs.
After RAID leg split operations, the resulting LV becomes public but
retains its private "-real" UUID suffix from when it was an internal
component. Since active device mapper devices cannot change their UUID,
the split LV must be deactivated and reactivated to acquire the proper
public UUID.
Add reactivation logic to lv_raid_split() and lv_raid_split_and_track()
that handles both shared and non-shared volume groups, ensuring split
LVs are properly transitioned from internal to public state.
Add support for managing RAID internal LVs that use "-real" suffix
during transitions from internal components to public LVs.
- Add _lv_info_real() to detect orphan devices with private UUIDs
- Add _lv_adjust_real_uuid() to handle UUID mismatches in device tree
- Use "real" suffix for non-visible RAID images and metadata in build_dm_uuid()
- Integrate checks into activation and device manager code paths
This handles edge cases where RAID leg splits leave public devices active
with private "-real" UUIDs, requiring explicit reactivation.
Note: RAID images and metadata get the "-real" suffix only when they are
marked as non-visible LVs in the lvm2 metadata, as there is also a public
activation of these devices.
Add _set_optional_uuid_suffixes() helper to reduce code duplication
when calling dm_tree_set_optional_uuid_suffixes() with the global
_uuid_suffix_list.
Sometimes sg_persist/mpathpersist commands will fail if the
device returns a Unit Attention, e.g. if the host's last
registration was cleared. Run sg_turs on each scsi/mpath
device at the start of each command to clear any UA errors.
This is simpler than adding code to retry commands throughout
the script if they happen to hit a UA error.
Add comprehensive validation to ensure args.h option definitions follow
the established sorting rules:
1. Options without short options - sorted alphabetically
2. Alias options (without descriptions) - sorted alphabetically
3. Options with short options - sorted by uppercase symbol with
lowercase variants first, then by long option name
Changes include:
- Enhanced validation logic in _find_lvm_command_enum() to check args.h
ordering during MAN_PAGE_GENERATOR builds
- Improved error messages showing which entries violate sorting rules
- Added _sort_opt_args() function to sort optional/required option args
for consistent man page and help generation output
- Fixed validation flag handling to properly report multiple errors
This ensures the args.h reorganization from the previous commit is
maintained and prevents future ordering regressions that could affect
code maintainability and generated documentation consistency.
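The third sorting rule can be sketched as a comparator plus an ordering check. This is one reading of the rule, not the actual validation code: struct opt_def, cmp_short_opts() and opts_are_sorted() are hypothetical names.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical, reduced option definition: short symbol and long name. */
struct opt_def {
	char short_opt;
	const char *long_opt;
};

/*
 * Order by the uppercased short symbol; for the same letter the
 * lowercase variant sorts first; ties fall back to the long name.
 */
static int cmp_short_opts(const struct opt_def *a, const struct opt_def *b)
{
	int ua = toupper((unsigned char)a->short_opt);
	int ub = toupper((unsigned char)b->short_opt);

	if (ua != ub)
		return ua < ub ? -1 : 1;
	if (a->short_opt != b->short_opt)
		return islower((unsigned char)a->short_opt) ? -1 : 1;
	return strcmp(a->long_opt, b->long_opt);
}

/* Return 1 when every adjacent pair respects the ordering, else 0. */
static int opts_are_sorted(const struct opt_def *opts, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (cmp_short_opts(&opts[i - 1], &opts[i]) > 0)
			return 0;
	return 1;
}
```

A validator built this way can report the first offending pair instead of only a pass/fail result.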
tools: reorganize args.h argument definitions by sorting rules
Reorganize argument definitions in args.h to follow consistent sorting:
1. Options without short options - sorted alphabetically
2. Alias options (unadvertised, no description)
3. Options with short options - sorted by uppercase symbol
with lowercase variants first, then by long option name
This improves code organization and maintainability by establishing
a clear, documented ordering for LVM command line argument definitions.
No functional changes - only reordering of existing argument definitions.
Also note that there should never be a command using 2 options with the same
short option symbol.
Previously, setpersist was only supported in vgchange
on an existing VG. The PR is acquired exclusively before
the devices are modified, and in the case of a shared
VG the PR is subsequently changed to a shared mode.
Add a new helper function to create the key file path.
Including vgid protects against cases of reading a
stale key file that was left over from a previous VG
with the same name.
vgremove should check for keys registered by other hosts
before removing the VG to avoid leaving dangling PR keys
on devices. This involves refactoring the related commit
ca6fe99162 "lvmpersist: fix vgremove when another key is registered"
that separated persist_stop into before and after parts.
When an XFS file system was previously mounted with quota mount options
(combination of -o uquota,gquota,pgquota) and then we are mounting the
file system as part of the lvresize/lvextend operation (through the fs
resize helper script), we need to preserve the quota mount options.
Otherwise, the XFS would need to recheck quotas - in that case the kernel
log contains:
XFS (<device>): Quotacheck needed: Please wait.
This may take a long time, depending on the file system size.
Related issue: https://github.com/lvmteam/lvm2/issues/182
Previously this was hard-coded to: "Autoactivation commands use a number
of temp files in /run/lvm (with the expectation that /run is cleared
between boots.)"
Since c1bfc8737f it was made more generic,
but on some systems this logic leads to "Autoactivation commands use a
number of temp files in /run/lvm (with the expectation that /var/run
is cleared between boots)." which I'd say adds more confusion than it
solves.
As man pages now tend to print options as man macro reference,
we may still need some local 'specialization'.
ATM the --persist option is a known exception where a command may accept only
a certain argument to be recognized/allowed - so the generic String printed
in Italic can be replaced with just the specific argument printed in Bold.
TODO: recognize only cases where the 'generic' String is NOT used and
for rest of them use the common O_persist macro reference.
The ls struct was being freed after ls thread exit for a common stop,
but the free was missing when the ls thread was stopped for drop and rename.
Also free ls->actions structs in case any still exist.
Replace %4095c sscanf format specifier with position-based copying.
The _stats_parse_list_region() function previously used %4095c to read
the remaining part of the input buffer. Recent glibc versions
(2.42.9000-3.fc44) have changed the behavior of this format
specifier to align with C standard specifications, which affects
parsing when the buffer contains fewer than 4095 characters.
To ensure compatibility across different glibc versions, switch to
using %n to track the parsing position and manually copy the string
data using dm_strncpy().
Note: Binaries compiled with the previous implementation may experience
compatibility issues with newer glibc versions.
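A minimal sketch of the %n technique, assuming a hypothetical record layout "<id>: <start>+<len> <free text>". The struct and parse_region() are illustrative only, and snprintf() stands in for dm_strncpy(), which is not available outside libdevmapper.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed record; 'rest' receives the free-form tail. */
struct region {
	uint64_t id, start, len;
	char rest[4096];
};

/* Return 1 on success, 0 when the fixed fields cannot be parsed. */
static int parse_region(const char *line, struct region *r)
{
	int pos = -1;

	/*
	 * %n stores the number of characters consumed so far and is not
	 * counted in the return value, so 3 conversions are expected.
	 */
	if (sscanf(line, "%" SCNu64 ": %" SCNu64 "+%" SCNu64 " %n",
		   &r->id, &r->start, &r->len, &pos) != 3 || pos < 0)
		return 0;

	/* Copy whatever follows the parsed fields, with truncation. */
	snprintf(r->rest, sizeof(r->rest), "%s", line + pos);
	return 1;
}
```

Unlike %4095c, this does not depend on how many characters remain in the buffer.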
Remove per-thread grace_mutex and use existing global_mutex
with per-thread condition variables.
When the grace mutex was taken separately, there was a slight chance of
a race: a thread could reach the wait condition after a 2nd thread had
already signaled the wakeup.
With the use of the global mutex (which must be held before and after
anyway), no such race is possible.
On old systems we cannot always check the /dev/mapper/vg-lv name,
as those might be symlinks managed by udev. However, monitoring
is not 'synchronized' with udev, as the dmeventd API is major:minor
based and does not need symlinks. So to 'compensate' for this
and keep 'old systems' working, also check for the
presence of the /dev/dm-X device.
Improve the _get_device_inode() function with a kernel version check:
For kernel >=3: Use sysfs path format (/sys/dev/block/major:minor).
For kernel <3: Use /dev/mapper device path format.
Add explicit handling for very old kernels (<3.x) that don't change
inode numbers for /sys device-mapper devices.
For vgremove, split persist stop steps into:
1. prepare, which gets the pr key and VG devices,
and happens before the normal vgremove.
2. run, which removes pr key from the VG devices,
and happens after the normal vgremove.
This is necessary if another PR key is registered on the device
(which likely doesn't happen in the normal usage pattern, but is
still possible.)
When another key exists, removing the local PR key will cause the
machine to lose the ability to write VG metadata to the device.
So, the removal of the local PR key must happen after the VG metadata
is written for vgremove. The prepare step must remain at the beginning
of the command, while the list of PVs in the VG still exists.
Unlock devices file at the end of cmd processing, after all the PVs are
processed, not after the first processed PV.
Before this patch:
❯ pvchange -u /dev/sda /dev/sdb
Physical volume "/dev/sda" changed
WARNING: Devices file unlock no fd.
Physical volume "/dev/sdb" changed
2 physical volumes changed / 0 physical volumes not changed
With this patch applied:
❯ pvchange -u /dev/sda /dev/sdb
Physical volume "/dev/sda" changed
Physical volume "/dev/sdb" changed
2 physical volumes changed / 0 physical volumes not changed
The lvm_run_command/devices_file_exit/unlock_devices_file will
do the unlock at the very end of cmd processing.
Fix several problems in recent dmeventd grace period enhancement:
1. **Timer reset bug**: Threads exiting the grace period were starting
their timeout period with a rather random time 'offset', changing
behavior and firing the monitor action at a non-deterministic time.
Reset the timer (next_time = current_time + timeout) AFTER the grace
period wait.
2. **Device identity verification**: Add inode tracking to prevent
incorrect thread reuse when device UUID is recycled. Grace period
thread lookup now verifies device inode matches to ensure same
physical device. This avoids reusing monitored values for
a different device.
'diskseq' was also considered as a unique identifier, but the inode
check seems to be easier and is not 'so new'.
Another solution could have been to change the dmeventd protocol
and introduce suspend/resume calls - but this could be another source
of problems.
3. **Signal handling order**: Move SIGALRM reset to occur BEFORE
entering the grace period instead of after. This prevents potential
error path trouble that may have shut down the reused thread, causing
the actual monitoring to be silently lost.
4. **Event state cleanup**: Clear current_events before grace period
to prevent stale event processing after thread reuse.
Note: wondering if there can be any problem with inode check....
vgchange --persist stop requires locking to first be stopped
(vgchange --lockstop). Allow setting --lockopt force to bypass
this check so that PR can be forcibly stopped.
Fix commit b2bc06caf8
"lvmpersist: check devices support PR before certain commands"
which broke clear and read commands. Change to skip individual
devices in clear/read if they don't support PR.
The pv/vg/lvdisplay (without -C|--columns) does not use the reporting
mode for the output. However, we can still allow the --reportformat
option for these commands, but it will affect only the log report,
like we have for other non-reporting lvm commands
(e.g. lvchange,vgchange...).
Just like process_each_vg and process_each_lv, the top-level process_each_pv
should also initialize the processing handle (if the handle is not already passed
as an argument). This is important, among other things, for proper
log_report tracking and proper output formatting.
For example:
Before this patch - there are two "log" sections, as the processing handle is
initialized dynamically multiple times if not passed explicitly from the top-level
process_each_pv function:
❯ pvdisplay --config 'log/report_command_log=1 report/output_format=json'
{
"log": [
...
]
}
{
"log": [
...
]
}
❯ pvdisplay --config 'log/report_command_log=1 report/output_format=json' | jsonlint
<stdin>:17:2: Error: Unexpected text after end of JSON value
| At line 17, column 2, offset 4463
<stdin>: has errors
With this patch applied:
❯ pvdisplay --config 'log/report_command_log=1 report/output_format=json'
{
"log": [
...
]
}
❯ pvdisplay --config 'log/report_command_log=1 report/output_format=json' | jsonlint
<stdin>: ok
--configreport, --logonly, --reportformat and --select options can be
used only if -C|--columns is used at the same time. These options are
only to control the reporting infrastructure. We switch to reporting
mode using that -C|--columns switch.
Using the options for reporting mode if reporting is not initiated
(using the -C|--columns switch) can cause various issues and
inconsistencies in the output.
See also https://github.com/lvmteam/lvm2/issues/144.
When cachevol was used for the cache LV, the check for not splitting an
LV between two VGs was incorrect, which prevented splitting the VG in cases
where it should have been possible.
For example, with a VG that contains the cache LV with cachevol
only on two devices (sda and sdb here) and leaves the third device completely
unused (sdc here), we should clearly be able to split the unused device into a new VG:
❯ vgcreate vg /dev/sd{a..c}
Physical volume "/dev/sda" successfully created.
Physical volume "/dev/sdb" successfully created.
Physical volume "/dev/sdc" successfully created.
Volume group "vg" successfully created
❯ lvcreate -l2 -n main vg /dev/sda
Logical volume "main" created.
❯ lvcreate -l2 -n fast vg /dev/sdb
Logical volume "fast" created.
❯ lvconvert -y --type cache --cachevol fast vg/main
Logical volume vg/main is now cached.
❯ lvs -a -o name,devices vg
lv_name devices
[fast_cvol] /dev/sdb(0)
main main_corig(0)
[main_corig] /dev/sda(0)
❯ lsblk -o name /dev/sd{a..c}
NAME
sda
└─vg-main_corig
└─vg-main
sdb
└─vg-fast_cvol
├─vg-fast_cvol-cdata
│ └─vg-main
└─vg-fast_cvol-cmeta
└─vg-main
sdc
Before this patch:
❯ vgsplit vg vg2 /dev/sdc
Logical volume vg/main must be inactive.
❯ vgchange -an vg
0 logical volume(s) in volume group "vg" now active
❯ vgsplit vg vg2 /dev/sdc
Can't split LV main between two Volume Groups
With this patch applied:
❯ vgsplit vg vg2 /dev/sdc
New volume group "vg2" successfully split from "vg"
The test shell/integrity.sh creates raid arrays, corrupts one of the
legs, then reads the array and verifies that the corruption was
corrected. Finally, the test tests that the number of mismatches on the
corrupted leg is non-zero.
The problem is that the raid1 implementation may freely choose which leg
to read from. If it chooses to read the non-corrupted leg, the corruption
is not detected, the number of mismatches is not incremented and the test
reports this as failure.
Fix this failure by marking the non-corrupted leg as "writemostly", so
that the kernel doesn't try to read it (it reads it only if it finds
corruption on the other leg).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
The test shell/integrity.sh creates raid arrays, corrupts one of the
legs, then reads the array and verifies that the corruption was
corrected. Finally, the test tests that the number of mismatches on the
corrupted leg is non-zero.
The problem is that the raid1 implementation may freely choose which leg
to read from. If it chooses to read from the non-corrupted leg, the
corruption is not detected, the number of mismatches is not incremented
and the test reports this as a failure.
Fix the test by not checking the number of integrity mismatches for
raid1.
Fix the check for not splitting an LV between two VGs in the case
where the LV contains an internal layer.
For example, an integrity layer for RAIDs and splitting off a PV that
is not part of the RAID LV at all (sdc here):
❯ vgcreate vg /dev/sda /dev/sdb /dev/sdc
Volume group "vg" successfully created
❯ lvcreate -l1 -m1 --type raid1 --raidintegrity y vg /dev/sda /dev/sdb
Logical volume "lvol0" created.
Before this patch:
❯ vgsplit vg vg2 /dev/sdc
Logical volume vg/lvol0_rimage_0 (part of lvol0) must be inactive.
❯ vgchange -an vg
0 logical volume(s) in volume group "vg" now active
❯ vgsplit vg vg2 /dev/sdc
Can't split LV lvol0_rimage_0 between two Volume Groups
With this patch applied:
❯ vgsplit vg vg2 /dev/sdc
New volume group "vg2" successfully split from "vg"
For our tests we expect a very fast reaction.
So to keep e.g. reporting for the test 'lvextend-thin-metadata-dmeventd'
fast, use just a 2 second grace period (thread reuse).
Add thread reuse mechanism to reduce overhead when devices are frequently
registered/unregistered by introducing a configurable grace period where
monitoring threads wait for potential reuse before termination.
Key changes:
- Add -g option to configure grace period (0-300 seconds, default: 10s)
- Introduce DM_THREAD_GRACE_PERIOD state for threads awaiting reuse
- Implement thread reuse for matching device/dso combinations
- Add pthread condition variables and mutexes for grace period synchronization
- Add thread usage counter and enhanced debug logging
- Add _reset_pending_signal() to handle SIGALRM cleanup on thread reuse
- Refactor _monitor_thread() to support grace period workflow
- Update timeout thread to skip non-running threads
- Simplify thin plugin logging
This optimization significantly reduces thread creation/destruction overhead
in scenarios with rapid device registration/unregistration cycles, such as
creating snapshots of thin volumes, while maintaining proper cleanup and
thread safety.
No longer return EINVAL when dmeventd is not monitoring
any device (there are no registered devices).
This makes e.g. 'dmeventd -R' usable in the case where dmeventd
was not monitoring anything during this call.
Previously this actually caused the 'restart' to be refused, since
restarting 'dmeventd' saw a failing result of the call:
DM_EVENT_CMD_GET_STATUS
Common vgextend commands were logging a message about
dev_read_reservation when PR was not being used.
Only attempt PR work from vgextend when either the
PR require or autostart settings are enabled on the VG.
Add device existence check after successful wait ioctl to prevent
processing events on removed devices.
When dm_task_run() succeeds on a WAITEVENT ioctl, the success might
actually indicate that the device was removed rather than a genuine
device event. Without this check, dmeventd would attempt to process
ERROR events on non-existent devices, leading to unnecessary error
commands and log noise.
This fix adds a _fill_device_data() call immediately after successful
wait completion to verify the device still exists. If the device is
gone, execution jumps to the existing ENXIO error path which properly
handles device disappearance.
Note: Future kernel versions may return ENXIO directly from the wait
ioctl when devices are removed, making this workaround unnecessary.
Until then, this extra INFO ioctl provides the needed verification.
Check first whether the monitored device really exists,
before resolving its device name.
If the device is not present in the DM table, fail _fill_device_data().
Break down complex one-liner directory check into explicit steps with
clear error messages and comments.
The original code used a dense subshell expression combining multiple
tests with logical operators, making it difficult to understand and
maintain. This refactoring separates the logic into:
1. Check directory accessibility (read/write/execute permissions)
2. Check if directory is empty (including hidden files)
Each check now has a specific error message, making it easier for users
to understand what went wrong. The functionality remains identical, but
the code is now more maintainable and debuggable.
Changes:
- Split accessibility and emptiness checks into separate if statements
- Add descriptive comments explaining each validation step
- Provide specific error messages for different failure conditions
- Maintain original behavior including dotglob/nullglob handling
Replace size_t with uint64_t in struct vdo_header to ensure consistent
64-bit size field across all architectures.
On 32-bit platforms, size_t is only 4 bytes, which causes incorrect
structure layout when reading VDO metadata that expects a 64-bit size
field. This fix ensures the VDO header structure maintains the same
binary layout regardless of target architecture.
While 32-bit architectures are not officially supported for VDO,
this change improves code correctness and prevents potential issues
during cross-compilation or when building on 32-bit development
environments.
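The layout difference can be shown with two reduced structures. The field names are hypothetical and do not reproduce the real struct vdo_header; only the size_t versus uint64_t contrast is the point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header using a native size field. */
struct demo_header_native {
	uint32_t id;
	uint32_t version;
	size_t size;      /* 4 bytes on 32-bit targets, 8 bytes on 64-bit */
};

/* Same header with a fixed-width size field. */
struct demo_header_fixed {
	uint32_t id;
	uint32_t version;
	uint64_t size;    /* always 8 bytes */
};

/* The fixed variant has one layout on every architecture. */
_Static_assert(offsetof(struct demo_header_fixed, size) == 8,
	       "size field must start at byte 8");
_Static_assert(sizeof(struct demo_header_fixed) == 16,
	       "header must be 16 bytes");
```

With the native variant, sizeof() follows the width of size_t, so a structure read from disk would be interpreted differently on a 32-bit build.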
If we have an LV->crypt->FS stack, check that the adjustment for the crypt
data offset will leave space for the crypt data itself. Fix a possible underflow.
Example:
❯ lvs -o name,size vg/lvol0
lv_name lv_size
lvol0 124.00m
❯ lsblk /dev/vg/lvol0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vg-lvol0 252:2 0 124M 0 lvm
└─a 252:4 0 108M 0 crypt
Before this patch (the incorrect resulting underflowed file system size after adjustment
for crypt data offset, crypt data then severed by the LV resize without proper
crypt resize beforehand):
❯ lvreduce --yes --resizefs -L -123M vg/lvol0
Rounding size to boundary between physical extents: 120.00 MiB.
Checking crypt device /dev/dm-4 on LV vg/lvol0.
File system size 18446744073696968704b is adjusted for crypt data offset 16777216b.
File system ext4+crypto_LUKS found on vg/lvol0.
File system size (108.00 MiB) is smaller than the requested size (<16.00 EiB).
File system reduce is not needed, skipping.
crypt device is already reduced to 113246208 bytes.
Size of logical volume vg/lvol0 changed from 124.00 MiB (31 extents) to 4.00 MiB (1 extents).
Logical volume vg/lvol0 successfully resized.
With this patch applied:
❯ lvreduce --yes --resizefs -L -123M vg/lvol0
Rounding size to boundary between physical extents: 120.00 MiB.
Checking crypt device /dev/dm-4 on LV vg/lvol0.
Crypt header requires 16.00 MiB, not enough space left for crypt data.
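The underflow in the "before" log can be reproduced with plain unsigned arithmetic: a 4 MiB target size (4194304 bytes) minus the 16777216 byte crypt data offset wraps around to 18446744073696968704. A minimal sketch of a guarded subtraction follows; the function name fs_size_for_crypt() is hypothetical and is not the lvm2 implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the space left for the file system on a crypt device.
 * Returns false when the crypt header alone would not fit, instead
 * of letting the unsigned subtraction wrap around.
 */
static bool fs_size_for_crypt(uint64_t lv_size, uint64_t crypt_offset,
			      uint64_t *fs_size)
{
	if (lv_size <= crypt_offset)
		return false;
	*fs_size = lv_size - crypt_offset;
	return true;
}
```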
Use the correct path - if the system does not have lvm udev rules installed,
these /dev/vgname/lvname symlinks are not created - but a test
running in the test's dev dir gets symlinks created by lvm2 itself.
Use slightly bigger PVs, so the raid takes a bit more time to fully
reshape arrays - so we avoid the conversion finishing too fast even
with slowed-down devices.
Grab blockdev size before calling 'lvs' command.
We have different behavior for lvreduce (contains the fs size
pre-check before calling the external script) and lvextend
(does not have the pre-check).
Also, there's different behavior with "lvreduce --fs resize"
(contains the pre-check) and "lvreduce --fs resize_fsadm"
(does not have the pre-check).
Obviously, only "lvreduce --fs resize" does have the pre-check.
To make it consistent all across, we should remove that one
and everybody will be happy.
Kernels without kernel commit 9f346f7d4ea73692b82f5102ca8698e4040469ea
cannot resize a raid LV without failing the raid itself.
So skip some tests for kernels 6.13, 6.14 and 6.15.
TODO: maybe we want to print some warning message to the users
with affected kernels ??
With the use of --noudevsync we also need to ensure
we verify link creation - without synchronization we are not
able to wait properly, so lvm2 needs to create the links itself.
It's a hack that would be better avoided - but so far
there is no easy fix.
Make skip variables easily overridable by setting make vars.
Also further reduce used PV sizes.
Fix the usage of delay_dev so the test is properly executed
and not skipped. For a major mirror slowdown use smaller region
sizes that cause far more frequent commits, so we can go
with significantly smaller delays.
Also check that repair works for a failing mirror leg and mirror log.
Improve the RAID reshape size test script with the following changes:
**Code Quality Improvements:**
- Add proper shell quoting throughout the script to prevent word splitting issues
- Replace manual arithmetic with cleaner shell arithmetic syntax
- Improve variable handling and remove unnecessary local variable assignments
- Fix typo: "hilesystem" -> "filesystem"
**Test Reliability Enhancements:**
- Add EXTENSIVE_FSCK environment variable for optional additional filesystem validation
- Reduce delay times from 40ms/25ms to 20ms for all RAID types to speed up testing
- Add helper functions _delay_dev() and _restore_dev() for cleaner device delay management
- Use --noudevsync flag in lvconvert to avoid udev-related timing issues
- Remove unnecessary sleep calls and udevadm settle commands
**Functionality Improvements:**
- Improve _check_size() function to return proper exit codes instead of echo statements
- Better error handling in conditional statements using proper test syntax
- Cleaner parameter passing using "$@" instead of manual argument handling
- More robust device path handling using $DM_DEV_DIR consistently
**Code Structure:**
- Extract device delay logic into reusable helper functions
- Improve readability with better variable naming and consistent formatting
- Add explanatory comments for complex operations
Add the _check_size_timeout function to compensate for the block layer bdev update race,
thus avoiding other explicit sleeps spread across the test code.
The "udevadm settle" approach is gone.
--noudevsync is no longer mandatory.
The size check for the stripe size reshape was bogus.
While at it, optimize the delay on the test device to
avoid delaying the LVM2 MDA and RAID rmeta SubLV.
Also update some comments.
Adapt the test for the changed behavior of lvconvert
where the mirror leg and log count is not changing.
Properly test for this condition - checking only the leg count is
not enough to expect an error return code.
And finally fix invalid bash scripting logic, since:
test && TRUE || FALSE
also runs FALSE when TRUE itself fails, so it is not the bash equivalent of:
if test ; then
TRUE
else
FALSE
fi
Improve the consistency and readability of warning messages.
Capitalize the first word of all warning messages.
Add periods at the end of warning messages.
The changes are purely cosmetic and do not affect functionality.
Some a/an fixups along with a couple more odds and ends.
Hopefully this is useful like this as plain diff output
or let me know if something else will work better.
Simplify printf format strings by removing unnecessary parameters
in libdm-report and toollib components to resolve coverity warnings
about format string mismatches.
Add coverity annotations to suppress false positive warnings for
unimportant results across multiple files including dmeventd,
lvmlockd, display, logging, dmsetup, and man-generator components.
Replace all uses of Linux kernel-style endian conversion macros/functions
(e.g., le32_to_cpu, cpu_to_le32, xlate32, xlate64, etc.) with the standard
POSIX/glibc macros (e.g., le32toh, htole32, htobe32, be32toh, etc.) from
<endian.h>.
- Update all code to use le16toh, le32toh, le64toh, htole16, htole32, htole64,
htobe16, htobe32, htobe64, be16toh, be32toh, be64toh as appropriate.
- Provide fallback macro definitions in xlate.h for systems lacking these
standard macros, ensuring backward compatibility with older glibc and non-glibc
systems.
- Remove or replace all project-specific xlateXX and cpu_to_leXX/cpu_to_beXX
macros.
- No functional change intended; this is a mechanical, treewide modernization
for clarity, portability, and future maintainability.
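As an illustration of the replacement macros, the sketch below reads and writes a little-endian 32-bit field with le32toh() and htole32() from <endian.h>. The helper names read_le32() and write_le32() are hypothetical and are not taken from the lvm2 tree.

```c
#define _DEFAULT_SOURCE /* exposes le32toh()/htole32() on glibc */
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit field from an on-disk buffer. */
static uint32_t read_le32(const unsigned char *buf)
{
	uint32_t raw;

	assert(buf);
	memcpy(&raw, buf, sizeof(raw)); /* avoids unaligned access */

	return le32toh(raw);
}

/* Store a host-order value as a little-endian 32-bit field. */
static void write_le32(unsigned char *buf, uint32_t value)
{
	uint32_t raw = htole32(value);

	assert(buf);
	memcpy(buf, &raw, sizeof(raw));
}
```

Both helpers give the same byte layout on little-endian and big-endian hosts, which is the point of the conversion macros.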
To have the very same matching logic to raid1 commit:
c901528053
we add similar check for mirror where we check if mirror leg and
log count is not changing which will now return also an error.
Set LVM_DID_EXEC to "1" instead of using the command name string,
avoiding potential issues with unusual command names and improving
consistency in environment variable handling.
Add arg_force_value() function that returns the correct force_t enum
type, replacing direct string comparisons. Update lvconvert and
pvremove to use this new function for better type safety.
This is a cleaner solution than a plain cast to force_t, as we can
validate the force level in use.
Replace custom binary search implementation with the standard library's
bsearch() function for better maintainability.
Also convert command_name from pointer to char array and simplify name
ordering validation logic.
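A minimal sketch of a name lookup built on bsearch() follows. The table contents, the struct layout and the function name find_command_name() are assumptions made for illustration; the only requirement carried over from the text is that the names are stored as char arrays and kept sorted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 32

/* Hypothetical table; it must stay sorted by name for bsearch(). */
struct command_name {
	char name[NAME_LEN]; /* char array rather than a pointer */
	int id;
};

static const struct command_name names_table[] = {
	{ "lvchange", 1 },
	{ "lvconvert", 2 },
	{ "lvcreate", 3 },
	{ "vgchange", 4 },
	{ "vgcreate", 5 },
};

/* bsearch() passes the key first and the array element second. */
static int cmp_name(const void *key, const void *elem)
{
	return strcmp((const char *) key,
		      ((const struct command_name *) elem)->name);
}

static const struct command_name *find_command_name(const char *name)
{
	assert(name);

	return bsearch(name, names_table,
		       sizeof(names_table) / sizeof(names_table[0]),
		       sizeof(names_table[0]), cmp_name);
}
```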
Fix inconsistent spacing in command help output by introducing
_print_opt_with_align() function that properly handles alignment
for options with and without short forms. This resolves the extra
spaces that were being printed in --longhelp output.
Fixes regression introduced in commit:
491c6652ae.
Replace direct structure assignment with explicit copying to prevent
potential undefined behavior from structure aliasing. This ensures
proper memory handling when working with sanlock structures.
This possibly fixes regression introduced with commit: e9640e5178
as unintended side effect.
We already check loop devices for LVM_LOOP_PV_ACTIVATED="1" in udev
rules to see if we have executed pvscan before. If that is the case, we
don't want to execute it again to avoid VG reactivation.
However, the rules missed the IMPORT{db}="LVM_LOOP_PV_ACTIVATED" rule
to actually get the value already stored in udev db from previous event.
As a result, the pvscan executed on each CHANGE udev event, hence the
VG autoactivation triggered each time as well.
Fix this by adding the missing IMPORT{db}="LVM_LOOP_PV_ACTIVATED" rule
(just like we already do for MD devices).
Note: Keep the behavior for ADD events. That is, we still want the
autoactivation to trigger each time, otherwise coldplug will not work
(again, we have the same principle used for MD devices).
In case a Raid1LV has e.g. 3 images already, running "lvconvert --mirrors 2 $Raid1LV"
results in success even though the image count didn't change.
Make it fail in such case.
Related issue: https://issues.redhat.com/browse/RHEL-82138
Add explicit type casts to resolve GCC signedness comparison warnings:
- Cast dest_size to int in lvmlockctl.c
- Cast NVME_IDENTIFY_DATA_SIZE to int in nvme.c
- Cast bitwise AND results to int in nvme.c and persist.c
- Cast DM_STATS_GROUP_NOT_PRESENT to int in libdm-stats.c
This commit fixes C90 compatibility issues by ensuring variable declarations
are placed at the beginning of code blocks before any executable statements.
Change kernel_send() function signature from accepting struct dm_ulog_request*
to void* to properly handle variable-sized data access.
The dm_ulog_request structure contains a flexible array member (data[0])
that allows variable-sized payloads to be appended. When accessing data
beyond the base structure size, Coverity correctly flags this as a
potential buffer overrun since the structure size doesn't account for
the variable data.
By using void* as the parameter type, we make it explicit that we're
working with a memory region that may extend beyond the base structure,
eliminating the false positive while maintaining type safety through
proper casting within the function.
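The idea can be sketched with a stand-in structure. struct ulog_request and copy_payload() below are hypothetical and only mimic the shape described above (a header followed by a flexible array member); they are not the real dm_ulog_request or kernel_send().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a request with a variable-sized payload. */
struct ulog_request {
	uint32_t data_size; /* number of payload bytes that follow */
	char data[];        /* flexible array member */
};

/*
 * Takes void * because the memory region is larger than
 * sizeof(struct ulog_request); the cast inside restores typed access.
 * Copies the payload into 'out' and returns its size,
 * or 0 if it does not fit.
 */
static size_t copy_payload(const void *buf, char *out, size_t out_size)
{
	const struct ulog_request *rq = buf;

	assert(buf && out);

	if (rq->data_size > out_size)
		return 0;

	memcpy(out, rq->data, rq->data_size);

	return rq->data_size;
}
```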
Replace malloc() with calloc() when allocating the string pointer array
for regex pattern matching in _create_field_selection(). This ensures
all array elements are initialized to NULL pointers, preventing potential
use of uninitialized memory, as flagged by Coverity.
Existing code sets all elements in the follow up loop.
Add explicit (void) cast to ignore the return value of _lv_types_match()
in _check_lv_rules() function. This addresses Coverity warning about
unused return values and maintains consistency with other similar
function calls in the codebase that properly handle return values.
Add Coverity annotations to suppress false positive warnings in several
files where the static analysis tool incorrectly flags potential issues
that are actually safe due to proper validation or intentional behavior.
The annotations address the following false positives:
- daemons/dmeventd/dmeventd.c: overflow_sink warnings for 'current' variable
that is validated to be positive before use in buffer operations
- daemons/lvmlockd/lvmlockd-core.c: overflow_sink warning for 'ret' variable
that is validated to be positive before use
- lib/config/config.c: overflow_sink warning for 'sz' variable that is
validated to be positive before use in read operations
- libdm/dm-tools/dmsetup.c: overflow_sink, overflow, and deref_overflow
warnings for 'n' variable that is validated to be positive before use
in buffer operations and string termination
- libdm/libdm-stats.c: overflow_sink warning for 'i & j' variables that are
validated to be positive before use in array indexing
Introduce _mda_is_ignored() and _rlocn_is_ignored() wrapper functions
with warn_unused_result attribute to enforce return value checking.
This follows the established pattern used by dm_strncpy() and _dm_strncpy():
- Functions without underscore prefix can be used without checking return values
- Functions with underscore prefix must have their return values checked
The change improves static analysis coverage by ensuring that critical
metadata area and raw location ignored state checks are properly validated,
reducing the risk of unhandled error conditions in metadata processing.
- Fix potential resource leak by ensuring va_end() is called on error path
- Set matched = -1 and break from while loop instead of immediate return
- This ensures proper cleanup of va_list when unsupported format
specifiers are encountered in lvmlockctl.
The szscanf function is a custom string scanner used for parsing lvmlockd
status information. Previously, encountering an unsupported format
specifier would cause an immediate return without calling va_end(),
leading to undefined behavior and potential resource leaks.
This fix ensures proper cleanup of variable argument lists in all code paths.
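The pattern can be sketched with a much smaller variadic function. fill_fields() is a hypothetical helper, not the real szscanf(); it supports only the 'd' specifier and shows how setting matched = -1 and breaking keeps va_end() on every path.

```c
#include <assert.h>
#include <stdarg.h>

/*
 * For every 'd' in 'fmt', take an int * argument and store the
 * running field index in it. Any other character is treated as an
 * unsupported specifier.
 */
static int fill_fields(const char *fmt, ...)
{
	va_list ap;
	int matched = 0;

	assert(fmt);
	va_start(ap, fmt);

	while (*fmt) {
		if (*fmt == 'd') {
			int *out = va_arg(ap, int *);

			*out = matched;
			matched++;
		} else {
			/* Record the failure, but do not return here. */
			matched = -1;
			break;
		}
		fmt++;
	}

	va_end(ap); /* reached on both the success and the error path */

	return matched;
}
```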
Replace potentially unsafe strcpy() with bounds-checked memcpy()
and proper null termination in the _sysfs_get_dm_name() function.
This prevents buffer overflow vulnerabilities when copying device
mapper names from sysfs.
Also replace another strcpy() with _dm_strncpy(), which also checks
whether the copied string fits into the given buffer, in the function
_sysfs_get_kernel_name().
Changes:
- Replace strcpy(buf, temp_buf) with memcpy(buf, temp_buf, len) + buf[len] = '\0'
- Use stack-allocated buffers instead of malloc/dm_malloc for better performance
- Improve error handling logic for ENOENT vs other errors
- Add proper newline stripping with bounds checking
- Remove memory allocation failure paths and cleanup code
This addresses potential security issues identified by static analysis
tools while also improving performance by avoiding dynamic memory allocation.
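A minimal sketch of the bounds-checked copy described above, assuming a hypothetical helper copy_name(); the real _sysfs_get_dm_name() does more (it reads the sysfs attribute itself and distinguishes ENOENT from other errors).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 on success, 0 if 'src' does not fit into 'buf'. */
static int copy_name(char *buf, size_t buf_size, const char *src)
{
	size_t len;

	assert(buf && src);
	len = strlen(src);

	/* Strip one trailing newline, as sysfs attributes usually end with one. */
	if (len && src[len - 1] == '\n')
		len--;

	if (len >= buf_size)
		return 0; /* no room for the terminating '\0' */

	memcpy(buf, src, len);
	buf[len] = '\0';

	return 1;
}
```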
- Add bounds checking for sysconf(_SC_PAGESIZE) return value
- Validate page size is positive and reasonable (< 16MB)
- Use size_t for page_size variable to match usage context
- Simplify page calculation with ceiling division
- Fix potential integer overflow in page count calculation
This improves robustness when handling edge cases in clustered
mirror log disk operations.
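The checks listed above could look roughly like the following. The function names and the MAX_PAGE_SIZE macro are assumptions for illustration; the page count is computed without the intermediate sum bytes + page_size - 1, so it cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_PAGE_SIZE (16 * 1024 * 1024) /* the "< 16MB" sanity bound */

/* Returns the system page size, or 0 if sysconf() reports something unusable. */
static size_t checked_page_size(void)
{
	long r = sysconf(_SC_PAGESIZE);

	if (r <= 0 || r >= MAX_PAGE_SIZE)
		return 0;

	return (size_t) r;
}

/* Number of pages needed to hold 'bytes' (ceiling division). */
static size_t pages_for(size_t bytes, size_t page_size)
{
	assert(page_size > 0);

	return bytes / page_size + (bytes % page_size ? 1 : 0);
}
```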
Use signed int instead of size_t for loop counter to prevent reporting
unsigned underflow when decrementing from 0 in Coverity.
The loops themselves worked just fine.
The _count variable was declared as uint64_t but used in arithmetic operations
that could result in underflow when subtracting from smaller values. This
could cause issues when calculating interval numbers or handling count-based
reporting.
Changes:
- Change _count variable type from uint64_t to int64_t
- Update _interval_num() to use proper casting for arithmetic
- Change UINT64_MAX to INT64_MAX for default count value
- Remove unnecessary casting in count assignment
This prevents potential underflow issues when _count is decremented or used
in subtraction operations, ensuring proper behavior for interval-based
reporting and count tracking in dmsetup commands.
The fix maintains compatibility while providing safer integer arithmetic
for the reporting loop logic.
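The underlying arithmetic issue can be shown in isolation. The two helpers below are hypothetical and only contrast unsigned wraparound with a signed result that can be tested for being negative.

```c
#include <assert.h>
#include <stdint.h>

/* With an unsigned counter, subtracting a larger value wraps around. */
static uint64_t remaining_unsigned(uint64_t count, uint64_t done)
{
	return count - done;
}

/* With a signed counter the same subtraction yields a negative value. */
static int64_t remaining_signed(int64_t count, int64_t done)
{
	return count - done;
}
```

With count = 1 and done = 2 the unsigned variant yields UINT64_MAX, while the signed variant yields -1.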
The _stats_map_extents() function processes file extents returned
by FIEMAP ioctl calls. When handling the case where a file has
only a single extent, the code accesses fm_ext[i - 1] to check
if the logical offset is 0.
However, when i is 0 (no extents processed yet), this results in a negative
array index access which can cause undefined behavior or crashes.
So check early whether there are fm_mapped_extents to process.
This avoids using a negative array index.
Existing code already checks fm_mapped_extents == 0 before calling
this function so the patch is not fixing any real bug.
Store vg->lock_type in a local variable to avoid repeated null checks
and string comparisons throughout the function. This improves code
readability and eliminates redundant conditional evaluations.
Change return value semantics:
- Previously returned -1 for non-mirrored segments (error condition)
- Now returns 0 for non-mirrored segments (no failed mirrors) and
reports INTERNAL_ERROR for such case.
- This makes the function more consistent as it counts failures, not errors
Simplify recursive calls:
- Remove unnecessary intermediate variable 'r' in both functions
- Directly add recursive call results to return value
- Eliminates redundant error checking for recursive calls
Code cleanup:
- Remove unused variable declarations
- Improve code readability and reduce compiler warnings
The refactoring makes the functions more robust and easier to understand
while preserving their core behavior of counting failed mirror images
and logs across logical volume segments.
The stats implementation was using uint64_t for group_id and loop variables
while calling dm_bit_* functions that return int types. This created a
potential issue where large values could be incorrectly handled due to
implicit casting between signed and unsigned types.
Changes:
- Change loop variables from uint64_t to int in _stats_group_tag_len()
- Change loop variable from uint64_t to int in _stats_clear_group_regions()
- Change loop variables from uint64_t to int in dm_stats_get_counter()
- Change loop variable from uint64_t to int in dm_stats_get_region_len()
- Fix _stats_create_group() to properly handle dm_bit_get_first() return value
This ensures consistent type usage and prevents potential issues with
values exceeding INT_MAX (2^31 - 1). This also limits group_id to 31 bits,
a constraint that may need addressing in the future.
FIXME: Maybe consider implementing 64-bit variants of dm_bit functions or
documenting this limitation more prominently if this ever becomes an
issue...
Function dm_strncpy() ensures the last character is \0,
so pass whole array size for buffer size.
This gives 1 extra character to use for storing the owner state and name.
lvcreate accepts --setautoactivation for thick logical volumes, but
not for e.g. thin volumes (see linked bug report). Currently, in such
cases, a subsequent lvchange invocation is necessary to change the
autoactivation flag.
To fix this, make lvcreate accept the flag for all volumes by adding
it to LVCREATE_ARGS.
Fixes: https://gitlab.com/lvmteam/lvm2/-/issues/32
Fixes: 0a28e3c44 ("Add metadata-based autoactivation property for VG and LV")
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Update test infrastructure to use the lvresize_fs_helper script
from the LVM2 codebase instead of relying on the system-installed
version. This ensures consistent testing behavior and avoids
issues when the system version doesn't exist or differs from
the expected implementation.
- Add lvresize_fs_helper to LVM_SCRIPTS in test/Makefile.in
for installation
- Configure global/lvresize_fs_helper in test/lib/aux.sh
to point to the test version
- Add global_lvresize_fs_helper_executable_CFG config option to allow
specifying the path to the lvresize_fs_helper script.
- Add DEFAULT_LVRESIZE_FS_HELPER_PATH macro for default value.
- Update _get_lvresize_fs_helper_path() in lib/device/filesystem.c
to read the path from configuration using find_config_tree_str,
similar to _fsadm_cmd().
- This allows users to override the helper path via configuration,
improving flexibility and consistency with fsadm_executable handling.
- Avoid using static variable to get helper path just once, since
it may change between commands via lvm.conf in lvm2 shell.
alphasort was replaced with versionsort in 09e508cd43 commit.
This commit fixes the following compiler error.
device/device_id.c:1608:65: error: use of undeclared identifier 'versionsort'
1608 | sort_count = scandir(dirpath, &namelist, _filter_backup_files, versionsort);
| ^
= and == are equivalent in Bash for strings, but = is the only portable
operator for compatibility with other shells. Before this change,
running ./configure with Dash as /bin/sh resulted in:
./configure: 14558: test: yes: unexpected operator
and the test did not work (i.e. --enable-cmdlib --disable-shared allowed
a failed build to continue). Now, the test works in Bash and Dash.
Enable lvm to use persistent reservations on a VG, which
are applied to each PV in the VG.
. lvmpersist is a low level script, which uses commands
sg_persist, mpathpersist, and nvme to do PR operations
on devices. This script is used by higher level lvm
commands, and would not often be run by users.
. vgchange --setpersist is a VG metadata configuration command
that specifies how PR should be started and enforced for a VG
relative to other lvm commands.
. vgchange --persist is a command to change PR state of PVs in
the VG, e.g. start PR to register and reserve.
The lvmpersist man page contains a complete description.
LVM objects (like pv_device_id) can sometimes contain literal control characters. The JSON spec disallows control characters. So we need to escape them for output.
Closes https://gitlab.com/lvmteam/lvm2/-/issues/35
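A minimal sketch of such escaping, assuming a hypothetical helper json_escape() that is not the function used in lvm2: quotes and backslashes get a backslash prefix, and control characters below 0x20 are written as \u00XX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Escape 'src' into 'dst' for use inside a JSON string.
 * Returns 1 on success, 0 if 'dst' is too small.
 */
static int json_escape(const char *src, char *dst, size_t dst_size)
{
	size_t pos = 0;
	unsigned char c;

	assert(src && dst);

	for (; (c = (unsigned char) *src); src++) {
		if (c == '"' || c == '\\') {
			if (pos + 2 >= dst_size)
				return 0;
			dst[pos++] = '\\';
			dst[pos++] = (char) c;
		} else if (c < 0x20) {
			/* Control characters become \u00XX (6 characters). */
			if (pos + 6 >= dst_size)
				return 0;
			snprintf(dst + pos, 7, "\\u%04x", (unsigned int) c);
			pos += 6;
		} else {
			if (pos + 1 >= dst_size)
				return 0;
			dst[pos++] = (char) c;
		}
	}

	if (pos >= dst_size)
		return 0;

	dst[pos] = '\0';

	return 1;
}
```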
Update test scripts to use new command-line options for skipping tests,
replacing the previous use of environment variables. The following options
are now supported:
--skip-root-dm-check
--skip-with-devices-file
--skip-with-lvmpolld
--skip-with-lvmlockd
- Add command-line option parsing to lib/inittest.sh.
- Replace SKIP_ROOT_DM_CHECK=1 with --skip-root-dm-check.
- Replace SKIP_WITH_DEVICES_FILE=1 with --skip-with-devices-file.
- Replace SKIP_WITH_LVMPOLLD=1 with --skip-with-lvmpolld.
- Replace SKIP_WITH_LVMLOCKD=1 with --skip-with-lvmlockd.
- Update 410 test files to use the new syntax.
This change provides a cleaner, more consistent, and maintainable
way to handle test skipping across the test suite.
Also eliminates number of shellcheck errors.
TODO: convert couple more remaining.
Fix include paths in cmd_enum.h and command.c by removing the "include/"
prefix from #include directives. This resolves build failures when
building from a directory different from the source directory.
The include paths were incorrectly referencing "include/cmds.h" instead
of "cmds.h", causing compilation errors in out-of-source builds.
Remove unnecessary quotes from section headers in man pages:
- lvm_import_vdo.8_main: .SH "NAME" -> .SH NAME
- fsadm.8_main: .SH "NAME" -> .SH NAME
This improves consistency and follows man page conventions.
Originated-by: Cursor AI
Fix company name format in .TH headers by adding missing comma:
- dmeventd.8_main: 'Red Hat Inc' -> 'Red Hat, Inc.'
- cmirrord.8_main: 'Red Hat Inc' -> 'Red Hat, Inc.'
- lvmpolld.8_main: 'Red Hat Inc' -> 'Red Hat, Inc.'
- lvmdbusd.8_main: 'Red Hat Inc' -> 'Red Hat, Inc.'
- lvm_import_vdo.8_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmlockctl.8_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmcache.7_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmvdo.7_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- blkdeactivate.8_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmthin.7_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmreport.7_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmsadc.8_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmautoactivation.7_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmlockd.8_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- fsadm.8_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmraid.7_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmsystemid.7_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
- lvmsar.8_main: 'Red Hat, Inc' -> 'Red Hat, Inc.'
This ensures consistent company name formatting across all man pages.
Originated-by: Cursor AI
- cmirrord.8_main: Remove unnecessary article 'the' before 'corosync'
- dmsetup.8_main: Fix multiple grammar and punctuation issues
* Add missing 'to' in 'Set this to zero to continue'
* Fix 'eg,' -> 'e.g.,' and add missing comma
* Add missing 'the' in 'for the live device'
* Fix 'customised by following options' -> 'customised by the following options'
* Fix 'comma-separate' -> 'comma-separated'
- fsadm.8_main: Improve sentence structure for dm-crypt description
- lvm.8_main: Fix capitalization 'volume Groups' -> 'Volume Groups'
- lvm_import_vdo.8_main: Fix header title and multiple grammar issues
* Fix header title 'FSADM' -> 'LVM_IMPORT_VDO'
* Fix 'LV a backend device' -> 'LV as a backend device'
* Fix 'the of volume group' -> 'the name of the volume group'
* Remove extra 'with' in 'with within volume group'
- lvmsystemid.7_main: Fix punctuation 'e.g.' -> 'e.g.,' and 'i.e.' -> 'i.e.,'
These changes improve grammatical correctness, consistency, and readability.
Originated-by: Cursor AI
Override LC_NUMERIC part of the locale to "C" if we detect that the
radix character interferes with JSON_STD format. If that's the case,
override LC_NUMERIC locale to "C" in report_format_init, that is,
before any reporting is executed (including log reporting). Restore
it back in report_format_destroy, that is, once we're sure that all
reporting is finished.
Related: https://gitlab.com/lvmteam/lvm2/-/issues/33
When a report is under DM_REPORT_GROUP_JSON_STD, the formatting of the
report follows more standard form of the JSON output. This includes
unquoted numbers (as opposed to the DM_REPORT_GROUP_JSON).
The JSON standard dictates the radix character (decimal point) must
be '.' only (https://www.rfc-editor.org/rfc/rfc7158#section-6).
However, some locales may use other character for the radix delimiter,
like ','. This character also interferes with ',' used as delimiter for
json items.
Therefore, we need to check whether current locale is not posing an
issue when using DM_REPORT_GROUP_JSON_STD. If that's the case, simply
error out from dm_report_group_create as we don't want to override
current locale in libdm or do anything else at this level. The libdm
caller is responsible here for setting the proper locale.
Related: https://gitlab.com/lvmteam/lvm2/-/issues/33
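The locale check itself can be sketched with localeconv(). The function name radix_is_json_compatible() is an assumption; how libdm actually performs the check is not shown in this message.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Returns 1 if the current locale's radix character is ".", as JSON requires. */
static int radix_is_json_compatible(void)
{
	const struct lconv *lc = localeconv();

	assert(lc); /* localeconv() does not fail */

	return !strcmp(lc->decimal_point, ".");
}
```

A caller that gets 0 back could switch only the numeric category with setlocale(LC_NUMERIC, "C") before creating the report group, and restore the previous setting afterwards.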
This commit adds lvresize/lvextend/lvreduce support for btrfs.
'btrfs filesystem resize [devid:][+/-]<newsize>[kKmMgGtTpPeE]|[devid:]max <path>'
is used to resize one device only when it's mounted.
The code pattern is like xfs but it supports shrink.
For multi-device btrfs, there is one difficulty to be handled:
If `lvreduce --fs resize` is given, lvm2 will check newsize vs current fs size
to judge whether the fs needs to be shrunk or not.
For one-device btrfs, fslastblock * fsblocksize/FSSIZE is the correct value like
ext* and xfs. But for multi-device btrfs, the two values are the whole fs size.
There is no other way without relying on parsing the btrfs superblock. It's too
complicated and improper to implement the logic in lvm2.
So here just sets fs_last_byte to 0 for btrfs and skips boundary check in
_fs_reduce_allow(). It's safe as btrfs will handle it well.
Another complicated part is how to get mount point info for multi-device btrfs.
There is only one mnt entry per mounted fs in /etc/mtab even if it's a
multi-devices btrfs. So we first get uuid from lv device then traverse devices
under /sys/fs/btrfs/$uuid/devices and compare them to the mnt entry to get the
mount point.
Signed-off-by: Su Yue <glass.su@suse.com>
Add new field fs_info::uuid to record device uuid when calling
fs_get_blkid() for further use.
No functional change.
Signed-off-by: Su Yue <glass.su@suse.com>
As we always require and check for version 3.7,
avoid extra CHECK_EXIST and go for CHECK_MODULE.
LOCKDSANLOCK_SUPPORT is not defined if the build
is not enabled.
When build for sanlock is enabled, and
CHECK_MODULE does not detect at least version 3.7
then the whole configure process errors out.
Count or clear transiently failed devices as of dm-raid superblocks.
Updated debugging.
Use lvconvert --repair to repair transiently failed legs.
Activating all 'meta' LVs with single sync_local_dev_names().
Using proper DM path for meta LV.
Modified-by: zkabelac@redhat.com
Use the new sanlock_acquire2() which returns info about the owner
of a lease. Pass this info back to the lvm command, where it's
initially used to print the host_id of a host holding a lock
when it cannot be acquired.
Add szscanf() to use in place of sscanf. It takes a buffer size for
strings, so avoids needing to use max field width, which is hard to
read when implemented with stringify macros.
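For comparison, this is roughly what the sscanf approach with a stringified maximum field width looks like. The macro names, MAX_NAME and parse_owner() are hypothetical; szscanf() itself is not reproduced here because its signature is not given in this message.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NAME 15
#define STR_(x) #x
#define STR(x) STR_(x)

/* Parse "<name> <id>"; returns 1 if both fields were read. */
static int parse_owner(const char *line, char name[MAX_NAME + 1], int *id)
{
	assert(line && name && id);

	/* The format expands to "%15s %d"; the width excludes the '\0'. */
	return sscanf(line, "%" STR(MAX_NAME) "s %d", name, id) == 2;
}
```

The width inside the format string has to be kept one less than the buffer size by hand, which is the readability problem a size-taking scanner avoids.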
Manually enhance pages for thin, cache, raid, vdo.
Replace usage of .HP with .TP when it makes sense (but keep .HP
where so far we don't have replacement giving same visual results).
Use .CMS, .CME macros in dmsetup/dmstats so it is easy to switch.
But keep using .HP when the rendering simply looks better in terminal
although HTML output does not look that good - so we may eventually
switch here to .TP.
For basic command synopsis use .NSY macro that will
use .SY for graphical rendering (postscript/pdf) but keep
using .TP for ASCII terminal output, as here many HTML renderers
are emitting unreadable pages.
For options use '\ ' (non-breakable space) between an option and
its argument so that the option is not split across lines.
Reformat lines to fit in 80 columns.
Use .EX .. .EE for example output - this improves
character alignment for postscript/pdf rendering as
it uses monospace fonts (unlike .nf .. .fi).
Also within an Example section never let a line begin with a space;
use '\' for such a line.
If an Example line should begin with '.' it needs to be
prefixed with \&.
Add .nf/.fi macros for cases where .EX & .EE are undefined
(this happens e.g. with man2html)
Avoid use of tables (.TS .. .TE) as HTML renderers often use
images (.png) files for such tables and this does not scale well
when user changes font size in browser.
(and the large/long table were split into 2 pieces
so it does fit to 80 columns).
Use .MT .ME for emails.
Use .UR .UE. for URL.
Use .\|.\|.\& as sequence of 3 dots and add \& to not create end of
sentence and possibly wrong alignment.
On lines starting with .BR & .IR avoid using \fB \fI as this
can cause problems when e.g. an HTML renderer keeps using
bold italic when just italic was really wanted.
When using series of .TP/.IP - set the size only with the
first tag - and let renderer align others to match the column.
Correct some small typographical rendering issues.
Rendering was evaluated for readable results with:
- mandoc -T html (-O style=mandoc.css)
- groff -Thtml -mman
- man2html
- man -Thtml
- man -Tps
- man
Unfortunately, each of them has various g/troff troubles,
so we need to select macros in a way that does not mangle
results for the above engines.
Improve generated output for better compliance with '-T lint' checker
(mandoc -T lint and groff --mandoc).
Try to properly place .P sections and also correctly use .TP rendering
where we need to place '.na .ad' after the first rendered keyword,
otherwise it has not the desired impact.
Also use .nh .na .. .ad .hy around whole command USAGE description
so we avoid unwanted alignment,spacing,hyphenation there.
Make sure we emit properly ordered paragraphs and avoid
e.g. emitting .br after .RS/.in, which already implies it.
Also more frequently emit '\n' so there are not too long lines as
rendering engine will format line breaking according to its rules.
To keep generated page better controllable emit more empty line
and use such lines for every .SH, .TP.
Use 'short_ops' loop to avoid duplicating code.
Emit .\|.\|. for 3dot sequence for proper graphical rendering
Emit \\0\\0\\0 (3 white space of width of letter '0')
for better alignment of options with graphical rendering.
There is currently a prepared '#define TABBED' - enabling it makes the
initial option list nicely aligned in graphical rendering
and doesn't seem to have bad side effects on text rendering.
man: generator use macros for options
Predefine all options used by the command into list of '.de O_name'
macros that are pregenerated in the front of man page.
(interestingly usage of groff strings (.ds) seems to have some
non-trivial issues across rendering engines)
This allows to use '\t' without producing warnings with
'mandoc -T lint' - as normally tabs are allowed only within
'.nf ... .fi' section, but then line-breaking does not work.
While we could use a purely 'tab' based version, for some 'html' (ascii)
rendering it produces a not so well indented option list.
For this reason use tabs only with graphical renderers .ie t / .el
and use only spaces for ascii rendering.
Ensure the ... (3-dot continuation) is properly rendered with
a single space after a repeatable option/argument, without adding an extra
space before e.g. a closing bracket.
Using .ta for graphical rendering - allows to keep option aligned
with proportional font.
A couple of commands (lvcreate,lvconvert,vgcreate,lvchange,vgchange)
have the 'specific' property that within them the option --profile
behaves like --metadataprofile, while for all other commands this
option should be simply an alias for --commandprofile.
We may eventually drop this rather confusing behavior in the future
version and there will be only one use as --[command]profile
It should be noted this --commandprofile can be often used
instead of --config option for preconfiguring setting
for some group of commands - we should possibly more propagate
this usage.
A recent patch set for the select enhancement failed to initialize
the ssl struct element regex to NULL, so the code might have
crashed when this code path was evaluated.
Actually mirrors were never supposed to be usable with any newer
target as they are very problematic with any stacked usage.
So now it's going to be properly checked and prohibited.
Users are always supposed to use 'raid1' --type instead.
We can print list of supported LV types for
options like --cachepool,--thinpool,--vdopool when
they are specifying particular LV.
TODO: while we nicely document them, the parser engine ATM
is not capable to validate and enforce these properties,
so the code needs to match them on its own.
Although our command-line description file describes
supported types for conversion with some rules,
these are technically not yet fully implemented in
the code, thus we need explicit functionality to
validate passed LVs for conversion.
Add some explicit warning for commands that are destroying content
of converted volume.
Add thinpooldata to the list of allowed LVs for caching
(as the code already supports this).
See Linux source Documentation/admin-guide/blockdev/zram.rst .
zram devices offer a good performance and efficient resource utilization
through the use of compression.
Signed-off-by: David Disseldorp <ddiss@suse.de>
If lvresize is given a size > the maximum COW size for a given origin
the command will fail with an internal error and no error message:
# lvresize --size 1.6g fedora/snaptest-snap
Rounding size to boundary between physical extents: <1.59 GiB.
Reached maximum COW size <1.01 GiB (258 extents).
Command failed with status code 5.
With -vvv:
Found snapshot target v1.16.0.
Getting target version for snapshot-origin
dm versions [ opencount flush ] [2048] (*1)
Found snapshot-origin target v1.9.0.
Reached maximum COW size <1.01 GiB (258 extents). <<<
Unlock: Memlock counters: prioritized:0 locked:0 critical:0 daemon:0 suspended:0
Syncing device names
Unlocking /run/lock/lvm/V_fedora
_undo_flock /run/lock/lvm/V_fedora
Freeing VG fedora at 0x55781b142890.
Freeing VG fedora at 0x55781b136860.
global/notify_dbus not found in config: defaulting to 1
Destroy lvmcache content
Completed: lvresize -vvv --debug --size 1706243072b fedora/snaptest-snap
Internal error: Failed command did not use log_error
This happens because in this case _lvresize_adjust_extents() returns
early without setting lp->resize to either LV_EXTEND or LV_REDUCE after
capping lp->extents to the maximum COW size.
Fix this by just capping lp->extents and relying on the existing code in
_lvresize_adjust_extents() to fixup lp->resize in the case that
lp->extents == existing_logical_extents. This is consistent with the
no-op case where -l is given as the existing size:
root@localhost:~/src/git/lvm2# LD_LIBRARY_PATH="$PWD/tools" ./tools/lvm lvresize -L 1.6g fedora/snaptest-snap
Rounding size to boundary between physical extents: 1.60 GiB.
Reached maximum COW size <1.01 GiB (258 extents).
New size (258 extents) matches existing size (258 extents).
No size change.
The c065b407cb77a7a14d7c7c3c94e09fcca2fcff09..872e085030ae8039f18908f6e45bad7ba99250a7
was for device_mapper/libdm-report.c. Do the same for libdm/libdm-report.c
Recognize regex in string list selection criterion, including grouping
items by using {} and [] together with && (or ",") and || (or "#")
logical operators:
- [ <regex> && <regex> ... ]
- [ <regex> || <regex> ... ]
- { <regex> && <regex> ... }
- { <regex> || <regex> ... }
Also recognize simple "<regex>" (without any grouping operators)
as a shortcut for "{ <regex> }".
Regex remembers the mempool it was given during dm_regex_create and
then it uses it for further allocation during dm_regex_match. This
could be dangerous in case we used the same mempool for any other
allocations/frees in between dm_regex_create and dm_regex_match calls
in the outer code. This patch adds separate regex mempool for the
report/select to avoid the possible issues.
Previous patch made a proper difference between [...||...] and
[...&&...]. If the criterion for a string list does not use any [] or
{}, we need to make sure that proper matching function is called -
in this case not using {} or [] is the same as if {} was used
(matching subset).
Matching a string list criterion which had [... || ... ] was not
correctly implemented - it was the same as [ ... && ... ]. This patch
makes a difference between the two:
- [ ... || ... ] matches if all items from string list value are
matched by ANY item from selection string list (that is, not
all the selection string list items need to match)
- [ ... && ... ] matches if all items from string list value are
matched by an item from selection string list 1:1 (that is,
all the selection string list items need to match)
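The two semantics can be sketched with exact string comparison standing in for regex matching. The function names are hypothetical, and the "1:1" rule is approximated here as "every value item matches a selection item and every selection item matches a value item", which is only one reading of the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if 'item' equals any of the 'n' strings in 'list'. */
static int in_list(const char *item, const char *const *list, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(item, list[i]))
			return 1;

	return 0;
}

/* [ a || b ]: every value item is matched by some selection item. */
static int match_any(const char *const *val, size_t nv,
		     const char *const *sel, size_t ns)
{
	size_t i;

	assert(val && sel);

	for (i = 0; i < nv; i++)
		if (!in_list(val[i], sel, ns))
			return 0;

	return 1;
}

/* [ a && b ]: as above, and every selection item must match as well. */
static int match_all(const char *const *val, size_t nv,
		     const char *const *sel, size_t ns)
{
	return match_any(val, nv, sel, ns) && match_any(sel, ns, val, nv);
}
```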
Remove superfluous struct reserved_value_wrapper param for
_tok_value_regex function. The only thing that _tok_value did was
zeroing the reserve field within the struct. But this one is already
zero-initialized in outer _parse_selection function.
Describe missed --segments opt for lvdisplay (matches lvs).
Describe lvm-fullreport --all option - show text for lvs,vgs,pvs.
Missed '.' for --separator.
Initialization of union is somewhat tricky as it initializes only
the first member + padding, but in our case this does not clear
the whole size of union so explicitly set \0 after 2 'struct id'
and make sure DM uuid is not using random characters from stack.
Also add explicit .id designators (c99).
After detecting that a VG has wrongly claimed a PV, unpair
the pv->dev setting. This will cause the usual "missing PV"
message to appear for that VG. Make this message, and some
others, clearer by using the VGID rather than the VG name
when there are multiple VGs with the same name.
_vg_read() calls lvmcache_update_vg_from_read() which detects
that the VG metadata is incorrectly claiming the PV. Flag this
condition in the PV status as WRONG_VG. Later, vg_read() can
simply check the WRONG_VG flag rather than repeating the same
PV/VG checks that were already done in lvmcache_update_vg_from_read.
Outdated VG metadata that appears when an old device is attached
to the system can result in PVs appearing to belong to the
old/wrong VG, and commands are allowed to use (corrupt) the PVs.
- vgcreate old /dev/sda /dev/sdb /dev/sdc
- offline /dev/sda
- vgreduce --removemissing old
- vgremove old
- vgcreate new /dev/sdb /dev/sdc
- online /dev/sda
When sda is reattached, sdb and sdc will appear to be
in VG old again. An attempt to correct the problem,
e.g. with vgremove old or vgreduce old, would modify
sdb and sdc, removing them from the new VG.
To fix this, check that sdb and sdc contain metadata for
VG old before allowing VG old to claim ownership of them.
With the fix, sdb and sdc are not displayed as part of
VG old, and commands to change VG old will fail as long
as it references incorrect PVs.
To fix VG old (sda), remove the incorrect PVs from VG old
while limiting the command to see only the correct PVs:
vgreduce --removemissing --devices /dev/sda old
If an inactive LV is being cached in writeback mode, then
removing the cache does a temporary activation to flush
the cache back to the main LV. However, it forgot to
deactivate the LV again, so the temporary activation
was left in place.
Drop 'TEST WARNING' from this case - the code works the same
for 'pvmove -b' polling as for lvmpolld.
Just keep a 'TODO' notice there that we eventually want to reduce the number
of 'worked' pvmove monitoring processes - ATM there will be 1 per LV.
The current manpage is unclear in the example of a raid10 type LV RAID
with --mirrors 1 --stripes 4. The number of devices in each raid1 mirror
is NumberStripes/(NumberMirrors+1). Thus the example should read:
e.g. mirrors 1 and stripes 4 will stripe data across two raid1
mirrors, where each mirror contains two devices.
Fixes: https://gitlab.com/lvmteam/lvm2/-/issues/26
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Previous commit 874a8ab4d0 missed 'IGNORE' mode.
Fix it by adding a rather 'explicit' test for this value,
so the code is more readable.
Also unlock memory earlier and drop unneeded <backtrace>
from return since we already logged error in this function.
Since kernel patch 3d9a9e9a77c5ebecda43b514f2b9659644b904d0 (6.14)
it seems the device size is now limited to <8 Exa bytes (so LLONG loff_t
works across the whole device).
So reduce our tested size to 8191 Peta ~ <8 Exa.
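The size arithmetic can be checked with a short sketch. It assumes binary units (1 Peta = 2^50 bytes, 1 Exa = 2^60 bytes) and a signed 64-bit loff_t; the variable names are illustrative only.

```python
PIB = 2**50             # assumption: "Peta" means PiB
EIB = 2**60             # assumption: "Exa" means EiB
LLONG_MAX = 2**63 - 1   # largest value of a signed 64-bit loff_t

tested_size = 8191 * PIB

# 8 EiB is 8192 PiB, which is exactly 2**63 and therefore one past LLONG_MAX.
print(8 * EIB == 8192 * PIB == LLONG_MAX + 1)
# 8191 PiB stays below that boundary.
print(tested_size <= LLONG_MAX)
```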
Increase the used mirror size for longer processing.
Add more -vvvv traces to pvmove to better chase test error.
Add extra code to wait for finish of polling pvmove after it's
been aborted - so it doesn't break 'next' test.
Use "grep -e ' $' -e '\\~$' <file>" to find obvious trailing spaces.
Use "mandoc -T lint dmstats.8"
Use "test-groff -mandoc -t -ww -z dmstats.8"
-.-.
Lines containing '\c' (' \c' does not make sense):
503:.B \-\-units \c
-.-
Change '-' (\-) to '\(en' (en-dash) for a (numeric) range.
GNU gnulib has recently (2023-06-18) updated its
"build_aux/update-copyright" to recognize "\(en" in man pages.
dmstats.8:470:expressed as a hyphen separated range, for example: '1\-10'.
-.-.
Add a (no-break, "\ " or "\~") space between a number and a unit,
as these are not one entity.
1114:Create a 32M region 1G into device d0
-.-.
Add a "\&" after "e.g." and "i.e.", or use English words
(man-pages(7)).
Abbreviation points should be protected against being interpreted as
an end of sentence, if they are not, and that independent of the
current place on the line.
511:Can also specify custom units e.g. \fB\-\-units\ 3M\fP.
-.-.
Wrong distance between sentences in the input file.
Separate the sentences and subordinate clauses; each begins on a new
line. See man-pages(7) ("Conventions for source file layout") and
"info groff" ("Input Conventions").
The best procedure is to always start a new sentence on a new line,
at least, if you are typing on a computer.
Remember coding: Only one command ("sentence") on each (logical) line.
-.-.
The name of a man page is typeset in bold and the section in roman
(see man-pages(7)).
798:extents). This currently includes \fBxfs(5)\fP and \fBext4(5)\fP.
801:group, and the group alias is set to the \fBbasename(3)\fP of the
-.-.
Use thousand markers to make large numbers easy to read
560:is equivalent to 10000000. Latency values with a precision of less than
-.-.
Remove quotes when there is a printable
but no space character between them
and the quotes are not for emphasis (markup),
for example as an argument to a macro.
1:.TH DMSTATS 8 "Jun 23 2016" "Linux" "MAINTENANCE COMMANDS"
-.-.
Output from "test-groff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z ":
.-.
Additionally:
Fix some arguments for '.TP'. A single-font macro does not work with
'\c', so use a two-font macro.
-.-
Use the pair ".na / .ad" to set no-adjustment (same result as '.ad l')
and '.ad' to restore previous adjustment.
[Replacing ".ad l" ... ".ad b"]
Set singular '.ad b' to '.ad \*(AD' as the user should have the choice to
control the adjustment from the command line.
Add an empty string to string 'AD' with '.as AD "\"' to avoid a warning
about an undefined string.
-.-
Generally:
Split (sometimes) lines after a punctuation mark; before a conjunction.
Updated-by: zkabelac@redhat.com
Checking for defects with a new version
Use test-[g|n]roff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z < "man page"
Use "grep -e ' $' -e '\\~$' <file>" to find obvious trailing spaces.
Use "mandoc -T lint dmsetup.8":
Use "test-groff -mandoc -t -ww -z dmsetup.8":
-.-.
Add a (no-break, "\ " or "\~") space between a number and a unit,
as these are not one entity.
-.-.
Use "\e" to print the escape character instead of "\\" (which gets
interpreted in copy mode).
487:with its hex value (two digits) prefixed by \\x.
-.-.
Strings longer than 3/4 of a standard line length (80)
Use "\:" to split the string at the end of an output line, for example a
long URL (web address)
1030 <name>,<uuid>,\:<minor>,<flags>,\:<table>\:[,<table>+]\:[;<name>,<uuid>,\:<minor>,<flags>,<table>\:[,<table>+]]
-.-.
Add a "\&" after "e.g." and "i.e.", or use English words
(man-pages(7)).
Abbreviation points should be protected against being interpreted as
an end of sentence, if they are not, and that independent of the
current place on the line.
581:Note: Same cookie should be used for same type of operations i.e. creation of
767:Attempts to remove all device definitions i.e. reset the driver. This also runs
946:e.g. striped 2 32 /dev/hda1 0 /dev/hdb1 0
-.-.
Wrong distance between sentences in the input file.
Separate the sentences and subordinate clauses; each begins on a new
line. See man-pages(7) ("Conventions for source file layout") and
"info groff" ("Input Conventions").
The best procedure is to always start a new sentence on a new line,
at least, if you are typing on a computer.
Remember coding: Only one command ("sentence") on each (logical) line.
Mark a final abbreviation point as such by suffixing it with "\&".
Some sentences (etc.) do not begin on a new line.
Split (sometimes) lines after a punctuation mark; before a conjunction.
-.-.
Use \(en (en-dash) for a dash at the beginning (en) of a line,
or between space characters,
not a minus (\-) or a hyphen (-), except in the NAME section.
-.-.
Remove quotes when there is a printable
but no space character between them
and the quotes are not for emphasis (markup),
for example as an argument to a macro.
1:.TH DMSETUP 8 "Apr 06 2006" "Linux" "MAINTENANCE COMMANDS"
-.-.
Output from "test-groff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z ":
Updated-by: <zkabelac@redhat.com>
Handle interruption caught in sleep between polling and
abort() tool execution in such case.
(Although ATM we are not normally signalling the tool this way).
Improve error handling in polling functions where errors
were previously ignored. These errors result from serious
failures (e.g., allocation errors) and should lead to a full
command exit, as the tool cannot function in such a state.
FIXME:
However, there is a fundamental design issue worth considering:
when a command like pvmove --abort cancels an ongoing operation,
the existing polling command continues running and only terminates
once it detects that there is nothing left to poll.
Next issue is permanent reopening of a VG when 'monitoring' progress.
And the last is big trouble with '--interval 0' which is able to
wait in DM ioctl() and hold the VG lock, and there is no good way
to abort such an operation (other than sending a signal to such process).
Remove any older lcov generated files ('*.gcda|gcno') than the currently
generated 'make.file' before creating a new lcov report.
Otherwise we may hit the problem of using some older generated files
possibly with different format.
Restore missing symbols to the libdevmapper.so library.
These symbols:
dm_bitset_parse_list@@DM_1_02_138
and dm_tree_node_size_changed@Base
became 'lost' with commit: 40b277ae17
which supposedly cleaned local 'symbols' from visibility,
however these missing symbols were improperly exported.
Signed-off-by: Jianqi Zeng <zengjianqi@kylinos.cn>
It looks like occasionally supports_json() in cmdhandler.py
for some reason does not find 'fullreport' in err output
of lvm help... let's see more traces...
The simple common case of locking the LV to remove with a
persistent lock would usually be fine, but there are a number
of special cases that were not addressed:
- no locking was done for removing cow snapshot
- direct locking to vdo pool
- dm-cache uncache using lvremove was not handled
Use the same lockd_lvcreate_lock() for all cases in which
creating a new LV first requires locking another associated
LV, e.g. locking the pool or origin for the new LV.
Refactor _get_split_name() code to simplify detection of memory leak
in _destroy_split_name(). Now there are always just 2 pointers
instead of conditional pointer free() which is hard to follow.
In most cases a header should be self-compilable, so
do not expect other 'header' files to be included upfront
for the header to be compilable.
No functional change.
These internal headers were using misleading variable names
in function prototypes, but correct names were used in
function definition. Noticed with:
clang-tidy --checks=readability-inconsistent-declaration-parameter-name
No functional change.
Bit flags likely should never have been 'enum' but since
we have this in a public header - it might be hard to
replace this. So at least add missing 'enum' element
we use.
Although the code was exiting only for (update == 0),
the code used later actually requires 'extents' to exist
also for (update != 0).
TODO: The logic here is not very clear, more testing needed.
After reorganizing elements in `possible_takeover_reshape_type`
(in commit 5b92ce741f),
it became apparent that the code relied on struct overlap,
which is somewhat unsafe. This commit removes it and ensures
proper `const` qualification for the struct usage.
While building lcov files - ignore errors from 'negative' counter
(perhaps we can use -fprofile-update=atomic - but it would be another
slowdown of test runs)
Also ignore unexecuted blocks warnings with 'gcov'.
Failure of lcov goal is not supposed to error whole make build.
When test runs with lvmpolld - we cannot check
messages from pvmove - as those are visible through
output of lvmpolld - so just skip this and
only check LVs are in expected state.
Since long_opt was changed to char[], we were only comparing pointers
that always exist, whereas the original intention of the test was
to verify the presence of a string
(i.e., checking that the first byte is not \0).
vgreduce --removemissing --force replaces a partial image
with an error target. When that image includes an integrity
layer, that layer needs to first be removed.
Fix logging of VDO configuration info message which has actually
printed " and," using the next element.
Increase the array element size so it can store >=5 bytes
for " and" + \0.
Similar to the pvmove update, enhance error path handling
for scenarios where legs or logs remain open and cannot be
closed during the splitting of a mirror image.
Remove the now obsolete _delete_lv() function,
as it will no longer be needed.
When the pvmove operation is completing, it attempts to deactivate
the temporary mirror and remove its mirror legs. However,
if an external tool holds these volumes open, the operation would
previously abort entirely, leaving the LVM2 metadata in a partially
unusable state that required manual administrative fixes.
To improve this, the code has been enhanced to handle such scenarios
more gracefully. It will now complete the pvmove operation even
if some volumes cannot be deactivated, marking them in the metadata
with an error segment. While the command will report errors,
the metadata will remain in a usable state. The administrator
can then remove the orphaned volumes when they are no longer in use.
Fix the code that limited the total number of backup files.
It failed, and left excess files, when the file version
number was greater than 9999, exceeding the four digit suffix.
Now, after version 9999, the suffix intentionally grows beyond
four digits as needed, and is not a fixed width, or zero padded.
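A minimal sketch of the suffix behaviour described above. The file naming ("<base>.<version>"), the zero padding below 10000 and the helper names are assumptions for illustration, not the actual lvm code. It shows why the version has to be compared numerically once the suffix width is no longer fixed.

```python
def version_suffix(version):
    # Assumption: zero padded to four digits up to 9999, wider afterwards.
    return f"{version:04d}"


def files_to_remove(names, limit):
    """Return the oldest files that exceed `limit`, ordered by numeric version."""
    def version_of(name):
        # The suffix after the last '.' is the version number.
        return int(name.rsplit(".", 1)[1])

    ordered = sorted(names, key=version_of)
    excess = len(ordered) - limit
    return ordered[:excess] if excess > 0 else []


names = [f"backup.{version_suffix(v)}" for v in (9998, 9999, 10000, 10001)]
print(names)
print(files_to_remove(names, limit=2))
```

A plain string sort would place "backup.10000" before "backup.9998", so any pruning logic that relies on a fixed four digit width stops working after version 9999.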
Fix for recent commit "lvmlockd: free resource structs for LVs"
When a vg_write() fails in lvcreate, lvmlockd sees init_lv()
followed by free_lv(). The LV lock is not acquired prior to
free_lv, and no prior resource struct exists. This wasn't being
handled.
Previous commit 5f71cebcbe was not
correct. 4k requirement cannot be put on attribute_offset - where
it is valid to have this only 512b aligned.
The rule might get more complicated to recognize invalid values.
For the moment, however, add a simpler requirement - we
impose a 4K restriction on minimal and optimal io size if they
are bigger than 1 sector (512B).
sscanf may automatically add 0 after field width mark,
and since it's not exactly trivial to do a macro calculation
for PATH_MAX - 1, rather make buffer for sscanf results bigger.
Also use matching FSTYPE_MAX as field width specifier.
lvm2 is caching DM nodes with the use of DM_LIST_DEVICES ioctl().
And tried to preserve the cached structure for the same list,
however there was 1 case where cache was empty, and new LIST ioctl
returned some elements - if this DM table change has happened
in the moment of 'scanning' and locking - lvm2 has then continued
to use 'invalid' empty cache.
Fix by capturing this missed case and update cache properly.
TODO: we could possibly use plain memcmp() with previous ioctl result.
Since we started to use DM cache now also for basic checks
whether the DM device is present in DM table, this cache
now needs to be actually refreshed when the LOCK is taken.
This happened implicitly when 'scan_lvs' was enabled, however
still not at the right place.
Move this explicit cache update call right after the moment
vg_read grabs the lock.
TODO: in the optimal case, we should mark the 'cache invalid'
and later refresh this cache, when the first reader appears,
but since this would be large patch, do this little fix step patch
first and improve performance later.
Once created, resource structs for LVs were never being freed.
If LVs are activated, then later removed or never used again,
the unused structs waste memory and cause the resource list
to grow.
There were a lot of messy and inefficient locking calls sent from
a command to lvmlockd when working with thin volumes, e.g.
- requesting a lock numerous times that was already held
- releasing a lock numerous times that was already unlocked
- repeating lock/unlock/lock/unlock rather than holding the
lock until it was no longer needed
Mistakes in the locking could easily hide among all the noise.
The mess was largely because thin-related commands involve a
lot of internal LV manipulations, and lvmlockd calls were done
at the lower level of LV activation/deactivation. This change
adds locking code that is more specific to the thin command
being run, so it can be more intelligent in acquiring and
releasing locks where needed.
Since we now support disabling memory locking by setting
reserved memory or stack to 0 - it would be useful if this would
work also with cmdline --config option.
TODO: rework creation and usage of cmdtool context so we avoid
several places in the code which try to initialize something...
Fix regression introduced with commit:
964012fdb9
that effectively disabled memory locking before suspending volumes.
From merging/testing there remained wrong condition
as we really want to check for 0 memory reservation value
for both checked settings.
Because now, we are doing the fsinfo check before extending an LV and if
that check fails, we do not proceed to the LV extension itself and the
lvextend command bails out immediately.
It seems we need new_size_bytes in places where struct fs_info is also
passed. Store the new_size_bytes inside the struct fs_info so we
can just pass that one to all the functions we call and hence make
the code a bit cleaner and easier to follow.
This avoids a situation where we would extend an LV and then we would
not do anything to the FS on it because the FS info check failed for some
reason, like the type was not supported (e.g. swap) or we could not resize
the FS unless being in some supported state (e.g. XFS to be mounted for
the xfs_growfs to work).
Before this patch (LV resized, FS not resized):
❯ lvextend --fs resize -L+4M vg/swap
Size of logical volume vg/swap changed from 32.00 MiB (8 extents) to 36.00 MiB (9 extents).
File system extend is not supported (swap).
File system extend error.
Logical volume vg/swap successfully resized.
With this patch (LV not resized, FS not resized):
❯ lvextend --fs resize -L+4M vg/swap
File system extend is not supported (swap).
Deactivate converted volume to pool early, so the conversion
exits early and does not leave some already created metadata
volumes that needed manual cleanup by user after command
aborted its conversion operation when the converted volume
was actually in-use (i.e. when user tried to convert
a mounted LV into a thin-pool, 2 extra volumes needed removal).
Device quirks may cause sysfs wwid file to change what it
displays, from a bogus eui... string to an nvme... string.
The old wwid may be saved in system.devices, so recognizing
the device requires finding the old value from libnvme.
After matching the old bogus value using libnvme, system.devices
is updated with the current sysfs wwid value.
All block devices have a disk sequence number assigned (an ever-increasing 64 bit
sequence number) since kernel v5.15 (October 2021). The number is exported through
the /sys/block/<disk>/diskseq property and also as the DISKSEQ udev event variable.
The diskseq helps with referencing a device throughout its existence in
a race-free way.
By default, the /usr/lib/udev/rules.d/60-persistent-storage.rules set
/dev/disk/by-diskseq/<diskseq> symlink for each block device. However,
these rules do not apply for DM devices because we manage the symlinks
ourselves in 13-dm-disk.rules where it properly follows the
DM_UDEV_DISABLE_DISK_RULES flag, among other things.
Add a rule to 13-dm-disk.rules to create the /dev/disk/by-diskseq/<diskseq> symlink.
Since commit d106ac04ab ("configure.ac: use LIBSYSTEMD"),
lvmlockd is not built with SD_NOTIFY by default but depending
on LIBSYSTEMD_LIBS. There are three prerequisites of
nonempty LIBSYSTEMD_LIBS:
NOTIFYDBUS_SUPPORT, SYSTEMD_JOURNAL_SUPPORT and APP_MACHINEID_SUPPORT.
If ./configure is called with options ' --disable-systemd-journal
--disable-app-machineid --enable-lvmlockd-sanlock
--disable-notify-dbus', lvmlockd is built without sd_notify
support, which causes the lvmlockd service of type notify to hang on start.
This commit adds options disable-sd-notify and enable-sd-notify.
The default value is autodetected and when lvm2 is built with
systemd then sd-notify is enabled.
If systemd/sd-daemon.h exists, call PKG_CHECK_MODULES libsystemd.
Signed-off-by: Su Yue <glass.su@suse.com>
Modified-by: Zdenek Kabelac <zkabelac@redhat.com>
Also fix the use of --all that was mistakenly included
as an accepted option for vgdisplay and two cases of pvdisplay
in commit "tools: enhance lvdisplay vgdisplay pvdisplay"
Remove --noudevsync option - as this breaks synchronization with
udev which is necessary when trying to e.g. create _rmeta_3
and wipe it - as the symlinks must be present for wiping.
So if there was some other issue (behind the comment) - we need to
check for the problem elsewhere instead of disabling udev sync.
Automatically use --enable-notify-dbus when building lvmlockd
if not configured otherwise by a configure user -
as the lvmlockd.service is notify based.
Split description for display commands so we can better describe
its usage and combination of individual options in man page.
Now we can separately describe:
lvdisplay, lvdisplay -c, lvdisplay -C
vgdisplay, vgdisplay -c, vgdisplay -C
pvdisplay, pvdisplay -c, pvdisplay -C
TODO: Drop validation from command code itself.
Update the _print_man_option_desc() to also handle common parts
as the initial text without any specified section and also
add support for '#\n' to be able to revert to common part.
Avoid printing lvm2 command trace, if the test finds the 'dmeventd'
was started unexpectedly during testing as the last command is hardly
ever responsible for this.
Also reorder some messages when doing teardown of devices.
Do not print 'help' message from hostname command, when it does
not support option '-I'.
Since we now keep lv names valid all the time (as they are part
of radix_tree) - there is a problem with this renaming code, that
for a moment used duplicated name in vg struct.
Fix it by iterating LVs backward - which avoids breaking consistency
and also actually makes the code simpler.
Add check for 'leaked' symlinks after test and trap
the case when some 'dangling' links are present.
This might be some problem with udev synchronization
or some other strange race.
All such symlinks will be also removed so they will not
influence following tests.
When command prints warning about suppressing query
for inactive table, because this is not supported
by kernel - 1 printed message is just enough, no
reason to 'spam' command output all the time, message
will remain only in debug log.
Also drop 'WARNING:' from real 'error' message.
WARNINGs are supposed to be just warnings and the command
then exits with 'success'.
When specifying minimum_io_size with --vdosettings,
command assumed wrong unit (sectors).
So '--vdosettings minimum_io_size=512|4096' resulted in
an error that only 512 or 4096 values are allowed, but
at the same time values 1 or 8 were accepted.
So fix by converting any number >= 512 to 'sectors' and
keep input of 1 or 8 still valid if anyone has been using
this before.
So now we take 512 or 4096 and still also 1 or 8 with the
same effect.
Also correct the 'error' message when invalid minimum_io_size
is specified.
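The accepted inputs can be sketched like this. The function name and error text are hypothetical; only the rule itself (values of 512 or more are taken as bytes and converted to sectors, while 1 and 8 remain valid sector counts) comes from the description above.

```python
SECTOR_SIZE = 512  # bytes per sector


def minimum_io_size_sectors(value):
    """Normalize a minimum_io_size input to sectors (1 or 8)."""
    if value >= SECTOR_SIZE:
        # Values of 512 and above are interpreted as bytes.
        if value % SECTOR_SIZE:
            raise ValueError("minimum_io_size must be 512 or 4096 bytes")
        value //= SECTOR_SIZE
    if value not in (1, 8):
        raise ValueError("minimum_io_size must be 512 or 4096 bytes")
    return value


for given in (512, 4096, 1, 8):
    print(given, "->", minimum_io_size_sectors(given))
```

Both spellings of the same size end up with the same sector count, so existing users of 1 or 8 are not affected.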
The df -a looks at whole system and it returns an error code in case
there's an inaccessible fs which is not even part of the testing environment.
The -a for df is not actually needed here in the lvresize-xfs test, so remove it.
Fix stripe count and size parameter validation for RAID LVs and
include existing automatic setting of these parameters based
on current shape of the RAID LV in case these are not set
on command line fully.
Previously, this was done only to a certain subset given by this
condition (where the 'stripes' is the '-i|--stripes' cmd line arg
and the 'stripe_size' is actually the '-I|--stripesize' cmd line arg):
!(stripes == 1 || (stripes > 1 && stripe_size))
This condition is a bit harder to follow at first sight and there
are no comments around with explanation for why this one is used,
so let's analyze it a bit more.
First, let's convert this to an equivalent condition (De Morgan law)
so it's easier to read for humans:
stripes != 1 && !(stripes > 1 && stripe_size)
Note: Both stripe and stripesize are unsigned integers, so they can't be negative.
Now, based on that condition, we were running the code to deduce the
stripe/stripesize and do the checks ("the code") only if both of these
are true:
- stripes is different from 1
- we don't have stripes > 1 and stripe_size defined at the same time
But this is not correct in all cases, because:
A) if someone uses stripes = 0, then "the code" is executed
(correct)
B) if someone uses stripes = 1, then "the code" is not executed
(wrong: we still need to be able to check the args against
existing RAID LV stripes whether it matches)
- if someone uses stripes > 1, then "the code" is:
C) if stripe_size = 0, executed
(correct)
D) if stripe_size > 0, not executed
(wrong: we still want to check against existing RAID LV stripes)
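The equivalence of the two conditions and the four cases A) to D) can be verified with a small sketch. The Python functions only mirror the boolean expressions quoted above; the concrete stripe counts and sizes used for the cases are example values.

```python
def old_condition(stripes, stripe_size):
    # !(stripes == 1 || (stripes > 1 && stripe_size))
    return not (stripes == 1 or (stripes > 1 and stripe_size))


def rewritten_condition(stripes, stripe_size):
    # stripes != 1 && !(stripes > 1 && stripe_size)
    return stripes != 1 and not (stripes > 1 and stripe_size)


# Both forms agree for every combination of small unsigned values.
for stripes in range(5):
    for stripe_size in (0, 64, 128):
        assert old_condition(stripes, stripe_size) == rewritten_condition(stripes, stripe_size)

# True means "the code" was executed under the old condition.
cases = {"A": (0, 0), "B": (1, 0), "C": (3, 0), "D": (3, 128)}
for label, (stripes, stripe_size) in cases.items():
    print(label, old_condition(stripes, stripe_size))
```

Cases B and D evaluate to false, which is exactly where the checks against the existing RAID LV were skipped.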
Current issues with this condition:
The B) ends up with segfault.
❯ lvextend -i 1 -l+1 vg/lvol0
Rounding size 4.00 MiB (1 extents) up to stripe boundary size 8.00 MiB (2 extents).
Segmentation fault (core dumped)
The D) ends up with errors like:
❯ lvextend -i 3 -l+1 -I128k vg/lvol0
Rounding size 4.00 MiB (1 extents) up to stripe boundary size 8.00 MiB (2 extents).
Rounding size (4 extents) up to stripe boundary size for segment (5 extents).
Size of logical volume vg/lvol0 changed from 8.00 MiB (2 extents) to 20.00 MiB (5 extents).
LV lvol0: segment 1 with len=5 has inconsistent area_len 3
Couldn't read all logical volumes for volume group vg.
Failed to write VG vg.
Conclusion:
The condition needs to be removed so we always run "the code" to check
the striping args given on command line against existing RAID LV
striping. The reason is that we don't want to allow changing stripe
count for RAID LVs through lvextend and we need to end up with the
error:
"Unable to extend <RAID segment type> segment type with different number of stripes"
(We do support changing the striping by lvconvert's reshaping functionality only).
With commit acbeaa7a8d we started
to use symlinks to link test suite shell scripts, however
they remained within CLEAN_TARGETS.
So when running 'make clean' within non-srcdir build dir, we
were cleaning actual shell script in this dir.
So remove list of this script from CLEAN_TARGETS in this case.
get_sizes_lockspace() may not always initialize all passed values
in case the bitfield would not trigger if() path.
So just in case keep the path initialized.
TODO: maybe add INTERNAL_ERROR to get_sizes_lockspace().
When converting a VG to locktype sanlock, a new
lease is allocated for each existing lv. Finding
a new lease location involved searching the lvmlock
LV from the start for an unused location, which
would be very slow with many LVs. Improve this by
starting each search from the last used location.
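The idea can be sketched as follows. This is a simplified model, not the lvmlockd code: lease locations are represented as slot indexes in a set, and the wrap-around back to the start is an assumption added so that earlier free slots are still found.

```python
def find_free_slot(used, total, start=0):
    """Return the first unused slot at or after `start`, wrapping around once."""
    for step in range(total):
        slot = (start + step) % total
        if slot not in used:
            return slot
    return None  # no free slot left


used = set()
last_used = 0
assigned = []
for _ in range(3):
    # Resume from the last used location instead of rescanning from slot 0.
    slot = find_free_slot(used, total=8, start=last_used)
    used.add(slot)
    last_used = slot
    assigned.append(slot)

print(assigned)
```

When many leases are allocated in a row, each search then skips the already occupied prefix instead of walking it again.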
Fix regression from commit 7f29afdb06
"lvmlockd: configurable sanlock lease sizes on 4K disks"
That change failed to recognize that a running lockspace will not
exist in lvmlockd when converting a local VG to a sanlock VG, i.e.
vgchange --locktype sanlock vgname. When the vgchange attempted
to initialize new lv leases for existing LVs, lvmlockd would
return an error when it found no lockspace.
When searching for committed LV by uuid, this search can
be expensive for commands like 'vgremove' - so for
this part introduce 'lv_uuids' radix_tree that is
built on first access to lv_committed().
Since there is a group of commands that need to access 'lv_list'
while still need to search for LV by its name, make the whole
struct lv_list a member of logical_volume structure.
This makes it easy to also return the 'lv_list' entry that links this LV
within the VG.
Also the patch should not use more memory, since we were allocating
lv_list for each LV anyway when linking LV to VG.
Since find_lv_by_name() is now using radix_tree(),
use the same 'search for /' in LV name for both
find_lv() & find_lv_in_vg().
TODO: Possibly refactor code and use only dm_list
instead of lv_list and dereference LV with container_of()
(thus saving pointer within struct logical_volume) - but
we use 'lv_list' currently in many places...
Add lvm.conf config/validate_metadata configurable setting.
Allows to disable validation of volume_group structure before
writing to disk.
Call of vg_validate() is supposed to catch any inconsistency
of in-memory volume group structure and possibly abort the
command early before doing any more 'damage' in case the VG struct
is found inconsistent after some metadata manipulation.
This is almost always useful for devel - and also for normal users
as for small metadata size this doesn't add too much overhead.
However if the volume_group size is large and operations are just
adding or removing simple LVs - this validation time may add noticeably
to the final command running time.
So if the user seeks the highest performance of a command and does
not do any 'complex' metadata manipulation - it's reasonably safe
to disable validation (with the use of setting "none") here.
With presence of uniq_insert, use this function also
here for extra protection and check for duplicate lv_name
when inserting a new name into radix_tree.
Avoid config 'grep' with actual 'randomly' generated path name
which may eventually contain 'cc' as part the path and
causing a mismatch of the grep test.
When 'lvresize -r' is used to resize the volume, it's valid to
resize even to the same size of an LV, as the command then runs
fs-resize utility to eventually upsize the fs to the current
volume size.
Return code of such command then reflects the return value
of this fs-resize tool.
This fixes the regression introduced when the support
for option --fs was added (2.03.17).
Use proper function names in annotation
There are no functions named print_common_options_cmd()
and print_common_options_lvm(). So, rename them to the
real function names print_usage_common_cmd() and
print_usage_common_lvm().
Signed-off-by: YunJian Long
Tricky one - as the pipe exit codes may result in whole
test failure depending on how quick/slow command exits
are within pipeline.
So get the len without piping.
Replace usage of dm_hash with radix_tree to quickly find LV name
within a VG and also index PV names with set of available PVs.
This PV index is only needed during the import, but instead
of passing 'radix_tree *' everywhere, just keep this within
a VG struct as well and once the parsing is finished, release
this PV index radix_tree.
This also makes it easier to replace this structure
in the future if needed.
lv_set_name now uses radix_tree remove+insert to keep lv_names
tree in-sync and usable for find_lv queries.
Enhance usage with uniq_insert and also try to better
utilize CPU cache and do a smaller loop for individual
hashing of lvname and separately lvid.
Also correcting usage of 'continue' within validation of
historical names as it should report as much errors
as it can within a loop.
When using radix_tree to identify duplicate entries, we can
avoid calling an extra 'lookup()' prior to the insert() operation.
Add radix_tree_uniq_insert/_ptr(), which is able to report -1 if
a value was already set for the given key.
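The "insert and report duplicates in one call" idea can be sketched as follows. This is only an illustration of the return-value convention described above: a plain dict stands in for the radix tree, and the class and method names are hypothetical, not the lvm2 API.

```python
class UniqIndex:
    """Toy name index; a dict replaces the radix tree for illustration."""

    def __init__(self):
        self._entries = {}

    def uniq_insert(self, key, value):
        """Insert key -> value; return 0 on success, -1 if the key was already set."""
        before = len(self._entries)
        # setdefault stores the value only when the key is absent, so no
        # separate lookup is needed before the insert.
        self._entries.setdefault(key, value)
        return 0 if len(self._entries) > before else -1

    def lookup(self, key):
        return self._entries.get(key)
```

A caller can then treat a -1 result as a duplicate name error without a separate lookup step.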
Avoid finding problems in vg_validate when restoring
invalid VG metadata as that would lead to internal error.
i.e. adding unsupported METADATA_FLAG to zero segtype
can trigger such thing.
The previous update needed to add segtype handling within flag.c,
which somewhat breaks API separation and also had a bug in handling
the actual flags.
So instead keep the segtype code in _read_segtype_and_lvflags() within
import_vsn1.c and handle purely flags in read_lvflags() from a const
string.
Instead of duplicating whole segtype string with flags and
using 2 calls read_segtype_lvflags() + get_segtype_from_string(),
merge the functionality into a single read_segtype_and_lvflags().
This allows to make only a local string copy (no allocs) and eventually
to not copy segtype string at all, when there are no flags.
As the 'emit_to_buffer' uses relatively complex
vsnprintf() call inside, try to reduce number
of unnecessary calls and try replace some more
complex string build with a single call instead.
With the existing code, the cache was working only until the 2nd lock was taken.
So, e.g., when 'lvs' scans a system with more than one VG, the caching
was effectively not working.
Update the code so the label invalidate code is able to update the DM
cache, so whenever we take a new lock, we refresh the cache.
TODO: the refresh ATM does a very simple compare of the old and new lists
of cached DM devices, and with the first spotted difference it just
falls back to a full rebuild of the DM cache. With a large number of active
devices this might not be the most efficient way.
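The simple compare-then-rebuild strategy from the TODO can be sketched like this. It is a minimal illustration, not the lvm2 code: the function name and the (devno, name, uuid) tuple layout are assumptions made for the example.

```python
def refresh_dm_cache(cached, current):
    """Return (cache, rebuilt) after comparing cached and current device lists.

    Each entry is a hypothetical (devno, name, uuid) tuple.
    """
    same = len(cached) == len(current) and all(
        old == new for old, new in zip(cached, current)
    )
    if same:
        return cached, False  # nothing changed, keep the existing cache
    # all() stops at the first difference; from there we simply rebuild.
    return list(current), True
```

A finer-grained update would patch only the changed entries, which is what the TODO hints at for systems with many active devices.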
Since we detect 'debug' level after calling 'log_debug()' - all
the arguments are evaluated, so in this case display_lvname() was
preparing a string that is not used in case debugging is not enabled.
So since these strings are on the 'hot-path' and it's already known
which VG is being worked on, in these few cases just use lv->name.
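The cost described here comes from eager argument evaluation, which the following sketch reproduces. The two functions are stand-ins with hypothetical signatures; they only borrow the names used in the message above.

```python
format_calls = []

def display_lvname(vg_name, lv_name):
    """Stand-in for an expensive name formatter; records every call."""
    format_calls.append(lv_name)
    return f"{vg_name}/{lv_name}"

def log_debug(debug_enabled, fmt, *args):
    """The level check happens here, after the caller has built all arguments."""
    return fmt % args if debug_enabled else None
```

Calling `log_debug(False, "Processing %s", display_lvname("vg", "lv1"))` still runs the formatter, while passing the plain name string does not.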
When processing LVs for a command we stored '*object_id' & '*group_id'
as printable string that was however only used with json reporting.
Refactor code so we simply store there 'struct id*' that is just
converted into printable string when json reporting is really used.
Also check for 'sigint()' right before loop processing begins which
is primary purpose of this test.
This function call is able to setup config parser so it stops
parsing 'subsection' nodes after parsing named section node.
Only nodes at 'level' 0 will still be processed. These nodes
are found by searching for last \n}\n sequence from the end of
buffer (instead of trying to analyze all the text in buffer).
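Searching backwards for the last `\n}\n` sequence can be sketched as below. This is an illustration of the search only, with a made-up function name and sample buffer; the real parser works on its own token stream.

```python
def split_after_last_section(buf):
    """Split buf into (sections, trailing level-0 text) at the last '\\n}\\n'."""
    marker = "\n}\n"
    pos = buf.rfind(marker)  # search from the end, no full parse needed
    if pos < 0:
        return "", buf       # no closed section: everything is level 0
    cut = pos + len(marker)
    return buf[:cut], buf[cut:]
```

Nested sections close with an indented brace, so only the closing brace of an outermost section matches the marker.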
Replace use of dm_hash with radix_tree when making PV index names.
Store just the index number itself and use pv%d for outf() string.
To look up a PV, use just the PV pointer itself; it's faster than
converting its ID to UUID format.
Split single check_lv_segments() into 2 separate
versions so they can be called independently.
This allows skipping the segment check for segments that
were already checked after being imported into the VG, and
also avoids another repeated check when validating
segments with the complete VG.
**
check_lv_segments_incomplete_vg()
This checks just the basic LV segment properties and does not
validate those requiring the full VG.
**
check_lv_segments_complete_vg()
Remaining checks that expect the complete VG to be present.
ATM this mostly saves a lot of unnecessary log entries, as it grabs
the global autoextend_threshold (profile == NULL) just once instead
of retrieving it every time with a NULL profile.
Track whether import has even seen segment of LV with log_lv,
and call fixup mirror only in this case.
Also avoid repeated lookup of get_segtype_from_string for
SEG_TYPE_NAME_MIRROR.
Use a bigger memory pool chunk size to reduce the number of
memory pool extensions when handling larger metadata, but do not
make it noticeably bigger when handling small ones.
Use the same large value also when allocating the VG memory pool.
Instead of allocating the string from a pool, use a buffer on the stack
for shorter strings, since the string is no longer needed after its use
in _find_or_make_node().
Eventually we may enhance code also for TOK_STRING_ESCAPED and TOK_STRING,
but they appear to be unused for _section().
For the most common part check for '#' when it's known it's not a space.
Also, when we have checked for '\n' we don't need to check isspace() again.
Also help the 'gcc' optimizer a bit more to grab the buffer char just once and
simplify the jump to the next character in the buffer when checking for a token.
Avoid double dm_pool allocation call by copying string
for node name and config value directly after the end
of node/value structure.
It would likely be better to not copy these strings at all
and dereference them from the original string; however, that
needs further changes in the code base.
This code is faster when calculating crc32 checksum for larger
block areas. There is also SIMD variant present in the code,
however ATM the influence on performance of lvm2 is not that big..
When BLKZEROOUT ioctl fails, it should not stop us from trying the direct
zeroing as a fallback action, since this is an optimization only.
We should be able to continue with new LV creation if we succeed
with that direct fallback then.
Related report: https://issues.redhat.com/browse/RHEL-58737
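The intended control flow (try the optimized path, fall back to direct zeroing on failure) can be sketched as follows. The function name and the two injected callables are hypothetical placeholders, not lvm2 functions.

```python
def zero_range(blkzeroout, direct_zero, offset, length):
    """Zero a device range; return which path did the work."""
    try:
        blkzeroout(offset, length)  # fast path, an optimization only
        return "ioctl"
    except OSError:
        pass                        # not fatal: try the fallback
    direct_zero(offset, length)     # errors here propagate to the caller
    return "direct"
```

Only a failure of the fallback itself should stop LV creation in this model.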
commit a125a3bb50 "lv_remove: reduce commits for removed LVs"
changed "lvremove <vgname>" from removing one LV at a time,
to removing all LVs in one vg write/commit. It also changed
the behavior if some of the LVs could not be removed, from
removing those LVs that could be removed, to removing nothing
if any LV could not be removed. This caused a regression in
shared VGs using sanlock, in which the on-disk lease was
removed for any LV that could be removed, even if the command
decided to remove nothing. This would leave LVs without a
valid ondisk lease, and "lock failed: error -221" would be
returned for any command attempting to lock the LV.
Fix this by not freeing the on-disk leases until after the
command has decided to go ahead and remove everything, and
has written the VG metadata.
Before the fix:
node1: lvchange -ay vg/lv1
node2: lvchange -ay vg/lv2
node1: lvs
lv1 test -wi-a----- 4.00m
lv2 test -wi------- 4.00m
node2: lvs
lv1 test -wi------- 4.00m
lv2 test -wi-a----- 4.00m
node1: lvremove -y vg/lv1 vg/lv2
LV locked by other host: vg/lv2
(lvremove removed neither of the LVs, but it freed
the lock for lv1, which could have been removed
except for the proper locking failure on lv2.)
node1: lvs
lv1 test -wi------- 4.00m
lv2 test -wi------- 4.00m
node1: lvremove -y vg/lv1
LV vg/lv1 lock failed: error -221
(The lock for lv1 is gone, so nothing can be done with it.)
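The ordering that the fix establishes can be sketched like this. It is a simplified model under stated assumptions: the function and its three callables are hypothetical, and real lease handling goes through lvmlockd and sanlock.

```python
def remove_lvs(lvs, can_remove, write_metadata, free_lease):
    """All-or-nothing removal; leases are freed only after the metadata write."""
    if any(not can_remove(lv) for lv in lvs):
        return False          # remove nothing and touch no lease
    write_metadata(lvs)       # commit the removal first
    for lv in lvs:
        free_lease(lv)        # only now release the on-disk leases
    return True
```

With lv2 locked by another host, the call returns False and the lease for lv1 stays intact, so lv1 remains usable afterwards.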
Detect when we have mixed dos partition with gpt's PMBR partition.
This is not a sane configuration, but detect it anyway, just in case
someone configures such partition layout manually and forcefully and
incorrectly defines one of the partition types to be the GPT's PMBR.
For example:
❯ fdisk -l /dev/sdc
Device Boot Start End Sectors Size Id Type
/dev/sdc1 2048 67583 65536 32M 83 Linux
/dev/sdc2 67584 262143 194560 95M ee GPT
Before:
(The partition filter passes even though there's real existing dos
partition - the empty GPT PMBR overrides it.)
❯ pvcreate /dev/sdc
WARNING: PMBR signature detected on /dev/sdc at offset 510. Wipe it? [y/n]:
Wiping PMBR signature on /dev/sdc.
Physical volume "/dev/sdc" successfully created.
With this patch applied:
(The GPT PMBR does not override the existence of the dos partition.)
❯ pvcreate /dev/sdc
Cannot use /dev/sdc: device is partitioned
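The decision for the mixed case can be sketched on the MBR partition type bytes alone. This is a minimal sketch assuming the four primary entries are already decoded into type IDs; the function name is hypothetical, and a table holding only the PMBR entry is left to the separate GPT check.

```python
PMBR_TYPE = 0xEE   # GPT protective MBR entry, "ee GPT" in the fdisk output
EMPTY_TYPE = 0x00  # unused partition slot

def has_real_dos_partition(partition_types):
    """True if any MBR entry is a real partition, even next to a PMBR entry."""
    return any(t not in (EMPTY_TYPE, PMBR_TYPE) for t in partition_types)
```

For the layout above, `[0x83, 0xEE, 0x00, 0x00]`, the result is true, so the device counts as partitioned.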
This provides better hints when trying to resize the fs on top of an LV.
Also needs a3f6d2f593 for proper operation.
❯ lvs -o name,size vg/swap
lv_name lv_size
swap 60.00m
Before:
❯ lvextend -L72m vg/swap
Size of logical volume vg/swap changed from 60.00 MiB (15 extents) to 72.00 MiB (18 extents).
Logical volume vg/swap successfully resized.
❯ lvreduce -L60m vg/swap
File system swap found on vg/swap.
File system device usage is not available from libblkid.
❯ lvreduce -L50m vg/swap
Rounding size to boundary between physical extents: 52.00 MiB.
File system swap found on vg/swap.
File system device usage is not available from libblkid.
After:
❯ lvextend -L72m vg/swap
Size of logical volume vg/swap changed from 60.00 MiB (15 extents) to 72.00 MiB (18 extents).
Logical volume vg/swap successfully resized.
❯ lvreduce -L60m vg/swap
File system swap found on vg/swap.
File system size (60.00 MiB) is equal to the requested size (60.00 MiB).
File system reduce is not needed, skipping.
Size of logical volume vg/swap changed from 72.00 MiB (18 extents) to 60.00 MiB (15 extents).
Logical volume vg/swap successfully resized.
❯ lvreduce -L50m vg/swap
Rounding size to boundary between physical extents: 52.00 MiB.
File system swap found on vg/swap.
File system size (60.00 MiB) is larger than the requested size (52.00 MiB).
File system reduce is required and not supported (swap).
blkid does not report FSLASTBLOCK for a swap device. However, blkid
does report FSSIZE for swap devices, so use this field (and including
the header size which is of FSBLOCKSIZE for the swap) instead to
set the "filesystem last block" which is used subsequently for
further calculations and conditions.
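The arithmetic can be sketched as below. It assumes that FSSIZE for swap excludes the header and that the header occupies exactly one FSBLOCKSIZE, as the description suggests; the function name and the use of bytes as the unit are choices made for this example.

```python
def swap_fs_last_byte(fssize, fsblocksize):
    """Estimate where a swap area ends: reported size plus one header block."""
    return fssize + fsblocksize

MIB = 1024 * 1024
# A 60 MiB swap LV with 4096-byte blocks: blkid would report FSSIZE without the header.
estimated_end = swap_fs_last_byte(60 * MIB - 4096, 4096)
```

Under these assumptions `estimated_end` equals 60 MiB, which matches the "File system size (60.00 MiB)" shown in the output above.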
We already detect msdos partition table. If it is empty, that is, there
is just the partition header and no actual partitions defined, then the
filter-partitioned passes, otherwise not.
Do the same for GPT partition table.
New config setting sanlock_align_size can be used to configure
the sanlock lease size that lvmlockd will use on 4K disks.
By default, lvmlockd and sanlock use 8MiB align_size (lease size)
on 4K disks, which supports up to 2000 hosts (and max host_id.)
This can be reduced to 1, 2 or 4 (in MiB), to reduce lease i/o.
The reduced sizes correspond to smaller max hosts/host_id:
1 MiB = 250 hosts
2 MiB = 500 hosts
4 MiB = 1000 hosts
8 MiB = 2000 hosts (default)
(Disks with 512 byte sectors always use 1MiB leases and support
2000 hosts/host_id, and are not affected by this.)
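The size-to-hosts mapping above can be captured in a small lookup. The values come from the list above; the function name and the error handling are illustrative assumptions.

```python
MAX_HOSTS_BY_ALIGN_MIB = {1: 250, 2: 500, 4: 1000, 8: 2000}

def max_hosts(sector_size, align_mib=8):
    """Return the max hosts/host_id for a disk sector size and align size in MiB."""
    if sector_size == 512:
        return 2000  # always 1MiB leases, not affected by the setting
    if sector_size != 4096:
        raise ValueError(f"unsupported sector size: {sector_size}")
    if align_mib not in MAX_HOSTS_BY_ALIGN_MIB:
        raise ValueError(f"align size must be 1, 2, 4 or 8 MiB, got {align_mib}")
    return MAX_HOSTS_BY_ALIGN_MIB[align_mib]
```

This makes the trade-off explicit: a smaller lease size reduces lease i/o but caps the highest usable host_id.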
In cases where the user is sure that neither 'rootfs' nor 'swap' is on LVs
managed with this command, it is possible to completely bypass pinning
the process to RAM, which may slightly speed up command execution
(however, at the risk that the process can be delayed by swapping).
Basically, use this only at your own risk...
TODO: add some dmeventd support for this.
Previously, lvmlockd detected the end of the lvmlock LV
by doing i/o to it until an i/o error was returned.
This triggered sanlock warning messages, so use the LV
size to avoid accessing beyond the end of the device.
Previously, every lvcreate would refresh the lvmlock LV
in case another machine had extended it. This involves
a lot of unnecessary work in most cases, so now compare
the LV size and device size to detect when a refresh is
needed.
lvremove of a thin lv while the pool is inactive would
leave the pool locked but inactive.
lvcreate of a thin snapshot while the pool is inactive
would leave the pool locked but inactive.
lvcreate of a thin lv could activate the pool to check
a threshold before the pool lock was acquired in lvmlockd.
The lv_hash wasn't being passed to the seg-specific text import
functions, so they were doing many find_lv() calls which consumes
a lot of time when there are many LVs in the metadata.
While performing the udev sync semaphore's inc/dec operation, we use the
result from GETVAL semctl just to print a debug message with the current
value of that semaphore, nothing else.
If the GETVAL fails for whatever reason while the actual inc/dec
completes successfully, just log a warning message about the GETVAL
(and print the debug messages without the actual semaphore value)
and return success for the inc/dec operation as a whole.
Clean up udev sync semaphore on fail path during its creation, otherwise
the caller will have no handle returned to clean it up itself and the
semaphore will keep staying in the system. The only way to clean it up
would be to call `dmsetup udevcomplete_all` which would destroy all
udev sync semaphores, not just the failed one, which we don't want.
The same message is printed while performing create/inc/dec operation and
the GETVAL semctl fails. Add a prefix so we know exactly in which of
these functions the issue actually happened.
An LV with the pvmove_ prefix is not allowed to be created by the user,
so there is a bigger chance that our selected name will never exist.
TODO: probably add code to get generic unused LV name...
Older gcc doesn't really like complex types (buffer, struct) to be
initialized without extra {} around such type.
So pick any other 'single type' variable from the struct and set it to 0;
the compiler will do the rest without emitting a warning.
Revert 373372c8ab and instead update
our validation code to handle LVs with empty segment - currently
we should need this only for pvmove operation, thus such LV should
have name 'pvmove%u'.
This fixes a problem where user tried i.e. pvmove on a VG with single
PV - as reported: https://github.com/lvmteam/lvm2/issues/148
Reported-by: bob@redhat.com
The .cache and compile_commands.json are used by popular source crawling and
indexing clang tools which in turn may be integrated with source code editors.
We may reuse the .cache directory for other caches and temporary
files.
The /doc/.ikiwiki and /public are related to the ikiwiki.
Avoid possible udev race - since dmsetup create is
not using the same cookie logic as lvm2 commands,
try to avoid racing on some systems with udev scanning.
The option can be used in multiple ways (like --cachesettings):
--integritysettings key=val
--integritysettings 'key1=val1 key2=val2'
--integritysettings key1=val1 --integritysettings key2=val2
Use with lvcreate or lvconvert when integrity is first enabled
to configure:
journal_sectors
journal_watermark
commit_time
bitmap_flush_interval
allow_discards
Use with lvchange to configure (only while inactive):
journal_watermark
commit_time
bitmap_flush_interval
allow_discards
lvchange --integritysettings "" clears any previously configured
settings, so dm-integrity will use its own defaults.
lvs -a -o integritysettings displays configured settings.
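The three equivalent ways of passing settings can be sketched with a small parser. This only illustrates the accepted shapes and key lists described above; the function name, the string-valued results and the error handling are assumptions, not the lvm2 implementation.

```python
CREATE_KEYS = {"journal_sectors", "journal_watermark", "commit_time",
               "bitmap_flush_interval", "allow_discards"}
CHANGE_KEYS = CREATE_KEYS - {"journal_sectors"}  # keys accepted by lvchange

def parse_integrity_settings(option_values, allowed=CREATE_KEYS):
    """Merge every --integritysettings occurrence into one dict of strings."""
    settings = {}
    for value in option_values:
        # One occurrence may hold several space-separated key=val pairs;
        # an empty string yields no pairs, i.e. no configured settings.
        for pair in value.split():
            key, sep, val = pair.partition("=")
            if not sep or not key or not val:
                raise ValueError(f"expected key=val, got {pair!r}")
            if key not in allowed:
                raise ValueError(f"setting not allowed here: {key}")
            settings[key] = val
    return settings
```

Both `["commit_time=10 journal_watermark=50"]` and `["commit_time=10", "journal_watermark=50"]` produce the same dictionary in this sketch.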
log/command_log_report config setting defaults to 1 now if json or json_std
output format is used (either by setting report/output_format config
setting or using --reportformat cmd line arg).
This means that if we use json/json_std output format, the command log
messages are then part of the json output too, not interleaved as
unstructured text mixed with the json output.
If log/command_log_report is set explicitly in the config, then we still
respect that, no matter what output format is used currently. In this
case, users can still separate and redirect the output by using
LVM_OUT_FD, LVM_ERR_FD and LVM_REPORT_FD so that the different types
do not interleave with the json/json_std output.
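The precedence between the explicit setting and the format-dependent default can be sketched as follows. The function name is hypothetical, and the default of 0 for other output formats is an assumption rather than something stated above.

```python
def effective_command_log_report(output_format, explicit_setting=None):
    """Resolve log/command_log_report: an explicit config value always wins."""
    if explicit_setting is not None:
        return explicit_setting
    # No explicit value: default to 1 for json formats (0 otherwise is assumed).
    return 1 if output_format in ("json", "json_std") else 0
```

For example, `effective_command_log_report("json", explicit_setting=0)` stays 0, which is the case where the LVM_*_FD variables are useful for separating the streams.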
In case of different PV sizes in a VG, the lvm2 allocator falls short
of defining extended segments resiliently when asked for a 100%FREE RaidLV
extension, and a RAID distinct allocation check fails. The fix is to release
a memory pool on the resulting error path.
Until the lvm2 allocator gets enhanced (WIP) to do such complex (and other)
allocations properly, a workaround is to extend a RaidLV to any free space on
its already allocated PVs by defining those PVs on the lvextend command line,
then iteratively run further such lvextend commands to extend it to its
final intended size. Mind that this may be a non-trivial extension iteration.
The cmd struct is now required in many more functions, and
it's added as a function arg for most direct dev-cache function
calls. The cmd struct is added to struct device (dev->cmd) so
that it can be accessed in many other cases where dev-cache
functions are being called from places where getting the cmd
struct is too difficult.
The dm devs cache is separate from the ordinary dev cache,
so give the function names distinct prefixes, using
"dm_devs_cache" to prefix dm devs cache functions.
When a PV is stacked on an LV, the PV needs to be
dropped from bcache before the LV is processed.
The LV can be found in dev-cache using its name
rather than the devno.
The list of dm devs was in the cmd struct and had a
different lifetime than the radix trees referencing
those dm devs. Now the list and radix trees are
created and destroyed together.
In the context of dm, 'device' refers to a dm device, but
in the context of lvm, 'device' refers to struct device.
Change some lvm function names to make that difference clearer.
dev_manager_get_device_list() -> dev_manager_get_dm_active_devices()
get_device_list() -> get_dm_active_devices()
device_get_uuid() -> dev_dm_uuid(), devno_dm_uuid()
The comment explained that the ex global lock was just
used to trigger global cache invalidation, which is no
longer needed. This extra locking can cause problems
with LVM-activate when local and shared VGs are mixed
(and the incorrect exit code for errors was causing
problems.)
vgchange -an vg is permitted when the vg lockspace
is not available, because LVs could still be active
for some reason, and they should be inactive when not
properly locked. In case lvmlockd was not running, or
the lockspace was not started, the command was
unnecessarily trying and failing to unlock every LV,
printing errors for every LV. We can skip this when
the lockspace is known to not be available.
vgchange --lockstop will fail with EBUSY if orphan locks in the
lock manager prevent stopping the lockspace. The orphan locks
can then be adopted and released, and the lockspace then stopped
cleanly.
Lock adoption is not part of standard command behavior, but can
be used for manual recovery or cleanup from unexpected failure
cases. Like other lockopt values, they are hidden options for
--lockopt. Different lock managers will behave differently.
Adopting locks with lvmlockd -A1 is more accurate and automatic.
--lockopt adoptls
. for vgchange --lockstart
. adopt existing ls, or fail if no existing lockspace is found
--lockopt adoptgl | adoptvg | adoptlv
. for commands using lvmlockd locks
. adopt orphan gl/vg/lv lock, or fail the lock request if
no orphan lock is found
. will fail if orphan lock exists with a different lock mode
. command may still continue with a failed shared lock request
--lockopt adopt
. for lockstart or any command using lvmlockd locks
. adopt existing lockspace, or start lockspace if none exists
. adopt orphan gl/vg/lv lock, or acquire new lock if no orphan found
. will fail if orphan lock exists with a different lock mode
. command may still continue with a failed shared lock request
. with dlm this option only works for ls
Stop printing "Skipping global lock: lockspace not found or started"
for vgchange --lockstart, since it's generally an inherent limitation
that the global lock isn't available until after locking is started.
Update the start delay warning to "a few seconds".
The lvb is used to hold lock versions, but lock versions are
no longer used (since the removal of lvmetad), so the lvb
is not actually useful. Disable their use for sanlock to
avoid the extra i/o required to maintain the lvb.
vgremove with --lockopt force should skip lvmlockd-related
steps and allow a forced vg cleanup, in addition to using
--nolocking to skip normal locking calls.
Previously, a command would call lockd_vg() for a local VG,
which would go to lvmlockd, which would send back ENOLS,
and the command would not care when it saw the VG was local.
The pointless back-and-forth to lvmlockd for local VGs can
be avoided by checking the VG lock_type in lvmcache (which
label_scan now saves there; this wasn't the case back when
the original lockd_vg logic was added.) If the lock_type
saved during label_scan indicates a local VG, then the
lockd_vg step is skipped.
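The shortcut can be sketched as a check on the cached lock type before contacting lvmlockd. This is a simplified model: the function name, the set of shared lock types and the treatment of a missing cache entry are assumptions made for the example.

```python
SHARED_LOCK_TYPES = {"sanlock", "dlm", "idm"}

def needs_lockd_vg(cached_lock_type):
    """Decide whether a command has to go to lvmlockd for this VG."""
    if cached_lock_type is None:
        return True  # nothing cached by the scan: keep the old behavior
    # A cached value outside the shared types indicates a local VG.
    return cached_lock_type in SHARED_LOCK_TYPES
```

A local VG then never triggers the request that would only come back with ENOLS.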
The idea in the patch 6e6d4c62b for handling -suffix as
indication of private device needs to be disabled.
Some problematic cases are currently not resolvable and some
more thinking is needed.
Once fixed, we can revert this patch.
For large device sets our dm_hash can produce a larger number of mapping
collisions and we would need to further increase our hash size.
So instead use the radix_tree code, which is immune to the growing number
of devices and uses memory more efficiently to store all the paths.
Instead of the less efficient 'btree', switch dev_cache to use
radix_tree, which generates a more efficient tree mapping.
Replace some direct uses of btree iteration with our dev_iter code.
Convert the persistent filter to use the more memory-compact radix_tree, as
dm_hash is bound to a preallocated number of slots and stores the whole
key together with the value.
When DM uuid cache is available, we can possibly avoid unnecessary
status ioctl() when we check the device for 'usable' uuid.
If this test passes, the existing code will go through the full check.
Move the code around caching active dm device devno, name and uuid
from device_mapper/libdm-iface to dev_cache file - as libdm layer
cares about 'decoding' ioctl data from kernel and caching for use by
lvm stays within lvm.
Introduce:
dev_cache_update_dm_devs
dev_cache_get_dm_dev_by_devno
dev_cache_get_dm_dev_by_uuid
Use radix_tree for searching.
Do not attach the layer suffix to the UUID when activating a component LV.
In this case we want to allow this LV to be public, thus
such an LV should not be using the -layer suffix in its UUID.
This also requires that our 'cached' access checks for
both UUIDs (with & without suffix), which was an unnoticed issue before.
This change is now necessary since our udev rule automatically expects
that any LV with a -layer suffix is private and will prevent generating
any systemd unit, even when there are no 'DM' flag bits passed via the
cookie mechanism while creating such an LV.
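Checking the cache for both UUID forms can be sketched as follows. The dict-based cache, the function name and the exact "uuid-layer" string layout are assumptions for illustration only.

```python
def find_cached_dm_dev(cache_by_uuid, lv_uuid, layer=None):
    """Look up a dm device by UUID, trying the suffixed form and the plain one."""
    candidates = []
    if layer:
        candidates.append(f"{lv_uuid}-{layer}")  # private form with layer suffix
    candidates.append(lv_uuid)                   # public form without suffix
    for uuid in candidates:
        if uuid in cache_by_uuid:
            return cache_by_uuid[uuid]
    return None
```

A component LV activated without the suffix is then still found when the caller asks for it together with a layer name.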
In order to free SubLVs after a stripe-removing reshape, lvconvert has
to be run without layout changes. Prevent a layout-changing request
in case any such freed SubLVs exist, and inform the user that they
have to be released first.
Calculates expected size before/after reshapes adding/removing stripes
to/from RaidLVs with levels 4/5/6/10 and compares it with the actual
one the block layer shows. Stripes reshaped to are listed in the
tst_stripes variable. mkfs/fsck/resize2fs the respective RaidLVs
to confirm ext4 can be resized accordingly without issues.
2024-06-11 15:44:36 +02:00
859 changed files with 50153 additions and 21699 deletions
Most of linux ditribution offer packaged LVM tools.
Most of linux distribution offer packaged LVM tools.
Depending on your distribution use
# RPM based distributions (Fedora):
@@ -49,7 +49,7 @@ List of official [mirror sites](https://sourceware.org/mirrors.html) (including
### LVM Releases
[[!inline pages="release-notes/2.03.* and !*/template and !*/Discussion and !tagged(draft) and !tagged(pending)" limit=2 rootpage="release-notes"]]
[[!inline pages="release-notes/2.03.* and !*/template and !*/Discussion and !tagged(draft) and !tagged(pending)" limit="2" show="2" rootpage="release-notes"]]
[[More releases|release-notes/index]]
@@ -64,7 +64,7 @@ tool with low level access and one may seriously harm their data when used
incorrectly!
* Physical Volume (PV) is underlaying disk, local or remote, encrypted or even
* Physical Volume (PV) is underlying disk, local or remote, encrypted or even
a mdadm RAID volume. PV is divided into so called Physical Extents (PE) which
are a basic allocation unit.
List PVs using [pvs(8)](https://man7.org/linux/man-pages/man8/pvs.8.html) or
@@ -72,7 +72,8 @@ incorrectly!
Make one by running `pvcreate /dev/sdX`.
See [pvcreate(8)](https://man7.org/linux/man-pages/man8/pvcreate.8.html). This step is optional.
* Volume Group (VG) consisting for one or more PVs is used as a pool from which LVs are allocated.
* Volume Group (VG) consisting of one or more PVs is used as a pool from which LVs are allocated.
List VGs using [vgs(8)](https://man7.org/linux/man-pages/man8/vgs.8.html) or
Make one by running `lvcreate [-n LVNAME] -L SIZE VGNAME`, and you are done!
See [vgcreate(8)](https://man7.org/linux/man-pages/man8/vgcreate.8.html).
See [lvcreate(8)](https://man7.org/linux/man-pages/man8/lvcreate.8.html).
To change size of LV it is recommended to use [lvresize(8)](https://man7.org/linux/man-pages/man8/lvresize.8.html) with `--resizefs` option.
To change properties of LV (e.g. to acivate/deactivate a volume, or change it to read only) use [lvchange(8)](https://man7.org/linux/man-pages/man8/lvchange.8.html).
To change the type of LV (e.g. change a linear volume to a RAID) use [lvconvert(8)](https://man7.org/linux/man-pages/man8/lvconvert.8.html).
## Avoiding Problems
Good start is to avoid using `{--force|-f}` and `{--yes|-y}` options which are
Good start is to **avoid using `{--force|-f}` and `{--yes|-y}` options** which are
often seen on internet discussions.
there is a possibility of data loss, LVM tools usually ask, so read the prompts
carefully! Using `--yes` removes these safety.
Also in some cases where it is too dangerous to proceed, e.g. device is used,
LVM refuses to do so, which can be overridden by `--force`.
Second, when resizing and especially when shrinking LVs it is always a good
idea to use `--resizefs` option which ensures the devices are resized in
Second, when **resizing** and especially when shrinking LVs it is always a good
idea to **use `--resizefs` option** which ensures the devices are resized in
correct order.
Third, if you still make a mess, never ever run fsck on damaged LV/FS, this is
Third, if you still make a mess, **never ever run fsck on damaged LV/FS**, this is
usually the final blow to your data. It is always better to ask first!
/* Define to 1 to include code that uses lvmlockd IDM option. */
#undef LOCKDIDM_SUPPORT
/* Define to 1 to include code that uses lvmlockd sanlock option. */
/* Define version of sanlock. */
#undef LOCKDSANLOCK_SUPPORT
/* Define to 1 if 'lstat' dereferences a symlink specified with a trailing
@@ -598,6 +598,9 @@
/* Define to 1 to include code that uses lvmlockd. */
#undef LVMLOCKD_SUPPORT
/* Path to lvmpersist script. */
#undef LVMPERSIST_PATH
/* Path to lvmpolld pidfile. */
#undef LVMPOLLD_PIDFILE
@@ -633,6 +636,9 @@
/* Define to 1 to include code that uses dbus notification. */
#undef NOTIFYDBUS_SUPPORT
/* Use libnvme for WWID. */
#undef NVME_SUPPORT
/* Define to 1 to enable O_DIRECT support. */
#undef O_DIRECT_SUPPORT
@@ -660,6 +666,9 @@
/* Define to 1 to include the LVM readline shell. */
#undef READLINE_SUPPORT
/* Define to 1 to include code that uses sd_notify. */
#undef SD_NOTIFY_SUPPORT
/* Define to 1 to include built-in support for snapshots. */
#undef SNAPSHOT_INTERNAL
Some files were not shown because too many files have changed in this diff