IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The old code was doing unnecessary label scans when
checking to see if the new VG name exists. A single
label_scan is sufficient if it is done after the
new VG lock is held.
When lvmlockd indicates that the lvmetad cache is out of
date because of changes by another node, lvmetad_pvscan_vg()
rescans the devices in the VG to update lvmetad. Use the
new label_scan in this function to use the common code and
take advantage of the new aio and reduced reads.
The new label_scan() function reads a large buffer of data
from the start of the disk, and saves it so that multiple
structs can be read from it. Previously, only the label_header
was read from this buffer, and the code which needed data
structures that immediately followed the label_header would
read those from disk separately. This created a large
number of small, unnecessary disk reads.
In each place that the two read paths (label_scan and vg_read)
need to read data from disk, first check if that data is
already available from the label_read_data buffer, and if
so just copy it from the buffer instead of reading from disk.
Code changes
------------
- passing the label_read_data struct down through
both read paths to make it available.
- before every disk read, first check if the location
and size of the desired piece of data exists fully
in the label_read_data buffer, and if so copy it
from there. Otherwise, use the existing code to
read the data from disk.
- adding some log_error messages on existing error paths
that were already being updated for the reasons above.
- using similar naming for parallel functions on the two
parallel read paths that are being updated above.
label_scan path calls:
read_metadata_location_summary, text_read_metadata_summary
vg_read path calls:
read_metadata_location_vg, text_read_metadata_file
Previously, those functions were named:
label_scan path calls:
vgname_from_mda, text_vgsummary_import
vg_read path calls:
_find_vg_rlocn, text_vg_import_fd
I/O changes
-----------
In the label_scan path, the following data is either copied
from label_read_data or read from disk for each PV:
- label_header and pv_header
- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_summary)
- vg metadata (in config_file_read_fd)
Total of 4 reads per PV in the label_scan path.
In the vg_read path, the following data is either copied from
label_read_data or read from disk for each PV:
- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_vg)
- vg metadata (in config_file_read_fd)
Total of 3 reads per PV in the vg_read path.
For a common read/reporting command, each PV will be:
- read by the command's initial lvmcache_label_scan()
- read by lvmcache_label_rescan_vg() at the start of vg_read()
- read by vg_read()
Previously, this would cause 11 synchronous disk reads per PV:
4 from lvmcache_label_scan(), 4 from lvmcache_label_rescan_vg()
and 3 from vg_read().
With this commit's optimization, there are now 2 async disk reads
per PV: 1 from lvmcache_label_scan() and 1 from
lvmcache_label_rescan_vg().
When a second mda is used on a PV, it is located at the
end of the PV. This second mda and copy of metadata will
not be found in the label_read_data buffer, and will always
require separate disk reads.
This fixes the use of lvmcache_label_rescan_vg() in the previous
commit for the special case of independent metadata areas.
label scan is about discovering VG name to device associations
using information from disks, but devices in VGs with
independent metadata areas have no information on disk, so
the label scan does nothing for these VGs/devices.
With independent metadata areas, only the VG metadata found
in files is used. This metadata is found and read in
vg_read in the processing phase.
lvmcache_label_rescan_vg() drops lvmcache info for the VG devices
before repeating the label scan on them. In the case of
independent metadata areas, there is no metadata on devices, so the
label scan of the devices will find nothing, so will not recreate
the necessary vginfo/info data in lvmcache for the VG. Fix this
by setting a flag in the lvmcache vginfo struct indicating that
the VG uses independent metadata areas, and label rescanning should
be skipped.
In the case of independent metadata areas, it is the metadata
processing in the vg_read phase that sets up the lvmcache
vginfo/info information, and label scan has no role.
LVM's general design for scanning/reading of metadata from disks is
that a command begins with a discovery phase, called "label scan",
in which it discovers which devices belong to lvm, what VGs exist on
those devices, and which devices are associated with each VG.
After this comes the processing phase, which is based around
processing specific VGs. In this phase, lvm acquires a lock on
the VG, and rescans the devices associated with that VG, i.e.
it repeats the label scan steps on the devices in the VG in case
something has changed between the initial label scan and taking
the VG lock. This ensures that the command is processing the
lastest, unchanging data on disk.
This commit moves the location of these label scans to make them
clearer and avoid unnecessary repeated calls to them.
Previously, the initial label scan was called as a side effect
from various utility functions. This would lead to it being called
unnecessarily. It is an expensive operation, and should only be
called when necessary. Also, this is a primary step in the
function of the command, and as such it should be called prominently
at the top level of command processing, not as a hidden side effect
of a utility function. lvm knows exactly where and when the
label scan needs to be done. Because of this, move the label scan
calls from the internal functions to the top level of processing.
Other specific instances of lvmcache_label_scan() are still called
unnecessarily or unclearly by specific commands that do not use
the common process_each functions. These will be improved in
future commits.
During the processing phase, rescanning labels for devices in a VG
needs to be done after the VG lock is acquired in case things have
changed since the initial label scan. This was being done by way
of rescanning devices that had the INVALID flag set in lvmcache.
This usually approximated the right set of devices, but it was not
exact, and obfuscated the real requirement. Correct this by using
a new function that rescans the devices in the VG:
lvmcache_label_rescan_vg().
Apart from being inexact, the rescanning was extremely well hidden.
_vg_read() would call ->create_instance(), _text_create_text_instance(),
_create_vg_text_instance() which would call lvmcache_label_scan()
which would call _scan_invalid() which repeats the label scan on
devices flagged INVALID. lvmcache_label_rescan_vg() is now called
prominently by _vg_read() directly.
To do label scanning, lvm code calls lvmcache_label_scan().
Change lvmcache_label_scan() to use the new label_scan()
which can use async io, rather than implementing its own
dev iter loop and calling the synchronous label_read() on
each device.
Also add lvmcache_label_rescan_vg() which calls the new
label_scan_devs() which does label scanning on only the
specified devices. This is for a subsequent commit and
is not yet used.
This adds the implementation without using it in the code.
The code still calls label_read() on each individual device
to do scanning.
Calling the new label_scan() will use async io if async io is
enabled in config settings. If not enabled, or if async io fails,
label_scan() will fall back to using synchronous io. If only some
aio ops fail, the code will attempt to perform synchronous io on just
the ios that failed.
Uses linux native aio system calls, not the posix wrappers which
are messier and may not have all the latest linux capabilities.
Internally, the same functionality is used before:
- iterate through each visible device on the system,
provided from from dev-cache
- call _find_label_header on the dev to find the sector
containing the label_header
- call _text_read to look at the pv_header and mda locations
after the pv_header
- for each mda location, read the mda_header and the vg metadata
- add info/vginfo structs to lvmcache which associate the
device name (info) with the VG name (vginfo) so that vg_read
can know which devices to read for a given VG name
The new label scanning issues a "large" read beginning at the start
of the device, where large is configurable, but intended to cover
all the labels/headers/metadata that is located at the start of
the device. This large data buffer from each device is saved in a
global list using a new 'label_read_data' struct. Currently, this
buffer is only used to find the label_header from the first four
sectors of the device. In subsequent commits, other functions
that read other structs/metadata will first try to find that data in
the saved label_read_data buffer. In most common cases, the data
they need can simply be copied out of the existing buffer, and
they can avoid issuing another disk read to get it.
The amount of parallelism, aka the number of devices we read
concurrently, is limited by the number of async_events. Once
those event slots are used up, the scanning falls back to
doing synchronous io for remaining devs. So, if there are
more devices than default async_events, increasing the
async_events in the config will improve performance.
The code can be enhanced in the future to improve this by
mixing io submission with result processing. Processing
makes events available so more aio can be submitted without
reverting to sync io.
The interface consists of:
- A context struct, one for the entire command.
- An io struct, one per io operation (read).
- dev_async_context_setup() creates an aio context.
- dev_async_context_destroy() destroys an aio context.
- dev_async_alloc_ios() allocates a specified number of io structs,
along with an associated buffer for the data.
- dev_async_free_ios() frees all the allocated io structs+buffers.
- dev_async_io_get() gets an available io struct from those
allocated in alloc_ios. If none are available, it will allocate
a new io struct if under limit.
- dev_async_io_put() puts a used io struct back into the set
of unused io structs, making it available for get.
- dev_async_read_submit() start an async read io.
- dev_async_getevents() collect async io completions.
Converting from one raid level to another, no changes
of stripes or stripesize can be requested because those
are subject to reshaping. I.e. the process requires to
takeover first and secondly request raid algorithm,
stripe or stripesize changes.
Ignore any related changes display warninngs
and proceed with the takeover.
Without this patch, a takeover requesting
stripesize change causes data corruption!
Just like with other vars support this:
make check_local T=xyz LVM_LOG_FILE_MAX_LINES=10000000
Allows easily to override existing line limit.
Also increase limiting size of logs per command since some of
our commands are becoming very verbose....
Add explaining message, when command was aborted due to the reach
of configure line number count (LVM_LOG_FILE_MAX_LINES)
for logging (used mainly with testing).
Last patch missed to mention, we've improved/fixed generated paths
in units and init.d shell scripts when lvm2 was plainly configured
with just i.e. --prefix.
Note: some distros might have fully specified --sbindir and
--usrsbindir - thus those very not seeing problems in generated paths.
Replace lowercase @sbindir@ with @SBINDIR@ which contains
fully decoded path.
Same with @usrsbindir@ which is also used with clvmd and cmirrord.
Also handle SYSCONFDIR for EnvironmentFile.
Patch fixes generated unit files with strings like:
ExecStart=${exec_prefix}/sbin/lvm
Introduce few more AC_SUBST vars for usage in *.in generation.
In some case we want to replace i.e. $sbindir with full path
instead of current ${exec_prefix}/sbin.
This patch provides:
USRSBINDIR
SBINDIR
DEFAULT_SYS_LOCK_DIR
SYSCONFDIR
At the same time properly use sbindir & usrsbindir with
lvm, fsadm, clvmd from one primary definition.
Do not allow to take snapshot of mirror/raid leg or log or metadata LV.
This was actually never supported, but user was able to create it,
and this put device stack in hardly fixable state (needs manual work).
This prevents such creation to pass.
Also improve validation when recreating snapshot volume type
from origin and COW volume.
Correction to function for extracting vgname out of lvconvert
parameters.
Avoid repeating some checks.
Add code to handle generic options which may provide vgname in its argument
and compare them all so they match to a single vgname (otherwise it's a
error).
Extract default (envvar) vgname only when no position nor optional vgname is
found.
Fixing regression instroduce with patchset started with commit:
1e2420bca8 (2.02.169)
split resize_crypt function in two.
a) Detect proper dm-crypt device type and count new --size
value for cryptsetup resize command.
b) Perform the resize
the bug in LUKS grow/shrink decision in fsadm was
masked due to fact that default LVM2 extent size
was larger than LUKS1 default data offset for dm-crypt
mapping. The new test address this bug.
lvcreate supports a 'conversion' when caching LV.
This normally worked fine, however in case passed LV was
thin-pool's data LV with suffix _tdata we have failed to early.
As the easiest fix looks dropping validation of name when
caching type is select - such name check will happen later
once the VG is opened again and properly detect if the LV
with protected name already exists and can be converted,
or will be rejected as ambigiuous operation requiring user
to specify --type cache | --type cache-pool.
Replaced the confusing device error message "not found (or ignored by
filtering)" by either "not found" or "excluded by a filter".
(Later we should be able to say which filter.)
Left the the liblvm code paths alone.
Fixes the following case with 3PVs and 3 legs "mirror" LV:
# lvcreate -l100%FREE --type mirror -m2 vg3
Insufficient free space for log allocation for logical volume .
Unable to allocate extents for mirror log.
Related: rhbz1269533
Activation lock has a primary purpose to serialize locking of individual
LV in case there is no other protecting mechanism for parallel
execution.
However in the case an activated LV is composed from several other LVs,
noone should be able to manipulate with those LVs as well.
This patch add a very 'naive' global VG activation locking in this case.
In the future we may introduce smarter function detecting minimal closed
graph components if this will appear as bottleneck
Patch checks if the VG Write lock is held - in this case we do not
need any more locking - command has exclusive access to VG.
In case we have clustered VG and we are activating an LV which does not
need other LVs - we also do not need any more locks.
In all other cases take respective lock - for single LV - use lvid,
for complex LVs use vgname.
Creating striped RaidLVs with lv size not divisible by region size
caused the region size to be adjusted:
# lvcreate --type raid5 -n region_check.32.00m_3 -i 3 -L 1g --nosync -R 32.00m raid_sanity
Using default stripesize 64.00 KiB.
Rounding size 1.00 GiB (256 extents) up to stripe boundary size <1.01 GiB(258 extents).
WARNING: New raid5 won't be synchronised. Don't read what you didn't write!
Using reduced mirror region size of 8.00 MiB
Logical volume region_check.32.00m_3 created.
Fix by not imposing "mirror" constraints on "raid".
Resolves: rhbz1404007
vgmerge suffers from a similar problem to the one fixed in commit
8146548d25 ("vgsplit: Fix intermediate
metadata corruption.")
When merging, splitting or renaming VGs, use a new PV status flag
PV_MOVED_VG to mark the PVs that hold metadata with the old VG name and
use this to provide PV-level granularity instead of incorrectly assuming
all PVs in the VG are the same.
Add these for dmeventd systemd unit (dm-event.service):
Before: shutdown.target
Conflicts: shutdown.target
This will cause the dmeventd to be properly stopped at shutdown (after
all the dmeventd clients unregistered their devices from monitoring)
with dm-event.service's stop action (there's no direct action defined
for the "stop" so systemd sends SIGTERM instead).
Before, we let dmeventd to get killed only as part of the very last
SIGTERM/SIGKILL for all the remaining processes late in the shutdown
sequence so we may have missed some logs if dmeventd encountered an
error during its shutdown (logging facilities are already off at this
late time in shutdown sequence).
Ref: https://www.redhat.com/archives/lvm-devel/2017-October/msg00000.html
When dmeventd receives SIGTERM/INT/HUP/QUIT it validates if exit is possible.
If there was any device still monitored, such exit request used to
be ignored/refused. This 'usually' worked reasonably well, however if there
is very short time period between last device is unmonitored and signal
reception - there was possibility such EXIT was ignored, as dmeventd has
not yet got into idle state even commands like 'vgchange -an' has already
finished.
This patch changes logic towards scheduling EXIT to the nearest
point when there is no monitored device.
EXIT is never forgotten.
NOTE: if there is only a single monitored device and someone sends
SIGTERM and later someone uses i.e. 'lvchange --refresh' after
unmonitoring dmeventd will exit and new instance needs to be
started.
If you send a SIGUSR1 (10) to the daemon it will dump all the
threads current stacks to stdout. This will be useful when the
daemon is apparently hung and not processing requests.
eg.
$ sudo kill -10 <daemon pid>
Make sure that any and all code that executes in the main thread is
wrapped with a try/except block to ensure that at the very least
we log when things are going wrong.
Changing the VG of a PV uses the same on-disk mechanism as vgrename.
This relies on recognising both the old and new VG names. Prior to this
patch the vgsplit code incorrectly provided the new VG name twice
instead of the old and new ones. This lead the low-level mechanism not
to recognise the device as already belonging to a VG and so paying no
attention to the location of its existing metadata, sometimes partly
overwriting it and then later trying to read the corrupt metadata and
issuing a checksum error.
In some cases we are seeing where there are no VGs, but the data returned from
lvm shows that the PVs have the following for the VG:
"vg_name":"[unknown]", "vg_uuid":""
The code was only checking for the exitence of the VG name and we called into
the function get_object_path_by_uuid_lvm_id which requires both the VG name and
the LV name to exist (asserts this) which results in the following stack trace:
Traceback (most recent call last):
File "/home/tasleson/lvm2/daemons/lvmdbusd/utils.py", line 563, in runner
obj._run()
File "/home/tasleson/lvm2/daemons/lvmdbusd/utils.py", line 584, in _run
self.rc = self.f(*self.args)
File "/home/tasleson/lvm2/daemons/lvmdbusd/fetch.py", line 26, in
_main_thread_load
cache_refresh=False)[1]
File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 48, in load_pvs
emit_signal, cache_refresh)
File "/home/tasleson/lvm2/daemons/lvmdbusd/loader.py", line 37, in common
objects = retrieve(search_keys, cache_refresh=False)
File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 40, in
pvs_state_retrieve
p["pv_attr"], p["pv_tags"], p["vg_name"], p["vg_uuid"]))
File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 84, in __init__
vg_uuid, vg_name, vg_obj_path_generate)
File "/home/tasleson/lvm2/daemons/lvmdbusd/objectmanager.py", line 318,
in get_object_path_by_uuid_lvm_id
assert uuid
AssertionError
When executing in the main thread, if we encounter an exception we
will bypass the notify_all call on the condition and the calling thread
never wakes up.
@staticmethod
def runner(obj):
# noinspection PyProtectedMember
Exception thrown here
----> obj._run()
So the following code doesn't run, which causes calling thread to hang
with obj.cond:
obj.function_complete = True
obj.cond.notify_all()
Additionally for some unknown reason the stderr is lost.
Best guess is it's something to do with scheduling a python function
into the GLib.idle_add. That made finding issue quite difficult.
There's nothing special about /boot other than it's used during boot.
But when blkdeactivate is called either on all devices or including a
device where the /boot is on top, we should also include this mount
point when doing unmount before deactivation of supported devices.
The new blkdeactivate -r|mdraidoption wait causes blkdeactivate to wait
for any resync/recovery/reshape that is currently in progress before
deactivating the device.
If this option is used, blkdeactivate calls mdadm -W|--wait before
mdadm -S|--stop.
Revert dc50f2f4a0.
We're canonicalizing/escaping the names here and we're reusing the
variable name so the code doesn't need to use extra variables and
further assignments that may confuse us. Let's keep the code simple.
The
local name=(...$name)
is not the same as
local name
name=(...$name)
(I know various code-checking tools fuss about this and recommend
the 2nd way, but let's ignore those tools' nitpicking here please.)
There was a typo in blkdeactivate --dmoption/--lvmoption/mpathoption,
it had missing "s" at the end and it was not recognized properly, only
short names for the options (-d/-l/-m).
When certain cmd def RULE's fail, the error messages can
sometimes be confusing. This expands the error messages
to help clarify why the rule failed, especially in cases
where options are used incorrectly.
In a shared VG, only allow pvmove with a named LV,
so that only PE's used by the LV will be moved.
The LV is then activated exclusively, ensuring that
the PE's being moved are not used from another host.
Previously, pvmove was mistakenly allowed on a full PV.
This won't work when LVs using that PV are active on
other hosts.
Avoid starting test, when test dir has less then 50M of free space.
Better to crash early before letting die machine on weird crash
in OOM cases...
Also show free disk space when test starts
Enable handling of --poolmetadataspare so if user can prevent
creation of _pmspare volume during --repair operation (just
like during actual lvcreate or lvconvert) for pool volumes.
The following commands now pass the device list through a
--select|-S filter before processing:
suspend resume clear wipe_table remove deps status table
In a shared VG, lvconvert must be used to create thin pools
and cache pools, not the lvcreate variants of those commands.
Deny these cases early in lvcreate using the new command defs.
Denying these cases deeper in the code was missing some
cleanup of the partially completed command.
Revert the lvmlockd.c changes from:
commit 0bf836aa14
"tidy: prefer not using else after return"
The commit introduced at least one regression, which broke
lvcreate of a thin pool in a shared VG.
ATM we have several instances of daemonizing code.
Each has its 'special' logic so not completely easy
to unify them all into a single routine.
Start to unify them and use one strategy for rediricting
all input/outpus to /dev/null - use 'dup2' function for this
and open /dev/null before fork to make sure it's available.
When file-locking mode failed on locking, such description was leaked
(typically not an issue since command usually exists afterwards).
So shirt close() at the end of function and use it in all error paths.
Also make sure, when interrrupt is detected, it's really not holding
lock and returns 0.
lvmcache_foreach_mda() can fail for numerous reasons
and failing error code cannot be ignored (out-of-memory...)
TODO: might need more error handling tunning.
gcc warns here about storring 69 bytes in 64 byte array (losing
potentially 4 bytes from 'ls->name').
lvmlockd-core.c:2657:36: warning: ‘%s’ directive output may be truncated writing up to 64 bytes into a region of size 60 [-Wformat-truncation=]
snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
^~
lvmlockd-core.c:2657:2: note: ‘snprintf’ output between 5 and 69 bytes into a destination of size 64
snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Replaced with slightly better code - but it still misses error path what
to do if the name would be truncated... - so added FIXME.
Also using all bytes for snprintf() buffer size
(as the size is with \0 included)
After the internal lvmlock LV (holding sanlock leases) is
extended to hold more leases, it needs to be zeroed.
sanlock expects to see either zeroed blocks or blocks
initialized with leases.
currently, lvcreate for sanlock find the free lock offset
from the beginning of the lvmlock every time.
after created thousands of lvs, it will issue thousands of read
ios for lvcreate to find free lock offset.
remeber the last free lock offset will greatly reduce the impact
Signed-off-by: Zhang Huan <zhanghuan@huayun.com>
Fix code checking that the 2nd mda which is at the end of disk really
fits the available free space and avoid any DA and MDA interleaving when
we already have DA preallocated. This mainly applies when we're restoring
a PV from VG backup using pvcreate --restorefile where we may already have
some DA preallocated - this means the PV was in a VG before with already
allocated space from it (the LVs were created). Hence we need to avoid
stepping into DA - the MDA can never ever be inside in such case!
The code responsible for this calculation was already in
_text_pv_add_metadata_area fn, but it had a bug in the calculation where
we subtracted one more sector by mistake and then the code could still
incorrectly allocate the MDA inside existing DA. The patch also renames
the variable in the code so it doesn't confuse us in future.
Also, if the 2nd mda doesn't fit, don't silently continue with just 1
MDA (at the start of the disk). If 2nd mda was requested and we can't
create that due to unavailable space, error out correctly (the patch
also adds a test to shell/pvcreate-operation.sh for this case).
If the PV was originally created with a larger-than-default
metadata area the restored one wasn't and might not even be
large enough to hold the metadata!
Previously the cache remembered an existing bootloaderarea and
reinstated it (without even checking for overlap) when asked to
write out the PV. pvcreate could write out an incorrect layout.
Add the new concise format to dmsetup create, either as a single
command-line parameter or from stdin.
Based on patches submitted by
Enric Balletbo i Serra <enric.balletbo@collabora.com>.
Drop 'DMEVENT' make variable and use BUILD_DMEVENTD like with other daemons.
NOTE: by default we do not build dmeventd - maybe time to change
as lot of build targets basically do need dmeventd...
Switch to use SYSTEMD_LIBS and avoid linking systemd library with
every linked tool when dbus notification are enabled.
(TODO add missing testing for lib presence)
Check first if we need to even link -lrt - since clock functions
are normally emebeded with recent glibc (>=2.17)
Use standard RT_LIBS name.
Avoid duplicate test for realtime clock with lvmlockd
Show better error message when realtime clock support is missing or
disabled.
Link RT_LIBS explicitely with lvmlockd and lvmetad.
Avoid adding -g more then once for debug builds.
Avoid enabling DEBUG_MEM when we build multithreaded tools.
Link executables with -fPIE -pie and --export-dynamic LDFLAGS
Introduce PROGS_FLAGS to add option to pass flags for external libs.
Link lvm2 internally library only when really used.
Link DAEMON_LIBS with daemons.
Pass VALGRIND_CFLAGS internally
Set shell failure mode on couple places.
lvm2 warned about zeroing and too big chunksize (>=512KiB), but
only during lvconvert, so lvcreate was creating thin-pools
without any warning about possible slowness of thin provisioning
because of zeroing.
Create a new table output format that concisely shows multiple devices
on one line.
dmsetup table --concise [device...]
<dev_name>,<uuid>,<flags>[,<table>]*[;<dev_name>,<uuid>,<flags>[,<table>]*]*
Table lines are separated by commas.
Devices are separated by semi-colons.
Flags is currently 'ro' or 'rw' (and might be extended in a
yet-to-be-defined way in future).
Any comma, semi-colon or backslash within a field is quoted by a
preceding backslash.
The format can later be supplied as input to dmsetup or even to the
booting kernel as an alternative way to set up devices.
Based on patches submitted by
Enric Balletbo i Serra <enric.balletbo@collabora.com>.
Add an independent command definition for "vgchange --locktype",
and split the implementation out of the set of common metadata
changes. It is unlike normal metadata changes, and can only
be run by itself. (Changing the lock type is similar in
principle to changing the VG name or the VG system ID; it
effects the ability of any host to see or access the VG.)
At some point this command lost the ability to forcibly change
the lock type of a shared VG to "none" (making it a local VG).
This can be necessary to repair shared VGs (e.g. recovery steps
that occur in vg_read are disabled for shared VGs because
they are not locked properly, or recovering sanlock locks
when the PV holding them is lost.)
"vgchange --locktype none --lockopt force VG" is used as the
method of forcing the shared VG to become local so that it
can be repaired.
Since _deactivate_and_remove_lvs() is used in more then one place,
move the needed udev synchronization into this function so other
users automatically get correct fs state before next dm manipulation.
Assumption here is that this udev synchronization 'delay' may also
prevent to 'early' table reloads which might cause kernel problems
for md-core - but we may need more generic time-limited reload
frequency for raid devices.
Note: on udev-less system there will be almost no delay.
Commit 8a912d6dbc missed the wrong logic,
we use 2 vars 'dev' & 'mddev' and their usage can't be mixed.
So correctly separate them so mddev keeps name of MD device.
During test do a more close selection of visible devices.
If some test leaks a device with LVMTEST prefix, next
test should not be influnced (or parallel running one).
If the test is running in non-/dev dir - it's already protected
by full path with $PREFIX in it - however it test is running
in real /dev dir - there was no such protection and test
were confused when they have seen such leaked device.
Commit 9b4b5d449e started to accept
rather way to wide set of strings we do not want to take for size.
i.e. lvresize -t vg0/lvol0 -LNaNM
Restore check for 'digit || locales defined 'dot' | '.'
(as we tend to take '.' even if locales uses ',')
Explictely detect duplicate sing symbols and leave the rest of
double number validation on 'strtod()' function. This way
we can also accept size like:
lvcreate -L.1M
We already accept -L0.1M - but it's common to accept numbers
starting with leading '.' - just as 'strtod()' accepts it).
API for strtod() or strtoul() needs reset of errno, before it's being
called. So add missing resets in missing places and some also some
errno validation for out-of-range numbers.
Since we are reading size as (double) we can get way bigger
number then just plain int64. So to make this check actually
more valid and usable do a maxsize compare in 'double'.
Avoid reading already released memory and do a continue directly.
Invalid read of size 1
at 0x1201B0: main_loop (clvmd.c:931)
by 0x11F640: main (clvmd.c:666)
Address 0x72ddef0 is 32 bytes inside a block of size 224 free'd
at 0x4C30D18: free (vg_replace_malloc.c:530)
by 0x54D6FD1: dm_free_wrapper (dbg_malloc.c:357)
by 0x122E6E: process_work_item (clvmd.c:2034)
by 0x123003: lvm_thread_fn (clvmd.c:2085)
by 0x590A3A8: start_thread (pthread_create.c:465)
by 0x5C3C7FE: clone (in /usr/lib64/libc-2.25.90.so)
Block was alloc'd at
at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
by 0x54D6EF1: dm_malloc_aux (dbg_malloc.c:286)
by 0x54D6F1C: dm_zalloc_aux (dbg_malloc.c:291)
by 0x54D6F96: dm_zalloc_wrapper (dbg_malloc.c:345)
by 0x11F89C: local_rendezvous_callback (clvmd.c:731)
by 0x1203D2: main_loop (clvmd.c:964)
by 0x11F640: main (clvmd.c:666)
Initialize mutex upfront any debugging and fix this report:
Mutex reinitialization: mutex 0x485d20, recursion count 0, owner 1.
at 0x4C38480: pthread_mutex_init_intercept (drd_pthread_intercepts.c:821)
by 0x4C38480: pthread_mutex_init (drd_pthread_intercepts.c:830)
by 0x11F359: main (clvmd.c:562)
mutex 0x485d20 was first observed at:
at 0x4C38F63: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:885)
by 0x4C38F63: pthread_mutex_lock (drd_pthread_intercepts.c:898)
by 0x11E920: debuglog (clvmd.c:254)
by 0x11F1D8: main (clvmd.c:527)
Switch from warn to log_error since this generated
failing return code for command so printing log_error()
is mandatory.
Happens with i.e. pvscan --cache meets crashing lvmetad.
Commit 34504855a7 introduced
flag LV_RESHAPE_DATA_OFFSET and used it to avoid incompatible
activation on older runtime.
Enhance vg_validate() raid checking functions with checks for it.
Patch 72a58ce4b0 was wronly placing
double quotes around this variable which we want to pass expanded,
as it's just set of 'space' device args ATM.
TODO consider using array[@] to make this cleaner.
Add shellcheck directive to skip warning here
Changes:
- BASH_SOURCE index was one off.
- The first line of stacktrace was pure confusion displaying executed
script together with innermost line number (which was either 125 when
STACKTRACE or 229 when skip was called.)
- We can safely ignore innermost call, as stack trace is always produced
by stacktrace function.
- It is safer to test for array length, instead of testing FUNCNAME is
main - if main function were introduced.
- Bashishm is safe to use as this function as a whole is relying on bash.
In order to reject out of place reshaping with segment data_offset
field on old runtime, add a respective segment type incompatibility
flag causing "+RESHAPE_DATA_OFFSET" to be suffixed to the segment
type name.
When reshape space is allocated anew, an update and reload is needed to
promote the new size to the cluster node with the exclusively active RaidLV
or reloading the RaidLV will fail with a size related error. Additionally,
store "data_offset <sectors>" with the RaidLV in the lvm2 metadata so that
it can be retrieved on cluster nodes.
Process allocation of reshape space on a 2-legged raid4/5 (interim layout
to convert from/to linear via raid1) properly in the cluster.
Resolves: rhbz1461562
Resolves: rhbz1448116
Use 1 logic for 2 loops tearing down left device.
First loops tries to remove all closed devices with 'normal' remove.
Second loop tries to replace those left devices with 'error' target.
We can't really sleep that much in teardown as it slows test too much.
So do a nested loop (similar to 'dmsetup remove_all') and keep
removing devices with open count == 0 as long as it works.
When we want to squash as much device as possible,
it's better to give it some delay, so devices have
some time to release it's resouces for next removal.
Also drop surrounding cookie processing and let each
dmsetup call run on its own.
Add missing get_devs.
When $7 is not given use empty string.
See if we can live with less RAM disk for PVs.
Drop limitation on single core as presence 1.12 should address this.
The checking order here has happend after TESTDIR was removed
resulting in weird further error on trap path.
Properly check for unexpected dmeventd before removing TESTDIR
since 'trap' codepath is still using it.
Also try to kill this unexpected dmeventd so testing is
not skipping all next dmeventd tests.
(Downside would be - if user would be accidentally starting
dmeventd by some regular system admin work - such dmeventd
might be killd if it's unused. It can't kill dmeventd in-use.
Fix mirror_images_on() to actually report something useful (thought
it might be tuned later).
So for now the function got through all '_mimages_' and compares
where the order of them is matching given list of devices.
Possible misspelling: FAILED_MIXED_STR may not be assigned, but FAIL_MIXED_STR is.
Possible misspelling: FAILED_MULTI_STR may not be assigned, but FAIL_MULTI_STR is.
Possible misspelling: FAILED_BLACK_STR may not be assigned, but FAIL_BLACK_STR is.
The blkid we call in 13-dm-disk.rules also returns identifiers for
partitions based on which the /dev/disk/by-part{uuid,label} and
gpt-auto-root symlinks should be created in the same manner as we
already create symlinks for filesystem labels and uuids.
This is because we handle blkid calls and symlink creation under
/dev/disk ourselves in our 13-dm-disk.rules for device-mapper devices
for us to have more control over this process.
See also https://lists.freedesktop.org/archives/systemd-devel/2017-July/039220.html
and original report http://tracker.ceph.com/issues/19489 for
the exact case where these symlinks were missing.
Fix the error messages when an unrecognized command is
run from a script. We shouldn't attempt to parse options
for an unrecognized command name, which causes misleading
errors about bad options, but rather exit right when we
know the command name is not valid. Also don't complain
about exiting without an error message when running a
script if no command didn't exist.
1. dm_uuid is 68 byte length, but buf is 64 which
will cause miss match uuid from lv lock manager
2. no lv lock_type path in dm config, use lock_args instead
Signed-off-by: Zhang Huan <zhanghuan@chinac.com>
If the activation step in lvcreate fails (e.g. the specified
minor number is already used), then the lvcreate is reverted,
but the LV lock in lvmlockd was not being unlocked or properly
freed.
Some lvconvert commands can be used directly on the data sublv:
lvconvert ... vg/pool_tdata
The correct LV lock to use in lvmlockd is the one on the pool LV.
Centralise editing of the client list into _add_client() and
_del_client(). Introduce _local_client_count to track the size of the
list for debugging purposes. Simplify and standardise the various ways
the list gets walked.
While processing one element of the list in main_loop(),
cleanup_zombie() may be called and remove a different element, so make
sure main_loop() refreshes its list state on return. Prior to this
patch, the list edits for clients disappearing could race against the
list edits for new clients connecting and corrupt the list and cause a
variety of segfaults.
An easy way to trigger such failures was by repeatedly running shell
commands such as:
lvs &; lvs &; lvs &;...;killall -9 lvs; lvs &; lvs &;...
Situations that occasionally lead to the failures can be spotted by
looking for 'EOF' with 'inprogress=1' in the clvmd debug logs.
Correctly skip the test when lvmdbusd is found already running.
For pgrep usage we need to add '-f -l' options to get python3 name
printed.
Remove no longer used 'pids' local var.
With commit 41c10034aa we actually
do require LV to be used with _vg_write_lv_suspend_commit_backup().
So write a proper separte single wrapper for write && commit && backup.
lvm_run needs to place NULL as the last element into argv[].
Otherwise we get:
Conditional jump or move depends on uninitialised value(s)
_command_required_pos_matches (lvmcmdline.c:1443)
_find_command (lvmcmdline.c:1610)
lvm_run_command (lvmcmdline.c:2770)
lvm2_run (lvmcmdlib.c:91)
Dmeventd reuses 'dm_task' struct for some STATUS operation, but due to
missing reinitization of dm_task target list, it has caused misprocesing
of recieved events as the parsed target has been simply added to the
list of existing status and cause multiple actions being called for
single event.
Since we discovered status reporting from 'md' goes from large set
of weird states we can't just decided based on this word.
So let it pass for rebuild and idle as well
and check for health devices afterwards.
Add function to adjust printing of percent values in better way.
Rounding here is going along following rules:
0% & 100% are always clearly reported with .0 decimal points.
Values slightly above 0% we make sure a nearest bigger
non zero value with given precission is printed
(i.e. 0.01 for %.2f will be shown)
For values closely approaching 100% we again detect and adjust value
that is less then 100 when printed.
(i.e. 99.99 for %.2f will be shown).
For other values we are leaving them with standard rounding mechanism
since we care mainly about corner values 0 & 100 which need to be
printed precisely.
When we want to report primary leg failure, check for intial 'a',
since otherwice 'Aa idle' is normally visible.
Also reset array of bit flags marking dead devices, once
plugin detects raid is in sync.
Functionality of ignore suspend devices is already granted by:
lvm2_disable_dmeventd_monitoring() -> init_run_by_dmeventd() ->
init_ignore_suspended_devices().
In fact plugins should never use --config because it has
some unpleasant technical issues.
When raid leg rimage device is marked as 'D'ead by mdcore,
lvm2 was not able to replace such device with allocate policy,
as device has not appared as missing.
Add detection of transiently failing devices.
Basically reverting commit 58a9f88b8c.
We can use origin_only in case we are snapshot's origin,
as we do support this stack.
So when we are 'uncaching' origin+snaps - we do need to reload only
origin and we do not need to play with snaps.
Handle change of 'region size' better and follow also standard rule
if the command can't success (i.e. size is already same) we return
error for all such cases.
Also log_pring more info about adjusted value (just like we
do for rounding)
Also avoid keep pointers on 'display_*' values - they are in
ringbuffer for immediate use - not to be kept across multiple calls
(as they could be already overwritten by later calls) - so dropped
seg_region_size_str
'lvdisplay -m' tried to go through NULL policy settings,
when such policy was not defined for CachedLV.
Patch is fixing display of cache-pool without defined settings,
as this is now a valid pool and we mostly want users to define
these settings when actually really caching a LV.
Since cache LV can be a stacked device, there is no real reason
trying to use slight optimised tree for origin_only cache reload
(it could be even wrongly implemented in this case).
We can easily go with stardard tree load here.
When user runs command like 'lvconvert --splitcache' the operation
might be actually either slow or not making any progress in kernel,
so lets give user a chance to abort such operation.
When user press 'Ctrl+C' device table is restored to pre-flushing state.
Remove explicit activation of SubLVs and let lv_update_and_reload()
perform the proper (pre-)loading sequencing of tables.
This avoids related callback functions which are removed.
Related: rhbz1448116
Related: rhbz1461526
Related: rhbz1448123
When lock-holding LV differs from actually request locked LV,
we drop origin_only flag as it has no use - it'd be applied
on completely different LV.
Example of problem:
Raid is thin-pool _tdata LV.
Raid run origin_only locking on stacked device.
As lock holder is discovered thinLV.
Whole origin_only operation is then applied only on thinLV
changing the meaning of whole operation.
NOTE: this patch does not change anything for LV that are
already top-level lock holding LVs (i.e. thinLVs, snahoshots/origins).
Disable until we have a proper fix for reshape space allocation,
switching it to begin/end of rimages and activation in the cluster.
Related: rhbz1448116
Related: rhbz1461526
Related: rhbz1448123
Commit 1fe4f80e45 in current version
introduced regression for a terminal user, as he could not enter 'n'
as answer. Add missing break for this case (No whats_new).
New tests to add checking for '100%' in-sync at start of "recover"
process (it shouldn't happen, but I've seen it before). Also,
check status over the whole cycle of various sync processes ("resync"
and "recover").
Enhance reporting code, so it does not need to do 'extra' ioctl to
get 'status' of normal raid and provide percentage directly.
When we have 'merging' snapshot into raid origin, we still need to get
this secondary number with extra status call - however, since 'raid'
is always a single segment LV - we may skip 'copy_percent' call as
we directly know the percent and also with better precision.
NOTE: for mirror we still base reported number on the percetage of
transferred extents which might get quite imprecisse if big size
of extent is used while volume itself is smaller as reporting jump
steps are much bigger the actual reported number provides.
2nd.NOTE: raid lvs line report already requires quite a few extra status
calls for the same device - but fix will be need slight code improval.
Current existing kernels reports status sometimes in weird form.
Instead of showing what is the exact progress, we need to estimate
this in-sync state from several surrounding states.
Main reason here is to never report 100% sync state for a raid device
which will be undergoing i.e. recovery.
Relative to last comit ddf2a1d656:
adjust the dm-raid target version to 1.12.0 which shows
mandatory kernel MD deadlock fixes related to reshaping
are presant in the kernel.
Related: rhbz1443999
First test in this file checks whether 'aa' is ever spotted during
a "recover" operation (it should not be). More tests should follow
in this file to look for oddities in status output - especially as
it relates to the sync_ratio, dev_health, and sync_action fields.
For the test clean-up, I was providing too many devices to the first
command - possibly allowing it to allocate in the wrong place. I was
also not providing a device for the second command - virtually ensuring
the test was not performing correctly at times.
This patch ensures that under normal conditions (i.e. not during repair
operations) that users are prevented from removing devices that would
cause data loss.
When a RAID1 is undergoing its initial sync, it is ok to remove all but
one of the images because they have all existed since creation and
contain all the data written since the array was created. OTOH, if the
RAID1 was created as a result of an up-convert from linear, it is very
important not to let the user remove the primary image (the source of
all the data). They should be allowed to remove any devices they want
and as many as they want as long as one original (primary) device is left
during a "recover" (aka up-convert).
This fixes bug 1461187 and includes the necessary regression tests.
Add the checks necessary to distiguish the state of a RAID when the primary
source for syncing fails during the "recover" process.
It has been possible to hit this condition before (like when converting from
2-way RAID1 to 3-way and having the first two devices die during the "recover"
process). However, this condition is now more likely since we treat linear ->
RAID1 conversions as "recover" now - so it is especially important we cleanly
handle this condition.
Previously, we were treating non-RAID to RAID up-converts as a "resync"
operation. (The most common example being 'linear -> RAID1'.) RAID to
RAID up-converts or rebuilds of specific RAID images are properly treated
as a "recover" operation.
Since we were treating some up-convert operations as "resync", it was
possible to have scenarios where data corruption or data loss were
possibilities if the RAID hadn't been able to sync completely before a
loss of the primary source devices. In order to ensure that the user took
the proper precautions in such scenarios, we required a '--force' option
to be present. Unfortuneately, the force option was rendered useless
because there was no way to distiguish the failure state of a potentially
destructive repair from a nominal one - making the '--force' option a
requirement for any RAID1 repair!
We now treat non-RAID to RAID up-converts properly as "recover" operations.
This eliminates the scenarios that can potentially cause data loss or
data corruption; and this eliminates the need for the '--force' requirement.
This patch removes the requirement to specify '--force' for RAID repairs.
Two of the sync actions performed by the kernel (aka MD runtime) are
"resync" and "recover". The "resync" refers to when an entirely new array
is going through the process of initializing (or resynchronizing after an
unexpected shutdown). The "recover" is the process of initializing a new
member device to the array. So, a brand new array with all new devices
will undergo "resync". An array with replaced or added sub-LVs will undergo
"recover".
These two states are treated very differently when failures happen. If any
device is lost or replaced while "resync", there are no worries. This is
because any writes created from the inception of the array have occurred to
all the devices and can be safely recovered. Even though non-initialized
portions will still be resync'ed with uninitialized data, it is ok. However,
if a pre-existing device is lost (aka, the original linear device in a
linear -> raid1 convert) during a "recover", data loss can be the result.
Thus, writes are errored by the kernel and recovery is halted. The failed
device must be restored or removed. This is the correct behavior.
Unfortunately, we were treating an up-convert from linear as a "resync"
when we should have been treating it as a "recover". This patch
removes the special case for linear upconvert. It allows each new image
sub-LV to be marked with a rebuild flag and treats the array as 'in-sync'.
This has the correct effect of causing the upconvert to be treated as a
"recover" rather than a "resync". There is no need to flag these two states
differently in LVM metadata, because they are already considered differently
by the kernel RAID metadata. (Any activation/deactivation will properly
resume the "recover" process and not a "resync" process.)
We make this behavior change based on the presense of dm-raid target
version 1.9.0+.
On conversion from raid10 to raid0 (takeover), all rmeta
devices and the rimage devices of mirrored stripes are
detached from the raid10 LV. The remaining rimage areas
are being shifted down into the slots of the detached
ones hence requiring renames to show proper _N suffix
sequences (e.g. 0,1,2,3 instead of 0,2,4,6). Only the
top-level raid10 LV has a cluster lock, not the detached
SubLVs thus their deactivation is impossible and e.g the
rename from *_rimage_6 to *_rimage_3 will fail. Fix by
activating exclusively before deactivating and removing.
Resolves: rhbz1448123
The file block count stored in the filemap_monitor was lazily
initialised at the time of the first check. This causes problems
in the case that the file has been truncated between this time and
the time the daemon started: the initial block count and current
block count match and the daemon fails to detect a change.
Separate the setting of the block count from the check and make a
call to update the value at the start of _dmfilemapd().
It's not an error to attempt to update regions from an fd that has
been truncated (or otherwise no longer has any allocated extents):
in this case, the call should remove all regions corresponding to
the group, and return an empty region table.
For proper usage of Cache kernel metadata format V2,
new cache_check tool is basically mandatory.
Print warning during configure time about this problem.
Prohibit activation of reshaping RaidLVs on incompatible
lvm2 runtime by storing e.g. 'raid5+RESHAPE' segment type
strings in the lvm2 metadata. Incompatible runtime not
supporting reshaping won't be able to activate those thus
avoiding potential data corruption.
Any new non-reshaping lvconvert command will reset the
segment type string from 'raid5+RESHAPE' to 'raid5'.
See commits
0299a7af1e and
4141409eb0
for segtype flag support.
When old snapshot is merged, lvm2 still can report some data about
merged 'snapshot' - i.e. it occupied space in VG.
This patch fixes regression from commit:
6fd20be629
and resolved RHBZ: 1460161
Avoid reporting 'checking result' as maybe - it should
clearly tell 'yes' or 'no'.
Just shuffle printed message to the place, where we
already know the 'maybe' answer.
So instead of printing 'unclear':
checking whether to enable libblkid detection of signatures when wiping... maybe
checking for BLKID... yes
checking whether to use udev-systemd protocol for jobs in background... maybe
checking for SYSTEMD... yes
show this:
checking for BLKID... yes
checking whether to enable libblkid detection of signatures when wiping... yes
checking for SYSTEMD... yes
checking whether to use udev-systemd protocol for jobs in background... yes
Code path missed validation of lvcreate --cachepool argument.
If the non cache-pool LV was passed in, code has still continued
further work and failed later on internal error. Validate this
condition at right place now.
When a combination of thin-pool chunk size and thin-pool data size
goes beyond addressable limit, such volume creation is directly
prohibited.
Maximum usable thin-pool size is calculated with use of maximal support
metadata size (even when it's created smaller) and given chunk-size.
If the value data size is found to be too big, the command reports
error and operation fails.
Previously thin-pool was created however lots of thin-pool data LV was
not usable and this space in VG has been wasted.
Only support RAID conversions on active LVs.
If we'd accept e.g. upconverting linear -> raid1 on inactive
linear LVs, any LV flags passed to the kernel aren't properly
cleared thus errouneously passing them on every activation.
Add respective check to lv_raid_change_image_count() and
move existing one in lv_raid_convert() for better messages.
If during the process of fetching current lvm state we experience an
exception we fail to call set_result on the queued_requests we were
processing. When this happens those threads block forever which causes
the service to stall infinitely. Only clear the queued_requests after
we have called set_result.
We were not adding background tasks to flight recorder. Add the meta
data to the flight recorder when we start the command and update the meta
data when the command is finished. Locking was added to meta data to
prevent concurrent update and returning string representation as these can
happen in two different threads.
vgreduce previously allowed --all and --removemissing together even though
it only actual did the remove missing. The lvm dbus daemon was passing
--all anytime there was no entries in pv_object_paths. This change supplies
--all if and only if we are not removing missing and the pv_object_paths
is empty.
Vgreduce has and continues to enforce the invalid combination of supplying a
device list when you specify --all or --removemissing so we do not need
to check for that invalid combination explicitly in the lvm dbus service as
it's already covered.
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1455471
Warn about a PV that has the in-use flag set, but appears in
the orphan VG (no VG was found referencing it.)
There are a number of conditions that could lead to this:
. The PV was created with no mdas and is used in a VG with
other PVs (with metadata) that have not yet appeared on
the system. So, no VG metadata is found by lvm which
references the in-use PV with no mdas.
. vgremove could have failed after clearing mdas but
before clearing the in-use flag. In this case, the
in-use flag needs to be manually cleared on the PV.
. The PV may have damanged/unrecognized VG metadata
that lvm could not read.
. The PV may have no mdas, and the PVs with the metadata
may have damaged/unrecognized metadata.
A PV holding VG metadata that lvm can't understand
(e.g. damaged, checksum error, unrecognized flag)
will appear as an in-use orphan, and will be cleared
by this repair code. Disable this repair until the
code can keep track of these problematic PVs, and
distinguish them from actual in-use orphans.
Reject any stripe adding/removing reshape on raid4/5/6/10 because
of related MD kernel deadlock on single core systems until
we get a proper fix in MD.
Related: rhbz1443999
Since lvmetad is using 'MISSING' in status for 'another' purpose,
we need to support ATM also flag get from this place.
Until fixed better - we accept both flags - alhough lvm2 will
only print in flags.
Switch METADATA_FORMAT flag usage to be stored via segtype
instead of 'status' flag which appeared to cause major
incompatibility troubles.
For backward compatiblity segtype flags are still accepted also
via 'status' bits which were used from version 2.02.169 so metadata
saved by this newer lvm2 version should still work nicely, although
new save version will no longer work on this older lvm2 version.
Allow storing LV status bits with segment type name field.
Switching to this since this field has better support for compatibility
with older version of lvm2 - since such unknown segtype will not cause
complete invisiblity of metadata from older lvm2 code - just the
particular LV will become unusable with unknown type of segment.
- Must reread all objects as PVs might be removed.
- Never consider testsuite provided PVs nested, or tearDown fails to
remove any outstanding VGs on them.
When 'fsadm' was running without terminal (i.e. pipe), it's been
automatically working like in '--yes'.
Detect terminal and only accept empty "" input in this mode.
Add more validation to catch mainly renamed devices, where
filesystem utils are not able to handle devices properly,
as they are not addressed by major:minor by rather via some
symbolic path names which can change over time via rename operation.
Since we add more validation to 'detect_mounted' function make sure
we always use it even with 'resize' action, so numerous validations
are not skipped.
Commit 5fe07d3574 failed to set raid5 types
properly on conversions from raid6. It always enforced raid6_ls_6
for types raid6/raid6_zr/raid6_nr/raid6_nc, thus requiring 3 conversions
instead of 2 when asking for raid5_{la,rs,ra,n}.
Related: rhbz1439403
Offer possible interim LV types and display their aliases
(e.g. raid5 and raid5_ls) for all conversions between
striped and any raid LVs in case user requests a type
not suitable to direct conversion.
E.g. running "lvconvert --type raid5 LV" on a striped
LV will replace raid5 aka raid5_ls (rotating parity)
with raid5_n (dedicated parity on last image).
User is asked to repeat the lvconvert command to get to the
requested LV type (raid5 aka raid5_ls in this example)
when such replacement occurs.
Resolves: rhbz1439403
Add an exception to not allowing lvchange to change properties
on hidden LVs. When a thin pool data LV is a cache LV, we
need to allow changing cache properties on the tdata sublv of
the thin pool.
Trap cases where the percentage calculation currently leads to an empty
LV and the message:
Internal error: Unable to create new logical volume with no extents
Additionally convert the calculated number of extents from physical to
logical when creating a mirror using a percentage that is based on
Physical Extents. Otherwise a command like 'lvcreate -m3 -l80%FREE'
can never leave any free space.
This brings the behaviour closer to that of lvresize.
(A further patch is needed to cover all the raid types.)
_check_reappeared_pv() incorrectly clears the MISSING_PV flags of
PVs with unknown devices.
While one caller avoids passing such PVs into the function, the other
doesn't. Move the check inside the function so it's not forgotten.
Without this patch, if the normal VG reading code tries to repair
inconsistent metadata while there is an unknown PV, it incorrectly
considers the missing PVs no longer to be missing and produces
incorrect 'pvs' output omitting the missing PV, for example.
Easy reproducer:
Create a VG with 3 PVs pv1, pv2, pv3.
Hide pv2.
Run vgreduce --removemissing.
Reinstate the hidden PV pv2 and at the same time hide a different PV
pv3.
Run 'pvs' - incorrect output.
Run 'pvs' again - correct output.
See https://bugzilla.redhat.com/1434054
There are certain situations (not fully understood)
where is_missing_pv() is false, but pv->dev is NULL,
so this adds a check for NULL pv->dev after is_missing_pv()
to avoid a segfault.
With recent updates for thin pool monitoring in version 169
we lost multiple WARNINGs to be printed in syslog, when
pool crossed 80%, 85%, 90%, 95%, 100%.
Restore this logic as we want to keep user informed more
then just once when 80% boundary is passed.
Fix a regression introduced in 70bb726 that allows a local variable
in the monitored file checking routine to be accessed before its
assignment when the file has already been unlinked.
When a user does a Manager.PvCreate they can specify the block device using a
device path that may be different than what lvm reports is the device path. For
example a user could use:
/dev/disk/by-id/wwn-0x5002538500000000 instead of /dev/sdc
In this case the pvcreate will succeed, but when we query lvm we don't find the
newly created PV. We fail because it's device path is returned as /dev/sdc. This
change re-uses an internal lookup which can accommodate this and correctly find
the newly created PV.
Corrects https://bugzilla.redhat.com/show_bug.cgi?id=1445654
There were a handful of references to other man
pages using the standard command(N) form which were
not in bold, so they were not turned into links
in html formats.
Many will look in 'man lvm.conf' expecting to find a
description of the lvm.conf fields, which are not there.
State at the beginning how to get this (by running
lvmconfig.)
The lvm(8) man page never said what LVM is,
it never defined what the acronym LVM stands for,
it never spelled out other common acronyms VG, PV, LV,
and never described what they are.
This adds a very minimal definition which at least defines
the acronyms and entities, but it could obviously be
expanded on, either here or elsewhere.
Clean up the handling of memory used for cmd defs
so it doesn't trip up memory debugging.
Allocate memory for commands[] from libmem.
Free temporary memory used by define_commands()
at the end of the function.
Clear all the command def state in in lvm_fin().
Using any arg with a command name in a script file
would cause the command to fail.
The name of the script file being executed was being passed
to lvm_register_commands() and define_commands(), which
prevented command defs from being defined (simple commands
were still being defined only by name which was enough for those
to still work when run trivially with no args).
The logic for suggesting the nearest valid command syntax
was missing the simplest case. If a command has only one
valid syntax, that is the one we should suggest. (We were
suggesting nothing in this case.)
If the device size does not match the size requested
by --setphysicalvolumesize, then prompt the user.
Make the pvcreate checking/prompting code handle
multiple prompts for the same device, since the
new prompt can be in addition to the existing
prompt when the PV is in a VG.
Adding qualifier makes the only unqualified log_debug occurence
consistent with other uses in the same file.
Other possible ways to fix this:
- using `from .utils import log_debug`
- moving the line below `from . import utils` line
Unless a change of the regionsize is requested via "lvconvert -R N ...",
keep the region size when the number of images changes in a raid1 LV.
Related: rhbz1443705
Unless a change of the regionsize is requested via "lvconvert -R N ...",
keep the region size when the number of images changes in a raid1 LV.
Resolves: rhbz1443705
lvconvert parameters not causing a conversion (i.e. no type,
number of stripes, stripesize or regionsize changes) will
remove any allocated reshape space in which case the command
returns success. If reshape space does not exist though,
return error.
When metadata LV size was over DM_THIN_MAX_METADATA_SIZE sectors,
the info() routine was incorrectly trying to match bigger size,
while we do never pass any bigger device.
Fixing a case, where lvs should be displaying status for metadata
LV with 16GB size.
When a cmd def RULE fails because of a disallowed
combination of options, improve the error message
to show the option combination, not just the options
that broke the rule.
Reshape check failed when regionsize changed and current raid type
was provided with no other change requested (stripes or stripesize).
E.g. "lvconvert --type raid6 --regionsize 256K" on a raid6 LV
with != 256K regionsize.
Enable --type in test script.
Remove any newly allocated sub LV (pair) remnants in case
allocation fails due to lag of (parallel) free PV space
and keep initial raid type.
Resolves: rhbz1438013
A command def can include a specific constant option value,
but the value was not being checked for optional opts.
e.g. this is an incorrect command and does not match any
command defs:
lvconvert --type cache --cachepool vg/lv
However, it was mistakely being matched to this cmd def,
where the required options match, but the optional options
do not:
lonvert --cachepool LV_linear_striped_raid_cachepool
OO: --type cache-pool, ...
The optional options were mistakely considered matching
because 'cache' and 'cache-pool' were not being compared.
SIGINT isn't blocked properly after a sigint_allow(),
sigint_restore() cycle leading to illicit interruptable
metadata updates. These can leave corrupted metadata behind.
Issues addressed in this commit:
sigint_allow() fails to set _oldmasked[] members properly due
to an offset by one bug on indexing the members of the array.
It bails out prematurely comparing to MAX_SIGINTS causing nesting
depths to be one less than MAX_SIGINTS. Fix the comparision.
Correct the related comparison flaw in sigint_restore().
Initialize all sig_atomic_t variables consequently.
Resolves: rhbz1440766
Avoid error message
"Logical Volume *_rimage_0 already exists in volume group,,,"
on takeover conversion from a 2-legged raid1 to raid4
(aiming to reshape it adding images).
Resolves: rhbz1439398
Requesting _raid_remove_images() to commit the
metadata missed to reload the origin causing a
kernel takeover error converting a 2-legged raid1
(with previously removed images) to raid5.
Allow the combination of both arguments keeping
the raid level but changing the regionssize
(e.g. "lvconvert --type raid1 --regionsize 1M RaidLV"
on an existing raid1 LV).
Resolves: rhbz1438396
lvchange/lvconvert operations that do not require the LV
to already be active need to acquire a transient LV lock
before changing/converting the LV to ensure the LV is not
active on another host. If the LV is already active,
a persistent LV lock already exists on the host and the
transient LV lock request is a no-op.
Some lvmlockd locks in lvchange were lost in the cmd def
changes. The lvmlockd locks in lvconvert seem to have
been missed from the start.
We have to unset the LoadState variable from previous use when we check
for systemd unit state. We use this variable to check if systemd services
are loaded or not and if it is loaded, we issue systemctl commands to
enable/disable and start/stop the service. We don't issue these commands
if the unit is not loaded to avoid error messages which may confuse users.
The actual value specified by the --poll y|n option was not
being used. The way the --poll value is used is hidden
through an indirection where the value is stored in a
global variable at the start of the command, and then the
value is read from there later. Setting the global
variable early in the command had been lost with the
cmd def changes.
Fill in some gaps where old versions of lvm allowed
--poll and --monitor in combination with other operations,
but those combinations had been lost since the cmd def work.
(The new cmd def code also added some combinations that
had been missed by the old code.)
Changes:
lvchange --activate: add poll and monitor options, and
add calls to them in implementation.
lvchange --refresh: add monitor option (poll already there),
and call to monitor in implementation.
lvchange <metadata ops>: add poll and monitor options, and
add calls to them in implementation.
vgchange <metadata ops>: add poll option (call to poll
already in implementation).
vgchange --refresh: remove monitor option (not used by code)
lvchange --persistent y: add poll and monitor options, and
add calls to them, and to activate
in the implementation. (Making it
match the main lvchange metadata
command.)
Summary of current usage:
lvchange --activate: monitor, poll
vgchange --activate: monitor, poll
lvchange --refresh: monitor, poll
vgchange --refresh: poll
lvchange --monitor: ok
lvchange --poll: ok
lvchange --monitor --poll: ok
vgchange --monitor: ok
vgchange --poll: ok
vgchange --monitor --poll: ok
lvchange <metadata ops>: monitor, poll
vgchange <metadata ops>: poll
Enhance commit 25b5915c9b
to process options requiring immediate metadata commits
and reloads after those we can group together doing just
one commit and an optional reload for the whole group.
Backup metadata after processing options successfully.
Related: rhbz1437611
Don't abbreviate the --help output quite as much
when there are many command defs. Print all the
options in the cmd defs that are shown. --longhelp
output is unchanged and includes everything.
These days --partial is only used with activation in
lvchange/vgchange. It probably had another meaning
at some point in history which is no longer used,
so ignore it in those cases.
When included in the OO list, the option is advertised in
help/man output, implying it is meaningful to the command,
when in fact the command never uses it.
The IO list means the option won't cause an error if it's
used, but is not displayed as an valid option for the command.
If the option is not included in either OO or IO lists,
using the option would cause a command error, which would cause
problems for anyone is using the option for historical reasons.
_lvchange_properties_single() processes multiple command
line arguments in a loop causing metadata updates and/or
backups per argument.
Optimize to only perform one update and/or backup
(but necessary interim ones; e.g. for --resync)
per command run.
Related: rhbz1437611
Use only selected paths for finding .so in builddir.
So if builddir constains some embeded subdirs with some more
occurences of project (i.e. 'make rpm' build subdir)
those library paths will not get into list.
With monolithic kernels we can't actually modprobe
for cache modules as they are already compiled-in
and policy modules do not export version symbol.
Reported issue on list:
https://www.redhat.com/archives/dm-devel/2017-March/msg00061.html
Fix will try to look for explicit kernel symbols first before
calling modprobe.
Fix a regression introduced by new command line processing
(see starting commit 1e2420bca8)
causing activation to be rejected in combination with setting
persistent minor device numbers.
Resolves: rhbz1437611
Since _filemap_monitor_set_notify() is only called after daemon
start up it should not use the _early_log() macros. Instead, use
log_sys_error to log errors from the two syscalls in the function.
The fallback branch in _stats_update_file() is redundant (since the
branch taken when the daemon starts successfully must jump to the
'out' label anyway): remove it and re-order the conditions to
improve readability.
Use log_sys_error rather than log_error if execvp() fails:
/mnt/redhat/xdoio.13752.XIORQ: Created new group with 2 region(s) as group ID 0.
# execvp() failed.
vs:
/var/lib/libvirt/images/rhel7-vm1.qcow2: Created new group with 884 region(s) as group ID 0.
dmfilemapd: execvp failed: No such file or directory
The function _filemap_monitor_check_file_unlinked() attempts to
test whether a fd value should be closed by comparison to zero:
if ((fd > 0) && close(fd))
log_error("Error closing fd %d", fd);
The test should be '>=' since 0 is a valid file descriptor.
Similar to 40fb91a, but for the file descriptor opened using the
link path reported by /proc/<self>/fd/<fd>.
The daemon opens a new file descriptor from /proc/<self>/fd when
checking for an unlinked file with mode=inode. Ensure that it is
always closed even if the same file test fails.
The daemon opens a new file descriptor from fm->path when checking
for an unlinked file with mode=inode. Ensure that it is always
closed even if the same file test fails.
The path argument check in dmfilemapd incorrectly tests for:
if (argv[0] == '/')
Rather than testing the 1st character in the string pointed to by
argv[0].
Removing some unused new lines and changing some incorrect "can't
release until this is fixed" comments. Rename license.txt to make
it clear its merely an included file, not itself a licence.
This patch fixed lvm2 compilation running on x32 arch.
(Using 64bit x86 cpu features but running on 32b address space,
so consuming less mem in VM).
On x32 arch 'time_t' is 64bit while 'long' is 32bit.
Commits a29bb6a14b
... 5c199d99f4
narrowed down on addressing the escaping of hyphens
in the dynamic creation of manuals whilst avoiding
them in creating help texts. This lead to a sequence
of slipping through hyphens adrressed by additional
patches in aforementioned commit series.
On the other hand, postprocessing dynamically man-generator
created and statically provided manuals catches all hyphens
in need of escaping.
Changes:
- revert the above commits whilst keeping man-generator
streamlining and the detection of any '\' when generating
help texts in order to avoid escapes to slip in
- Dynamically escape hyphens in manaual pages using sed(1)
in the respective Makefile targets
- remove any manually added escaping on hyphens from any
static manual sources or headers
raid1 doesn't allow to set all images to writemostly because at
least one image is required to receive any written data immediately.
The dm-raid target will detect such invalid request and
fail it iwith a kernel error message.
Reject such request in uspace displaying a respective error message.
The recent command definitions commit took the command
name from argv[0] without applying basename to the value,
so a pathname, e.g. /usr/sbin, would cause lvm to not
recognize the command name.
This commit supersedes reverted 1e4462dbfb
to avoid changes to liblvm and the libdm API completely.
The libdevmapper interface compares existing table line retrieved from
the kernel to new table line created to decide if it can suppress a reload.
Any difference between input and output of the table line is taken to be a
change thus causing a table reload.
The dm-raid target started to misorder the raid parameters (e.g. 'raid10_copies')
starting with dm-raid target version 1.9.0 up to (excluding) 1.11.0. This causes
runtime failures (limited to raid10 as of tests) and needs to be reversed to allow
e.g. old lvm2 uspace to run properly.
Check for the aforementioned version range in libdm and adjust creation of the table line
to the respective (mis)ordered sequence inside and correct order outside the range
(as described for the raid target in the kernels Documentation/device-mapper/dm-raid.txt).
This reverts commit 1e4462dbfb
in favour of an enhanced solution avoiding changes in liblvm
completetly by checking the target versions in libdm and emitting
the respective parameter lines.
Fixes command defs related to creating a new thin pool and
then a new thin lv in the new pool.
1. lvcreate --size --virtualsize --thinpool
Needs a cmd def, it was missing.
The def is unique by the three required
options: size, virtualsize and thinpool.
2. lvcreate --size --virtualsize --thinpool VG
Needs a cmd def, it was missing.
The def is unique by the three required
options: size, virtualsize and thinpool,
and one required positional arg: VG.
3. lvcreate --thin --virtualsize --size LV_new|VG
This existing def should not accept an optional
--type thin, which if used makes it indistinct
from another def.
4. lvcreate --size --virtualsize VG
This existing def should not accept an optional
--type thin or --thin, which if used makes it
indistinct from other defs (e.g. 3)
This reverts commit 642d682d8d.
Using the thinpool option with this cmd def makes it
indistinct from other cmd defs where thinpool is a
required option.
The libdevmapper interface compares existing table line retrieved from
the kernel to new table line created to decide if it can suppress a reload.
Any difference between input and output of the table line is taken to be a
change thus causing a table reload.
The dm-raid target started to misorder the raid parameters (e.g. 'raid10_copies')
starting with dm-raid target version 1.9.0 up to (excluding) 1.11.0. This causes
runtime failures (limited to raid10 as of tests) and needs to be reversed to allow
e.g. old lvm2 uspace to run properly.
Check for the aforementioned version range and adjust creation of the table line
to the respective (mis)ordered sequence inside and correct order outside the range
(as described for the raid target in the kernels Documentation/device-mapper/dm-raid.txt).
lvcreate --thinpool POOL1 -L 100M --virtualsize 100M snapper_thinp
https://bugzilla.redhat.com/1434027
(The general rule is that a command is accepted if it is unambiguous.
The combination -L -V --thinpool uniquely identifies the operation.)
Udev events can come in like a flood when something changes. It really
doesn't do us any good to refresh the state of the service numerous times
when 1 would suffice. We had something like this before, but it was
removed when we added the refresh thread. However, we have since learned
that we need to sequence events in the correct order and block dbus
operations if we believe the state has been affected, thus udev events are
being processed on the main work queue. This change limits spurious
work items from getting to the queue.
If we always disable the sending of notify dbus events then in the case
where all the users are lvm dbus users we will be in udev handling mode
until at least 1 external lvm command occurs. Instead we will not disable
notify dbus until after we get at least 1 external event. This makes the
service get into the correct mode of operation faster.
lvconvert should not leak 'error' device.
(This patch is not fix the problem, just makes it more easily visible
instead of more confusing 'clvmd' trace).
Starting with dm-raid target version 1.9.0 shrinking of mapped devices is supported.
Check for support being present in lvresize and lvreduce.
Related: rhbz1394048
Since we want to test different cache policies with profiles mq&smq
raise version to 1.8.
TODO: maybe split in more tests so older targets can test here as well.
N.B.: passthrough is also not supported with version 1.3
Quering non-thin-pool segment for discard property may lead
to intenal error if the segment had set 'out-of-range' value,
so only thin-pool is allowed, for other it returns NULL.
If SubLVs to be removed still exist after an image removing
conversion (i.e. "lvconvert --yes --force --stripes N "
with N < total stripes) any request to convert to a different
striped/raid* level has to be rejected until after those freed
SubLVs got removed by running the aforementioned lvconvert again.
Add tests to check conversion to striped/raid* gets rejected.
Enhance a test comment.
Related: rhbz1191935
Related: rhbz1366296
Adjust to final target version 1.10.1 supporting reshape
properly and to recently changed report field specifications
(e.g. rehape_len_le) to allow these tests to run.
Lower mirror region size to suite the tiny test VG.
Previous commit 506d88a2ec introduced disabling lvmetad on repairs.
Avoid calling lvscan and use of any --config options altogether
in the mirror and raid DSOs.
Related: rhbz1380521
Repairing missing devices does not work reliably
with lvmetad, so disable lvmetad before repair.
A standard lvmetad refresh (pvscan --cache) will
enable lvmetad again.
Commit 07ded8059c assumed that the mirror is blocked which is not the case.
It is accessible, degraded and in need of repair because some of its legs
(partially) failed. Any auto-repair via dmeventd fails though because
of lvmetad not providing proper data about the failed PV(s). That's why
this workaround got introduced in commit 76f6951c3e until we get to
the lvmetad interaction core issue.
Mind any mirror auto-repair failure is caused by such lvmetad interaction
problems not yet solved so disabling lvmetad works as a resort as elaborated
on in the related bz.
Reintroducing the interim solution.
Resolves: rhbz1380521
Effectively revert whole 76f6951c3e.
We need to figure out some other solution.
At this moment usage of --config with 'repair' of blocked mirror
is 'freezing' combination.
For each section 8 man page, a .8_gen file is created from one of:
.8_main - Old-style man page - content used directly
.8_des and .8_end - Description and end section of a generated page
.8_pregen - Pre-generated page used if the generator fails
Other man sections are not generated and use the suffix .5_main or .7_main.
Developers should use 'make generate' to regenerate the .8_pregen files.
When a given command:
- matches command definition A based on required options,
but uses optional options that are not accepted by A.
- matches command definition B based on required options,
(but fewer required items than A), and uses no
unaccepted optional options.
then B is the correct choice. Command A would fail because
of the unaccepted optional options. The logic was mistakenly
letting A win because it had a greater number of required option
matches, without accounting for the optional option mismatches.
Systematically outline every combination of:
--type striped, --type mirror, --type raid, --mirrors, --stripes
and make sure each is assigned to one specific cmd def.
This revealed that a new command def is needed for
lvcreate that uses both --mirrors and --stripes
but no --type option.
The use of LVCREATE_RAID shortcut for an option set
resulted in mirrors/stripes being included in optional
opts set when they were already in the required list.
Sending %d as format argument in lvmetad_vg_remove_pending() will cause
segfaults in config_make_nodes_v() when va_arg() casts to int64_t. Also, it is
clearly advertised in the lvm source code that using plain %d is prohibited, so
let's switch to FMTd64.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Require that the path argument to dmfilemapd be an absolute path
and document this in tool output, libdevmapper.h and dmfilemapd.8.
The check is also enforced by dm_stats_start_filemapd() to avoid
forking a new process with an invalid path argument.
The initial check on argc incorrectly returns 1 when the wrong
number of arguments are present, causing a segfault in main()
when no arguments are given:
# dmfilemapd
Wrong number of arguments.
usage: dmfilemapd <fd> <group_id> <path> <mode> [<foreground>[<log_level>]]
Segmentation fault (core dumped)
Commit f4b30b0dae was about displaying visible LV size
when reshape space is allocated. Take parity devices
into account when displaying the user visible LV size.
As was recently done with relative signes for sizes/extents,
limit the signs used with the mirrors option, e.g.
lvcreate --mirrors now does not accept or advertise an
optional minus sign with the value. lvconvert --mirrors
accepts +|-.
Add a warning about maximum supported numbers of stripes
with striped LVs realtive to RAID conversions.
Add examples for a more elaborate, multi-step conversion
from linear to striped (and vice versa).
Shrink lvs examples output.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
The LVCONVERT_RAID shortcut for including options ended
up including mirrors/stripes as optional opts in defs where
they were already required, or in defs where they would
not be used. Remove the option set and specify in each
case only the options wanted.
Better support for lvdisplay.
By default info about running (in kernel) cache status is printed.
To get 'segtype' info, user runs: 'lvdisplay -m', example:
--- Logical volume ---
LV Path /dev/vg/lvol0
LV Name lvol0
VG Name vg
LV UUID Y4uWuN-TBGk-duer-aPWl-yBWn-iFFR-RU1gg1
LV Write Access read/write
LV Creation host, time linux, 2017-03-01 20:52:39 +0100
LV Cache pool name lvol2
LV Cache origin name lvol0_corig
LV Status available
# open 0
LV Size 12,00 MiB
Cache used blocks 10,42%
Cache metadata blocks 0,49%
Cache dirty blocks 0,00%
Cache read hits/misses 112 / 34
Cache wrt hits/misses 133 / 0
Cache demotions 0
Cache promotions 20
Current LE 3
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Segments ---
Logical extents 0 to 2:
Type cache
Chunk size 64,00 KiB
Metadata format 1
Mode writethrough
Policy smq
Setting migration_threshold=100000
OO_LVCREATE_CACHE accepts --cachemetadataformat.
Support new option --cachemetadataformat auto|1|2 for caching.
Word 'auto' can be also be given as '0'.
Report CMFmt column with cache metadata format version.
Report KMFmt column with 'kernel cache metadata format version' for device.
(a value reported from status).
(Update 'CacheMode' to name 'Cache' as primary segtype).
Only cache-pool segtype may store cache_metadata_format.
Only supported values are 0,1,2
Format 2 requires LV status uses LV_METADATA_FORMAT.
Format 0 (unselected) or 1 shall not set this 'incompatible' status.
Cache pool read/writes metadata_format within its segment type..
For CachePoolLV unselected metadata format is NOT stored in metadata.
For CacheLV when metadata format is not present/selected in lvm2 metadata,
it's automatically assumed to be the version 1 (backward compatible).
To ensure older lvm2 will not 'miss-read' metadata with new version 2,
such LV is marked with METADATA_FORMAT status flag (segment is
specifying metadata format). So when cache uses metadata format 2,
it will become inaccesible on older system without such support.
(kernel dm cache < 1.10, lvm2 < 2.02.169).
Add new profilable configation setting to let user select
which metadata format of a created cache pool he wish to use.
By default the 'best' available format is autodetected at runtime,
but user may enforce format 1 or 2 ATM.
Code also detects availability for metadata2 supporting cache target.
In case of troubles user may easily Disable usage of this feature
by placing 'metadata2' into global/cache_disabled_features list.
Older library version was not detecting unknown 'feature' bits
and could let start target without needed option.
New versioned symbol now checks for supported feature bits.
_Base version keeps accepting only previously known features and
mask/ignores unknown bits.
NB: if the older binary passed in 'random' bits, it will not get
metadata2 by chance. New linked binary get new validation function.
Library user is required to not pass 'trash' for unsupported bits,
as such calls will be rejected.
Dm cache target version 1.10 introduces new cache metadata format
(upstream kernel >=4.11).
New format is enable by passing new target feature flag metadata2.
Interace side on libdm uses DM_CACHE_FEATURE_METADATA2.
This feature bit is now also recognized on status
and set in 'feature_flags' field of dm_status_cache structure.
Code also adds check for 'highest' supported feature flag bit.
So it rejects properly any 'unknown' feature bit set by application.
Better code to enforce writethrough caching for cleaner policy.
Only check for cleaner when DM_CACHE_FEATURE_PASSTHROUGH or
DM_CACHE_FEATURE_WRITEBACK is set.
As now we can properly recognize all paramerters for pool creation,
we may drop PASS_ARG_ defines and rely on '_UNSELECTED' or 0 entries
as being those without user given args.
When setting are not given on command line - 'update' function
fill them from profiles or configuration. For this 'profile' arg
was needed to be passed around and since 'VG' itself is not needed,
it's been all replaced with 'cmd, profile, extents_size' args.
Reuse same code for getting/setting cache parameters with lvcreate.
So there is single one place how to get vars from profiles and configs.
Variables declarations are moved to start of function and
there is no need to initialize them as they are always
defined by functions get_cache_params() and get_pool_params().
Since cache chunk might be huge and there is no technical need
to enforce rounding and there is actually more 'real' VG space
used then necessary - keep rounding on 'chunk' bounrary only
for thin volumes - where it's the space used anyway.
NB: we support conversion of any-size 'existing' LV into cached LV.
Fix missing reset of '*settings' pointer when no args were given.
Handle cache_chunk settings like all other settings, so it is properly
updated only with non-zero settings and the existing cache-pool
chunk_size is not being reconfigured.
User can specify metadata profile which stores important cache
geometry data for easy configuration.
Fix missing support for getting chunk_size, cache_mode, cache_policy
for a cache/cache pools volumes from configuration or metadata profile.
Split-off patch that just replaces returns with 'goto bad'
so there is single place to release policy_settings.
In the next patch we will start to use some shared
function call between lvconvert and lvcreate
(effectively restoring previous logic which has got lost).
To more easily recognize unselected state from select '0' state
add new 'THIN_ZERO_UNSELECTED' enum.
Same applies to THIN_DISCARDS_UNSELECTED.
For those we no longer need to use PASS_ARG_ZERO or PASS_ARG_DISCARDS.
Basically code moving operation to have a single place resolving
thin_pool_chunk_size_policy.
Supported are generic & performance profiles.
Function is now shared between thin manipulation code and configuration
_CFG logic to obtain defaults and handle correct reporting upward coding
stack.
Test for NULL in dm_stats_destroy() and return immediately if
the struct dm_stats pointer is NULL (similar to free(NULL)).
This simplifies cleanup code which otherwise needs to:
out:
if (dms)
dm_stats_destroy(dms);
return;
Older compilers cannot tell that the 'mode' variable is only
used in branches in which it is assigned:
dmsetup.c:5651: warning: "mode" may be used uninitialized in this function
dmsetup.c:5023: warning: "mode" may be used uninitialized in this function
Avoid this by always assigning the variable a value.
Older compilers are not able to determine that although group_id
is only assigned in one branch of a conditional, it is never used
used when the other branch is taken:
libdm-stats.c:3319: warning: "group_id" may be used uninitialized in this function
Avoid this by always initialising the variable when it is
declared.
Utilizing the --config option we will utilize global/notify_dbus=0 so
that the service itself doesn't generate change events which it then needs to
process.
We need to place query operations in the queue to prevent the case where
a client knows of something before the service does. For example if a
client creates a PV/VG/LV outside of the dbus API and then immediately
tries to lookup and use that resource in the lvm dbus service it should
be present. By placing the queries in the work queue any previous
refresh operation will complete before we process the query.
In addition to the already supported conversion between 2-legged
raid1 and raid5, raid1 and raid4 can be also converted into each
other with 2 legs (raid4/5 are limited to map a 2-legged raid1).
This patch supports the missing raid4 conversion in the sequence
linear -> 2-legged raid1 -> raid4/5, then restripe to more than one
data stripes for performance and resilience reasons and optionally
convert to striped/raid0.
The other conversion sequence is also possible by converting N-way
striped/raid0 to raid4/5, then restripe to 2 legs followed by a
conversion to raid1 and optionally to linear (loosing all resilience).
Add examples for reshaping number of stripes
and converting from raid6 to striped to raid10.
Remove trailing spaces.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
The filemap daemon takes its program_id from the regions it is
managing: use DM_STATS_ALL_PROGRAMS when retrieving an initial
listing and then obtain the correct program_id from the group
leader.
On conversion from striped to raid0, data LVs are created
and all segments and their respective areas of the striped
LV are moved across to new segments allocated for the raid0
image LVs. This can cause non-canonical segments to be added
to the image LVs.
Add a call to lv_merge_segments() once all segments have been
added to an image LV to compensate for that. This avoids
unsafe table loads on activation.
Fix comments.
Automatic dmeventd repair of mirrors with active lvmetad configured
(mirror_image_fault_policy = "allocate") fails because the lvscan
run before the repair in the mirror DSO does not update the
lvmetad cache properly thus "lvconvert --repair ..." fails.
Need to scan the mirror LV before and after the repair
to have proper cache content after the repair finished.
The cache can't be relied on or the repair will fail.
Resolves: rhbz1380521
Launch an instance of the filemap monitoring daemon when creating,
or updating, a file mapped group, unless the --nomonitor switch is
given.
Unless --foreground is given the daemon will detach from the
terminal and run in the background until it is signaled or the
daemon termination conditions are met.
The --follow={inode|path} switch is added to control the daemon
behaviour when files are moved, unlinked, or renamed while they
are being monitored.
The daemon runs with the same verbosity as the dmstats command
that starts it.
Add a daemon that can be launched to monitor a group of regions
corresponding to the extents of a file, and to update the regions as the
file's allocation changes.
The daemon is intended to be started from a library interface, but can
also be run from the command line:
dmfilemapd <fd> <group_id> <path> <mode> [<foreground>[<log_level>]]
Where fd is a file descriptor open on the mapped file, group_id is the
group identifier of the mapped group and mode is either "inode" or
"path". E.g.:
# dmfilemapd 3 0 vm.img inode 1 3 3<vm.img
...
If foreground is non-zero, the daemon will not fork to run in the
background. If verbose is non-zero, libdm and daemon log messages will
be printed.
It is possible for the group identifier to change when regions are
re-mapped: this occurs when the group leader is deleted (regroup=1 in
dm_stats_update_regions_from_fd()), and another region is created before
the daemon has a chance to recreate the leader region.
The operation is inherently racey since there is currently no way to
atomically move or resize a dm_stats region while retaining its
region_id.
Detect this condition and update the group_id value stored in the
filemap monitor.
A function is also provided in the the stats API to launch the filemap
monitoring daemon:
int dm_stats_start_filemapd(int fd, uint64_t group_id, const char *path,
dm_filemapd_mode_t mode, unsigned foreground,
unsigned verbose);
This carries out the first fork and execs dmfilemapd with the arguments
specified.
A dm_filemapd_mode_t value is specified by the mode argument: either
DM_FILEMAPD_FOLLOW_INODE, or DM_FILEMAPD_FOLLOW_PATH. A helper function,
dm_filemapd_mode_from_string(), is provided to parse a string containing
a valid mode name into the appropriate dm_filemapd_mode_t value.
It's not an error to call dm_stats_group_present() on a handle
that contains no regions.
This causes dmfilemap to log a false backtrace during shutdown
if all regions are removed from the corresponding device:
exiting _filemap_monitor_get_events() with deleted=0, check=0
waiting for FILEMAPD_WAIT
dm message (253:1) [ opencount flush ] @stats_list dmstats [32768] (*1)
<backtrace>
Filemap group removed: exiting.
Change this to only emit a backtrace if the handle is NULL.
Splitting off an image LV of a 2-legged
raid1 LV causes loss of resilience.
Ask user to avoid uninformed loss of all resilience.
Don't ask for N > 2 legged raid1 LVs.
Adjust tests.
Splitting off an image LV of a 2-legged raid1 LV tracking changes
causes loosing partial resilience for any newly written data set.
Full resilience will be provided again after the split off image LV
got merged back in and the new data set got fully synchronized.
Reason being that the data is only stored on the remaining single
writable image during the split.
Ask user to avoid uninformed loss of such partial resilience.
Don't ask for N > 2 legged raid1 LVs.
In case N images fail (N <= parity chunks) _and_
a "vgreduce --removemissing --force VG" was applied
a following repair of the RaidLV fails:
Unable to remove N images: Only 0 devices given.
Failed to remove the specified images from tb/r.
Failed to replace faulty devices in tb/r.
Fix as of this commit results in correct repair:
Faulty devices in tb/r successfully replaced.
Add new values for different sign variations, resulting in:
size_VAL no sign accepted
ssize_VAL accepts + or -
psize_VAL accepts +
nsize_VAL accpets -
extents_VAL no sign accepted
sextents_VAL accepts + or -
pextents_VAL accepts +
nextents_VAL accepts -
Depending on the command being run, change the option
values for --size, --extents, --poolmetadatasize to
use the appropriate value from above.
lvcreate uses no sign (but accepts + and ignores it).
lvresize accepts +|- for a relative change.
lvextend accepts + for a relative change.
lvreduce accepts - for a relative change.
Like opt and val arrays in previous commit, combine duplicate
arrays for lv types and props in command.c and lvmcmdline.c.
Also move the command_names array to be defined in command.c
so it's consistent with the others.
command.c and lvmcmdline.c each had a full array defining
all options and values. This duplication was not removed
when the command.c code was merged into the run time.
seg->extents_copied has to be defined properly on reducing
the size of a raid LV or conversion from raid5 with 1 stripe
to raid1 will fail.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Reshaping a raid5 LV to one stripe aiming to convert it to
raid1 (and optionally to linear) reports the wrong LV size
when still having reshape space allocated.
The lv_extend/_lv_reduce API doesn't cope with resizing RaidLVs
with allocated reshape space and ongoing conversions. Prohibit
resizing during conversions and remove the reshape space before
processing resize. Add missing seg->data_copies initialisation.
Fix typo/comment.
For the time being raid10 is limited to even number of total stripes
as is and 2 data copies. The number of stripes provided on creation
of a raid10(_near) LV with -i/--stripes gets doubled to define
that even total number of stripes (i.e. images).
Apply the same on disk adding conversions (reshapes) with
"lvconvert --stripes RaidLV" (e.g. 2 stripes = 4 images
total converted to 3 stripes = 6 images total).
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Part of the UNIT description was still living in the
--size description. Move it to the Size[UNIT] description
since it is used by other options, not just --size.
In lvcreate/lvconvert, --poolmetdatasize can only be an
absolute value, but in lvresize/lvextend, --poolmetadatasize
can be given a + relative value.
The val types only currently support relative values that
go in both directions +|-. Further work is needed to add
val types that can be relative in only one direction, and
switching various option values to one those depending on
the command name. Then poolmetdatasize will not appear
with a +|- option in lvcreate/lvconvert, and will
appear with only the + option in lvresize/lvextend.
Enhance the raid report functions for the recently added LV fields
reshape_len, reshape_len_le, data_offset, new_data_offset, data_copies,
data_stripes and parity_chunks to cope with "lvs --select".
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Clean up and correct the details around --extents and --size.
lvcreate/lvresize/lvreduce/lvextend all now display the
extents option in usages.
The Size and Number value variables for --size and --extents
are now displayed without the [+|-] prefix for lvcreate.
A special case is needed to display
--extents for lvcreate since the cmd defs
treat --extents as an automatic alternative
to --size (to avoid duplicating every cmd def).
Now that we got the "data_stripes" field key, adjust the "stripes" field description.
Enhance the "regionsize" field description to cover raids as well.
When a cmd def includes multiple sets of options (OO_FOO),
allow multiple OO_FOO sets to contain the same option and
avoid repeating it in the cmd def.
There was confusion in the code about whether or not the
--size option accepted a sign. Make it consistent and clear
that it does.
This exposes a new problem in that an option can only
accept one value type, e.g. --size can only accept a
signed number, it cannot accept a positive or negative
number for some commands and reject negative numbers for
others.
In practice, lvcreate accepts only positive --size
values and lvresize accepts positive or negative --size
values. There is currently no way to encode this
difference. Until that is fixed, the man page output
is hacked to avoid printing the [+|-] prefix for sizes
in lvcreate.
Settings specified in other command line args take precedence over
profiles and --config, which takes precedence over settings in actual
config files.
Since commit 1e2420bca8 ('commands: new
method for defining commands') commands like this:
lvchange --config 'global/test=1' -ay vg
have been printing the 'TEST MODE' message, but nevertheless making
real changes.
Commit 48778bc503 introduced new RAID reshaping related report fields.
The inclusioon of segtype.h in properties.c isn't mandatory, remove it.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Commit 80a6de616a versioned the dm_tree_node_add_raid_target_with_params()
and dm_tree_node_add_raid_target() APIs for compatibility reasons.
There's no user of the latter function, remove it.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
During an ongoing reshape, the MD kernel runtime reads stripes relative
to data_offset and starts storing the reshaped stripes (with new raid
layout and/or new stripesize and/or new number of stripes) relative
to new_data_offset. This is to avoid writing over any data in place
which is non-atomic by nature and thus be recoverable without data loss
in the transition. MD uses the term out-of-place reshaping for it.
There's 2 other areas we don't have report capability for:
- number of data stripes vs. total stripes
(e.g. raid6 with 7 stripes toal has 5 data stripes)
- number of (rotating) parity/syndrome chunks
(e.g. raid6 with 7 stripes toal has 2 parity chunks; one
per stripe for P-Syndrome and another one for Q-Syndrome)
Thus, add the following reportable keys:
- reshape_len (in current units)
- reshape_len_le (in logical extents)
- data_offset (in sectors)
- new_data_offset ( " )
- data_stripes
- parity_chunks
Enhance lvchange-raid.sh, lvconvert-raid-reshape-linear_to_striped.sh,
lvconvert-raid-reshape-striped_to_linear.sh, lvconvert-raid-reshape.sh
and lvconvert-raid-takeover.sh to make use of new keys.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Add/remove the SECONDARY_SYNTAX flag to cmd defs.
cmd defs with this flag will be listed under the
ADVANCED USAGE man page section, so that the main
USAGE section contains the most common commands
without distraction.
- When multiple cmd defs do the same thing, one variant
can be displayed in the first list.
- Very advanced, unusual or uncommon commands should be
in the second list.
Recently added check for reshaping in this function called for
a cleanup to avoid proliferating it with more explicit conditionals.
Base the reshaping check on the given _features array.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
An initialization was missing when converting striped to raid0(_meta)
causing unitialized reshape_len in the new component LVs first segment.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
The imposed minimum region size can cause rejection on
disk removing reshapes. Lower it to avoid that.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Commit 27384c52cf lowered the maximum number of devices
back to 64 for compatibility.
Because more members have been added to the API in
'struct dm_tree_node_raid_params *', we have to version
the public libdm RAID API to not break any existing users.
Changes:
- keep the previous 'struct dm_tree_node_raid_params' and
dm_tree_node_add_raid_target_with_params()/dm_tree_node_add_raid_target()
in order to expose the already released public RAID API
- introduce 'struct dm_tree_node_raid_params_v2' and additional functions
dm_tree_node_add_raid_target_with_params_v2()/dm_tree_node_add_raid_target_v2()
to be used by the new lvm2 lib reshape extentions
With this new API, the bitfields for rebuild/writemostly legs in
'struct dm_tree_node_raid_params_v2' can be raised to 256 bits
again (253 legs maximum supported in MD kernel).
Mind that we can limit the maximum usable number via the
DEFAULT_RAID{1}_MAX_IMAGES definition in defaults.h.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
- Combine the equivalent lvconvert --type raid defs.
(Two cmd defs must be different without relying
on LV type, which are not known at the time the
cmd def is matched.)
- Remove unused optional options from lvconvert --stripes,
and lvconvert --stripesize.
- Use Number for --stripes_long val type.
- Combine the cmd def for raid reshape cleanup into the
existing start_poll cmd def (they were duplicate defs).
Calls into the raid code from a poll opertion will be
added.
Commit 64a2fad5d6 raised the maximum number of RAID devices to 64.
Commit e2354ea344 introduced RAID_BITMAP_SIZE as 4 to have
256 bits (4 * 64 bit array members), thus changing the libdm API
unnecessarilly for the time being.
To not change the API, reduce RAID_BITMAP_SIZE to 1.
Remove an unneeded definition of it from libdm-common.h.
If we ever decide to raise past 64, we'll version the API.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
The fedorahosted git repository shuts down tomorrow:
https://communityblog.fedoraproject.org/fedorahosted-sunset-2017-02-28/
Our upstream git repository has moved back to sourceware.org.
Mailing list hosting is not changing.
Gitweb:
https://www.sourceware.org/git/?p=lvm2
Git:
git://sourceware.org/git/lvm2.git
ssh://sourceware.org/git/lvm2.git
http://sourceware.org/git/lvm2.git
Example command to change the origin of a repository clone:
Public:
git remote set-url origin git://sourceware.org/git/lvm2.git
Committers:
git remote set-url origin git+ssh://sourceware.org/git/lvm2.git
The options list was sorted as:
- options with both long and short forms, alphabetically
- options with only long form, alphabetically
This was done only for the visual effect. Change to
sort alphabetically by long opt, without regard to
short forms.
Fixes commit 286d39ee3c, which was correct except
for a reversed strstr. Now uses strchr, and modifies
a copy of the name so the original argv is preserved.
When requesting a regionsize change during conversions, check
for constraints or the command may fail in the kernel n case
the region size is too smalle or too large thus leaving any
new SubLVs behind.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Allow regionsize on upconvert from linear:
fix related commit 2574d3257a to actually work
Related: rhbz1394427
Remove setting raid5_n on conversions from raid1
as of commit 932db3db53 because any raid5 mapping
may be requested.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Add aforementioned but forgotten new test scripts
lvconvert-raid-reshape-linear_to_striped.sh,
lvconvert-raid-reshape-striped_to_linear.sh and
lvconvert-raid-reshape.sh
Those presume dm-raid target version 1.10.2
provided by a following kernel patch.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Because of contraints in renaming shifted rimage/rmeta LV names
the current RaidLV limit is a maximum of 10 SubLV pairs.
With the previous introduction of reshaping infratructure that
constriant got removed.
Kernel supports 253 since dm-raid target 1.9.0, older kernels 64.
Raise the maximum number of RaidLV rimage/rmeta pairs to 64.
If we want to raise past 64, we have to introdce a check for
the kernel supporting it in lvcreate/lvconvert.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces the changes to call the reshaping infratructure
from lv_raid_convert().
Changes:
- add reshaping calls from lv_raid_convert()
- add command definitons for reshaping to tools/command-lines.in
- fix raid_rimage_extents()
- add 2 new test scripts lvconvert-raid-reshape-linear_to_striped.sh
and lvconvert-raid-reshape-striped_to_linear.sh to test
the linear <-> striped multi-step conversions
- add lvconvert-raid-reshape.sh reshaping tests
- enhance lvconvert-raid-takeover.sh with new raid10 tests
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Change:
- allow raid_rimage_extents() to calculate raid10
- remove an __unused__ attribute
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Change:
- add missing raid1 <-> raid5 conversions to support
linear <-> raid5 <-> raid0(_meta)/striped conversions
- rename related new takeover functions to
_takeover_from_raid1_to_raid5 and _takeover_from_raid5_to_raid1,
because a reshape to > 2 legs is only possible with
raid5 layout
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Change:
- enhance _clear_meta_lvs() to support raid0 allowing
raid0_meta -> raid10 conversions to succeed by clearing
the raid0 rmeta images or the kernel will fail because
of discovering reordered raid devices
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- enhance _raid45610_to_raid0_or_striped_wrapper() to support
raid5_n with 2 areas to raid1 conversion to allow for
striped/raid0(_meta)/raid4/5/6 -> raid1/linear conversions;
rename it to _takeover_downconvert_wrapper to discontinue the
illegible function name
- enhance _striped_or_raid0_to_raid45610_wrapper() to support
raid1 with 2 areas to raid5* conversions to allow for
linear/raid1 -> striped/raid0(_meta)/raid4/5/6 conversions;
rename it to _takeover_upconvert_wrapper for the same reason
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add missing possible reshape conversions and conversion options
to allow/prohibit changing stripesize or number fo stripes
- enhance setting convenient riad types in reshape conversions
(e.g. raid1 with 2 legs -> radi5_n)
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add _raid_reshape() using the pre/post callbacks and
the stripes add/remove reshape functions introduced before
- and _reshape_requested function checking if a reshape
was requested
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add vg metadata update functions
- add pre and post activation callback functions for
proper sequencing of sub lv activations during reshaping
- move and enhance _lv_update_reload_fns_reset_eliminate_lvs()
to support pre and post activation callbacks
- add _reset_flags_passed_to_kernel() which resets anyxi
rebuild/reshape flags after they have been passed into the kernel
and sets the SubLV remove after reshape flags on legs to be removed
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add function to support disk adding reshapes
- add function to support disk removing reshapes
- add function to support layout (e.g. raid5ls -> raid5_rs)
or stripesize reshaping
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add function providing state of a reshaped RaidLV
- add function to adjust the size of a RaidLV was
reshaped to add/remove stripes
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add lv_raid_data_copies returning raid type specific number;
needed for raid10 with more than 2 data copies
- remove _shift_and_rename_image_components() constraint
to support more than 10 raid legs
- add function to calculate total rimage length used by out-of-place
reshape space allocation
- add out-of-place reshape space alloc/relocate/free functions
- move _data_rimages_count() used by reshape space alloc/realocate functions
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces local infrastructure to raid_manip.c
used by followup patches.
Add functions:
- to check reshaping is supported in target attibute
- to return device health string needed to check
the raid device is ready to reshape
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces infrastructure prerequisites to be used
by raid_manip.c extensions in followup patches.
This base is needed for allocation of out-of-place
reshape space required by the MD raid personalities to
avoid writing over data in-place when reading off the
current RAID layout or number of legs and writing out
the new layout or to a different number of legs
(i.e. restripe)
Changes:
- add members reshape_len to 'struct lv_segment' to store
out-of-place reshape length per component rimage
- add member data_copies to struct lv_segment
to support more than 2 raid10 data copies
- make alloc_lv_segment() aware of both reshape_len and data_copies
- adjust all alloc_lv_segment() callers to the new API
- add functions to retrieve the current data offset (needed for
out-of-place reshaping space allocation) and the devices count
from the kernel
- make libdm deptree code aware of reshape_len
- add LV flags for disk add/remove reshaping
- support import/export of the new 'struct lv_segment' members
- enhance lv_extend/_lv_reduce to cope with reshape_len
- add seg_is_*/segtype_is_* macros related to reshaping
- add target version check for reshaping
- grow rebuilds/writemostly bitmaps to 246 bit to support kernel maximal
- enhance libdm deptree code to support data_offset (out-of-place reshaping)
and delta_disk (legs add/remove reshaping) target arguments
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
(Change to recent commit 3f4ecaf8c2.)
Use --foo Number[k|UNIT] to indicate that
the default units of the number is k, but other
units listed below are also accepted.
Previously, underlined/italic Unit was used,
like other of variables, but this UNIT is more
like a shortcut than an actual variable.
The MD kernel raid1 personality does no use any writemostly leg as the primary.
In case a previous linear LV holding data gets upconverted to
raid1 it becomes the primary leg of the new raid1 LV and a full
resynchronization is started to update the new legs.
No writemostly and/or writebehind setting may be allowed during
this initial, full synchronization period of this new raid1 LV
(using the lvchange(8) command), because that would change the
primary (i.e the previous linear LV) thus causing data loss.
lvchange has a bug not preventing this scenario.
Fix rejects setting writemostly and/or writebehind on resychronizing raid1 LVs.
Once we have status in the lvm2 metadata about the linear -> raid upconversion,
we may relax this constraint for other types of resynchronization
(e.g. for user requested "lvchange --resync ").
New lvchange-raid1-writemostly.sh test is added to the test suite.
Resolves: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855895
We use --foo Number[k|Units] to indicate that
the default units of the number is k, but other
units listed below are also accepted.
Capitalize and underline Units so it is consistent
with other variables, and reference it at the end.
Technically, the k should be bold, but this
tends to make the text visually hard to read
because of the excessive highlights scattered
everywhere. So it's left normal text for now
(it's unlikely to confuse anyone.)
Print ... after a repeatable option in the OPTIONS section.
An alternative would be to just mention in the text description
that the option is repeatable.
When the test would need to try to write some large amount of data
we can give it 'zero' segments - for obvious reason such written data
can't be verified but in some test cases it doesn't really matter.
Usage follows 'error_dev' style.
For now test suite is not supporting any combination of
error/delay/zero segments so only 1 type could be used per PV.
Previously when lvremove tried to remove 'active' origin,
it had been asking for every 'snapshot' LV separately
and doing individual single snapshot removals first.
To be faster it now deactivates origin before removal
all connected snapshots.
This avoids multiple reloads of dm table for origin volume
which were unnecessary as origin was meant to be removed as well.
There are two kinds of common options:
1. options common to all variants of a given command name
2. options common to all lvm commands
Previously, both kinds of common options were listed together
under "Common options". Now the first are printed under
"Common options for command" (when needed), and the second
are printed under "Common options for lvm" (always).
Remove the "usage notes" which should just
live in the man pages.
When there are 3 or more variants of a command,
print all the options produces a lot of output,
so require --longhelp to print all the options
in these cases.
Check 'dmsetup version' is called before starting any more
advanced logic in $DM_DEV_DIR.
Call also replaces mkdir as it creates needed path with control node.
The --type option has previously been accepted for
lvresize/lvextend. Using it did not affect the operation
of the command. The value was simply verified as matching
the current seg type of the LV.
Commit f45b689406 caused regression
of lvresize -m and --type parameter
After fix this sequence may work when we also fix syntax description:
lvcreate -l1 -m1 -n lv1 vg
lvextend --type mirror -m1 -l+1 vg/lv1
For this syntax:
lvconvert --thinpool LV1 --poolmetadata LV2
lvconvert --cachepool LV1 --poolmetadata LV2
Restore the metadata swapping behavior in addition to
the pool creation behavior. When LV1 is already a pool,
the metadata LV will be swapped with LV2.
When LV1 is not a pool, it will be converted to a
pool using the specified LV for metadata.
This syntax is no longer advertised because of the
ambiguous behavior. The primary syntaxes for pool
creation and metadata swapping will be the advertised
methods.
As we now user binary search - it's nondeterministict
which of the same 'args' will be give - so duplicates
need 'extra' care.
So provide same hack for output for --uuidstr_ARG as
for input.
Solves 'pvscan -u'.
Since there is a lot of options and lot of searches,
use binary search to keep strcmp at minimum.
The interesting part is - alphabetically sorted array contains
duplicates and some of them are not the 'right anwer', so
after we find matching string but not matching long_ARG,
we may need to check if the surrouding strings are the right matching
one.
The single loops is used also for strictly define --foo_long
(i.e. --stripes) and just differs at final part.
TODO1: replace strstr call with some flag (just like short_opt).
TODO2: drop '--' from being stored and tests by strcmp.
When parsing command defs, track and report all
errors that are found. Add an error return case
from define_commands so the standard error exit
path is used.
When using liblvm2cmd, a process can do multiple
init/exit calls, i.e.
lvm2_init(); lvm2_run(); lvm2_exit();
lvm2_init(); lvm2_run(); lvm2_exit();
...
define_commands() needs to set up the global commands[]
definitions only the first time.
The old ad hoc arg parsing when combining a split snapshot
allowed the first lv to have a vgname, but not the second.
Since lvconvert now uses the standard arg parsing in
process_each_lv(), the old one-off behavior requires a
work around.
This reverts commit 717363bb94.
These alternate forms for swapping metadata cannot be
distinguished from the command for creating a pool.
If we were to add these alternate forms for swapping
metadata, we would need to overload the pool creation
command defs, making those definitions ambiguous.
Change run time access to the command_name struct
cmd->cname instead of indirectly through
cmd->command->cname. This removes the two run time
fields from struct command.
All lvconvert functionality has been moved out of the
previous monolithic lvconvert code, except conversions
related to raid/mirror/striped/linear. This switches
that remaining code to be based on command defs, and
standard process_each_lv arg processing. This final
switch results in quite a bit of dead code that is
also removed.
This is a new explicit version of 'lvconvert LV'
which has been an obscure way of triggering polling
to be restarted on an LV that was previously converted.
Lift all the snapshot utilities (merge, split, combine)
out of the monolithic lvconvert implementation, using
the command definitions. The old code associated with
these commands is now unused and will be removed separately.
This lifts the lvconvert --repair and --replace commands
out of the monolithic lvconvert implementation. The
previous calls into repair/replace can no longer be
reached and will be removed in a separate commit.
The new check_single_lv() function is called prior to the
existing process_single_lv(). If the check function returns 0,
the LV will not be processed.
The check_single_lv function is meant to be a standard method
to validate the combination of specific command + specific LV,
and decide if the combination is allowed. The check_single
function can be used by anything that calls process_each_lv.
As commands are migrated to take advantage of command
definitions, each command definition gets its own entry
point which calls process_each for itself, passing a
pair of check_single/process_single functions which can
be specific to the narrowly defined command def.
. Define a prototype for every lvm command.
. Match every user command with one definition.
. Generate help text and man pages from them.
The new file command-lines.in defines a prototype for every
unique lvm command. A unique lvm command is a unique
combination of: command name + required option args +
required positional args. Each of these prototypes also
includes the optional option args and optional positional
args that the command will accept, a description, and a
unique string ID for the definition. Any valid command
will match one of the prototypes.
Here's an example of the lvresize command definitions from
command-lines.in, there are three unique lvresize commands:
lvresize --size SizeMB LV
OO: --alloc Alloc, --autobackup Bool, --force,
--nofsck, --nosync, --noudevsync, --reportformat String, --resizefs,
--stripes Number, --stripesize SizeKB, --poolmetadatasize SizeMB
OP: PV ...
ID: lvresize_by_size
DESC: Resize an LV by a specified size.
lvresize LV PV ...
OO: --alloc Alloc, --autobackup Bool, --force,
--nofsck, --nosync, --noudevsync,
--reportformat String, --resizefs, --stripes Number, --stripesize SizeKB
ID: lvresize_by_pv
DESC: Resize an LV by specified PV extents.
FLAGS: SECONDARY_SYNTAX
lvresize --poolmetadatasize SizeMB LV_thinpool
OO: --alloc Alloc, --autobackup Bool, --force,
--nofsck, --nosync, --noudevsync,
--reportformat String, --stripes Number, --stripesize SizeKB
OP: PV ...
ID: lvresize_pool_metadata_by_size
DESC: Resize a pool metadata SubLV by a specified size.
The three commands have separate definitions because they have
different required parameters. Required parameters are specified
on the first line of the definition. Optional options are
listed after OO, and optional positional args are listed after OP.
This data is used to generate corresponding command definition
structures for lvm in command-lines.h. usage/help output is also
auto generated, so it is always in sync with the definitions.
Every user-entered command is compared against the set of
command structures, and matched with one. An error is
reported if an entered command does not have the required
parameters for any definition. The closest match is printed
as a suggestion, and running lvresize --help will display
the usage for each possible lvresize command.
The prototype syntax used for help/man output includes
required --option and positional args on the first line,
and optional --option and positional args enclosed in [ ]
on subsequent lines.
command_name <required_opt_args> <required_pos_args>
[ <optional_opt_args> ]
[ <optional_pos_args> ]
Command definitions that are not to be advertised/suggested
have the flag SECONDARY_SYNTAX. These commands will not be
printed in the normal help output.
Man page prototypes are also generated from the same original
command definitions, and are always in sync with the code
and help text.
Very early in command execution, a matching command definition
is found. lvm then knows the operation being done, and that
the provided args conform to the definition. This will allow
lots of ad hoc checking/validation to be removed throughout
the code.
Each command definition can also be routed to a specific
function to implement it. The function is associated with
an enum value for the command definition (generated from
the ID string.) These per-command-definition implementation
functions have not yet been created, so all commands
currently fall back to the existing per-command-name
implementation functions.
Using per-command-definition functions will allow lots of
code to be removed which tries to figure out what the
command is meant to do. This is currently based on ad hoc
and complicated option analysis. When using the new
functions, what the command is doing is already known
from the associated command definition.
Some archs can use even 64K pages and then lvm2 runs into trouble if
the stack is 'too small' to fit extra page capturing stack overwrite.
So when lvm2 limits stack - add extra mem page - be it 4K or 64K.
Relates to ppc64le bug: https://bugzilla.redhat.com/1387279
Remove allocate_pvs from raid_manip.c:_region_size_change_request() API
and lv_extend() using it added for temporary test purpose.
Related: rhbz1366296
Add:
- region size checks to raid_manip.c types array and supporting functions
- tests to lvconvert-raid-takeover.sh to check bogus
"lvconvert --type --regionsize N " requests
Related: rhbz1366296
When we preload device with smaller size, we avoid its resume,
so later suspend/resume of full device tree my process all
existing in flight bios.
Also update comment and avoid using confusing opposite meaning.
Kernel 4.10 (dm-crypt v1.15.0) and later supports loading device
tables with crypt segment having key in kernel keyring retention
service.
dmsetup hid key section of tables output. With this patch dmsetup
no longer hides key section if it detects kernel key description
instead of hex byte representation of key itself.
Add:
- conversion support from striped/raid0/raid0_meta to/from raid10;
raid10 goes by the near format (same as used in creation of
raid10 LVs), which groups data copies together with their original
blocks (e.g. 3-way striped, 2 data copies resulting in 112233 in the
first stripe followed by 445566 in the second etc.) and is limited
to even numbers of legs for now
- related tests to lvconvert-raid-takeover.sh
- typo
Related: rhbz1366296
2017-02-10 19:13:02 +01:00
778 changed files with 61578 additions and 23934 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.