IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Correcting rounding rules for percentage evaluation.
Validate supported range of percentage.
(although ranges are already validated earlier on code path)
This is probably somewhat experimantal patch - but when i.e. raid device
is just extend, there should not be a technical need for flush,
unless the target would stricly need it. It should allow faster
processing of lvm command not being blocked by possibly longer flush.
Since we do not support rimage & rmeta for snapshots - we can
avoid quering for -cow devices and add them as origin_only -
since their snapshots (-cow) could have never existed.
This redumes several ioctl operation during table preloading.
Switch remaining zero sized struct to flexible arrays to be C99
complient.
These simple rules should apply:
- The incomplete array type must be the last element within the structure.
- There cannot be an array of structures that contain a flexible array member.
- Structures that contain a flexible array member cannot be used as a member of another structure.
- The structure must contain at least one named member in addition to the flexible array member.
Although some of the code pieces should be still improved.
While normally the 'mmap' file reading is better utilizing resources,
it has also its odd side with handling errors - so while we normally
use the mmap only for reading regular files from root filesystem
(i.e. lvm.conf) we can't prevent error to happen during the read
of these file - and such error unfortunately ends with SIGBUS error.
Maintaing signal handler would be compilated - so switch to slightly
less effiecient but more error resistant read() functinality.
reproducible steps:
1. vgcreate vg1 /dev/sda /dev/sdb
2. lvcreate --type raid0 -l 100%FREE -n raid0lv vg1
3. do remove the /dev/sdb action
4. lvdisplay show wrong 'LV Status'
After removing raid0 type LV underlying dev, lvdisplay still display
'available'. This is wrong status for raid0.
This patch add a new function raid_is_available(), which will handle
all raid case.
With this patch, lvdisplay will show
from:
LV Status available
to:
LV Status NOT available (partial)
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.com>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
It's better to set most of option as 'commented' with some
documented defaults instead of providing strict values.
This has the advantage we can eventually 'change' defualts
and get them working in future. Otherwise once the setting
is stored in lvm.conf in /etc, such setting has strictly
defined value and that can be only change with file update.
merge.c:_check_lv_segment() was checking regionsize vs. mirrored LV size on
any 'mirror/raid1/raid10' segment type including type 'mirrored' mirror logs.
Avoid the check only for 'mirrored' mirror logs to allow conversion from log
type 'disk' with regionsize > mirror log SubLV size.
As we disabled support for 'mirrored' mirror logs with
commit e82303fd6a which still conditionally
allows to enable it via global/support_mirrored_mirror_logs=1,
patch is mandatory for all distributions.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1712983
Currently lvm2 is not wiping signatures when creating 'metadata' volumes
and raid _rmeta was the only exception - so make the behavior consistent
with other metadata devices and drop wiping ATM.
Drop also some extra debug since they are now more explanatory in
wipe_lv() function.
Also note - although lvm2 now does not wipe signatures - the error
from such wipping used to be actually 'ignored' before wipe_lv()
started to return error (with recent commit) and raid creation
continued with 'unzeroed' metadata device.
TODO: Several issues to resolve:
1. We may want to flip to wipping with all LVs (in that case we need to
support passing --yet & --force).
2. Also we may want to clear whole metadata device - however current
function is also used for wipping i.e. snapshot COW device which
is likely not a good candidate for full device zeroing.
We may also need to think about better logic when extent size is
enforcing very large LVs, when only a small portion of LV is ever
being used.
3. Using TRIM instead of zeroing metadata device might be worth to
implement.
mm
To avoid polution of metadata with some 'garbage' content or eventualy
some leak of stale data in case user want to upload metadata somewhere,
ensure upon allocation the metadata device is fully zeroed.
Behaviour may slow down allocation of thin-pool or cache-pool a bit
so the old behaviour can be restored with lvm.conf setting:
allocation/zero_metadata=0
TODO: add zeroing for extension of metadata volume.
Failure in wiping/zeroing stop the command.
If user wants to avoid command abortion he should use -Zn or -Wn
to avoid wiping.
Note: there is no easy way to distinguish which kind of failure has
happend - so it's safe to not proceed any futher.
When initiated larger write request, it may have happened, bcache
got out of free chunks - fix the loop, that is supposed to wait
until next free chunk becomes avain available.
To create a new cache or writecache LV with a single command:
lvcreate --type cache|writecache
-n Name -L Size --cachedevice PVfast VG [PVslow ...]
- A new main linear|striped LV is created as usual, using the
specified -n Name and -L Size, and using the optionally
specified PVslow devices.
- Then, a new cachevol LV is created internally, using PVfast
specified by the cachedevice option.
- Then, the cachevol is attached to the main LV, converting the
main LV to type cache|writecache.
Include --cachesize Size to specify the size of cache|writecache
to create from the specified --cachedevice PVs, otherwise the
entire cachedevice PV is used. The --cachedevice option can be
repeated to create the cache from multiple devices, or the
cachedevice option can contain a tag name specifying a set of PVs
to allocate the cache from.
To create a new cache or writecache LV with a single command
using an existing cachevol LV:
lvcreate --type cache|writecache
-n Name -L Size --cachevol LVfast VG [PVslow ...]
- A new main linear|striped LV is created as usual, using the
specified -n Name and -L Size, and using the optionally
specified PVslow devices.
- Then, the cachevol LVfast is attached to the main LV, converting
the main LV to type cache|writecache.
In cases where more advanced types (for the main LV or cachevol LV)
are needed, they should be created independently and then combined
with lvconvert.
Example
-------
user creates a new VG with one slow device and one fast device:
$ vgcreate vg /dev/slow1 /dev/fast1
user creates a new 8G main LV on /dev/slow1 that uses all of
/dev/fast1 as a writecache:
$ lvcreate --type writecache --cachedevice /dev/fast1
-n main -L 8G vg /dev/slow1
Example
-------
user creates a new VG with two slow devs and two fast devs:
$ vgcreate vg /dev/slow1 /dev/slow2 /dev/fast1 /dev/fast2
user creates a new 8G main LV on /dev/slow1 and /dev/slow2
that uses all of /dev/fast1 and /dev/fast2 as a writecache:
$ lvcreate --type writecache --cachedevice /dev/fast1 --cachedevice /dev/fast2
-n main -L 8G vg /dev/slow1 /dev/slow2
Example
-------
A user has several slow devices and several fast devices in their VG,
the slow devs have tag @slow, the fast devs have tag @fast.
user creates a new 8G main LV on the slow devs with a
2G writecache on the fast devs:
$ lvcreate --type writecache -n main -L 8G
--cachedevice @fast --cachesize 2G vg @slow
It's possible for a dev-cache entry to remain after all
paths for it have been removed, and other parts of the
code expect that a dev always has a name. A better fix
may be to remove a device from dev-cache after all paths
to it have been removed.
When either logical block size or physical block size is 4K,
then lvmlockd creates sanlock leases based on 4K sectors,
but the lvm client side would create the internal lvmlock LV
based on the first logical block size it saw in the VG,
which could be 512. This could cause the lvmlock LV to be
too small to hold all the sanlock leases. Make the lvm client
side use the same sizing logic as lvmlockd.
dm-integrity stores checksums of the data written to an
LV, and returns an error if data read from the LV does
not match the previously saved checksum. When used on
raid images, dm-raid will correct the error by reading
the block from another image, and the device user sees
no error. The integrity metadata (checksums) are stored
on an internal LV allocated by lvm for each linear image.
The internal LV is allocated on the same PV as the image.
Create a raid LV with an integrity layer over each
raid image (for raid levels 1,4,5,6,10):
lvcreate --type raidN --raidintegrity y [options]
Add an integrity layer to images of an existing raid LV:
lvconvert --raidintegrity y LV
Remove the integrity layer from images of a raid LV:
lvconvert --raidintegrity n LV
Settings
Use --raidintegritymode journal|bitmap (journal is default)
to configure the method used by dm-integrity to ensure
crash consistency.
Initialization
When integrity is added to an LV, the kernel needs to
initialize the integrity metadata/checksums for all blocks
in the LV. The data corruption checking performed by
dm-integrity will only operate on areas of the LV that
are already initialized. The progress of integrity
initialization is reported by the "syncpercent" LV
reporting field (and under the Cpy%Sync lvs column.)
Example: create a raid1 LV with integrity:
$ lvcreate --type raid1 -m1 --raidintegrity y -n rr -L1G foo
Creating integrity metadata LV rr_rimage_0_imeta with size 12.00 MiB.
Logical volume "rr_rimage_0_imeta" created.
Creating integrity metadata LV rr_rimage_1_imeta with size 12.00 MiB.
Logical volume "rr_rimage_1_imeta" created.
Logical volume "rr" created.
$ lvs -a foo
LV VG Attr LSize Origin Cpy%Sync
rr foo rwi-a-r--- 1.00g 4.93
[rr_rimage_0] foo gwi-aor--- 1.00g [rr_rimage_0_iorig] 41.02
[rr_rimage_0_imeta] foo ewi-ao---- 12.00m
[rr_rimage_0_iorig] foo -wi-ao---- 1.00g
[rr_rimage_1] foo gwi-aor--- 1.00g [rr_rimage_1_iorig] 39.45
[rr_rimage_1_imeta] foo ewi-ao---- 12.00m
[rr_rimage_1_iorig] foo -wi-ao---- 1.00g
[rr_rmeta_0] foo ewi-aor--- 4.00m
[rr_rmeta_1] foo ewi-aor--- 4.00m
When vdopool is activated standalone - we use a wrapping linear device
to hold actual vdo device active - for this we can set-up read-only
device to ensure there cannot be made write through this device to
actual pool device.
Creating a snapshot was using a persistent LV lock
on the origin, so if the origin LV was inactive at
the time of the snapshot the LV lock would remain.
(Running lvchange -an on the inactive LV would
clear the LV lock.) Use a transient LV lock so it
will be dropped if it was not locked previously.
When formating VDO volume, the calculated amound of bits
for 'vdoformat --slab-bits' parameter was shifted by 2 bits
(calculated size was making 2MiB vdo_slab_size_mb value appear like if
user would be specifying only 512KiB)
Fixed by properly converting internal size_mb value to KiB.
Fix the anoying kernel message reported:
device-mapper: cache: 253:2: metadata operation 'dm_cache_commit' failed: error = -5
which has been reported while cachevol has been removed.
Happened via confusing variable - so switch the variable to commonly user '_size'
which presents a value in sector units and avoid 'scaling' this as extent length
by vg extent size when placing 'error' target on removal path.
Patch shouldn't have impact on actual users data, since at this moment
of removal all date should have been already flushed to origin device.
m
The previous patch improved read of pipe when lvm2 was looking
for default logical size, but we clearly must read pipe also
for -V case, when the logical size is already defined.
Still the place can be better to block only particular reshape
operations which ATM cause kernel problems.
We check if the new number of images is higher - and prevent to take
conversion if the volume is in use (i.e. thin-pool's data LV).
clang: it's supposedly impossible path to hit, as we should always
have origin_lv defined when running this path, but adding protection
isn't a big issue to make this obvious to analyzer.
Since _reserve_area() may fail due to error allocation failure,
add support to report this already reported failure upward.
FIXME: it's log_error() without causing direct command failure.
Although we expect min_chunk_size to be 32bit value, for
large size of caches it might be useful to do calcs 64bit.
So to avoid doing shift as signed 32bit - use unsigned 64bit
from the start.
reporting fields (-o) directly from kernel:
writecache_total_blocks
writecache_free_blocks
writecache_writeback_blocks
writecache_error
The data_percent field shows used cache blocks / total cache blocks.
Until we resolve reshape for 'stacked' devices, we need to disable it.
So users can no longer reshape i.e. thin-pool data volumes, causing
ATM bad thin-pool problems.
After the VG lock is taken for vg_read, reread the mda_header
and compare the metadata text offset and checksum to what was
seen during label scan. If it is unchanged, then the metadata
has not changed since the label scan, and the metadata does not
need to be reread under the lock for command processing.
For commands that do not make changes (e.g. reporting), the
mda_header is reread and checked on one mda to decide if the
full metadata rereading can be skipped. For other commands
(e.g. modifying the vg) the mda_header is reread and checked
from all PVs. (These could probably just check one mda also.)
When pvcreate/pvremove prompt the user, they first release
the global lock, then acquire it again after the prompt,
to avoid blocking other commands while waiting for a user
response. This release/reacquire changes the locking
order with respect to the hints flock (and potentially other
locks). So, to avoid deadlock, use a nonblocking request
when reacquiring the global lock.
Avoid mem leaking hint on every loop continue and
allocate hint only when it's going to be added into list.
Switch to use 'dm_strncpy()' and validate sizes.
For dev_in_device_list() != 0 allocated 'devl' was
actually leaking - so instead allocate 'devl' only
when !dev_in_device_list() and indent code around.
Since we check for NULL pointers earlier we need
to be consistent across function - since the NULL
would applies across whole function.
When dropping 'mda' check - we are actually
already dereferencing it before - so it can't
be NULL at that places (and it's validated
before entering _read_mda_header_and_metadata).
dev_unset_last_byte() must be called while the fd is still valid.
After a write error, dev_unset_last_byte() must be called before
closing the dev and resetting the fd.
In the write error path, dev_unset_last_byte() was being called
after label_scan_invalidate() which meant that it would not unset
the last_byte values.
After a write error, dev_unset_last_byte() is now called in
dev_write_bytes() before label_scan_invalidate(), instead of by
the caller of dev_write_bytes().
In the common case of a successful write, the sequence is still:
dev_set_last_byte(); dev_write_bytes(); dev_unset_last_byte();
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
When resizing 2 volumes like thin-pool and it's metadata and they
would be of a different type - command would be actually expecting
both LVs being of a same segtype - and would throw an error in
case they are different.
This patch fixes is by setting a new segtype from last segment of
2nd. extented device.
Also it fixes the possible 'percentage' extension setup that
might have been used for 'primary' volume - while the 'secondary'
LV always goes with direct size - as we do not support 'percentage'
setup for them
This affects maily usage of thin-pool where the extension of
thin-pool data size may also lead to extension of metadata size.
Instead of checking all LVs in a VG - do just a direct copy of LVs
from the existing list ->segs_using_thin_lv.
TODO: maybe it could be better to expose seg_list to /tools...
Enhance lv_info with lv_info_with_name_check.
This 'variant' not only check existance if UUID in DM table
but also compares its DM name whether it's matching expected LV name.
Otherwise activation may 'skip' activation with rename in case the
DM UUID already exists, just device is different name.
This change make fairly easier manipulation with i.e. detached mirror
leg which ATM is using same UUID - just the LV name have been changed.
Used code was not able to run 'activation' (and do a rename) and just
skipped the call. So the code used to do a workaround and 'tried'
to deactivate such LV firts - this however work only in non-clvmd case,
as cluster was not having the lock for deactivated LV.
With this extended lv_info code will run 'activation' and will
synchronize the name to match expected LV name.
Patch extends _lv_info() with new paramter 'with_name_check',
which is later translated into 'name_check' argument for
_info_run() which in case of name mismatch evaluates the
check as if device does not exists.
Such call is only used in one place _lv_activate() which then
let activation run. All other invocation of _info() calls
are left intact.
TODO: fix mirror table manipulation (and raid)....
The return value from bcache_invalidate_fd() was not being checked.
So I've introduced a little function, _invalidate_fd() that always
calls bcache_abort_fd() if the write fails.
The resume of 'released' 'COW' should preceed the resume of origin.
The fact we need to do the sequence differently for merge was
cause by bugs fixed in 2 previous commits - so we no longer need
to recognize 'merging' and we should always go with single
sequence.
The importance of this order is - to properly remove '-real' device
from origin LV. When COW is activated as 2nd. '-real' device is
kept in table as it cannot be removed during 1st. resume of origin,
and later activation of COW LV no longer builds tree associated
with origin LV.
When checking device id of a thin device that is just being
merged - the snapshot actually could have been already finished
which means '-real' suffix for the LV is already gone and just LV
is there - so check explicitely for this condition and use
correct UUID for this case.
When a cachevol LV is attached, have the LV keep it's lock
allocated. The lock on the cachevol won't be used while
it's attached. When the cachevol is split a new lock does
not need to be allocated. (Applies to cachevol usage by
both dm-cache and dm-writecache.)
When LV gets cached and uses cache-pool - such cache-pool
will now get _cpool suffix automatically.
Thus 'Pool' column for cached LV will now show either _cvol
or _cpool LV.
Before 'archive()' is called, lvm2 must not touch/modify metadata.
So move setting CACHE_VOL related flags past this point.
Also make sure reading of cache segtype always restores this
flag properly (even if compatible flag would be lost).
Since code is using -cdata and -cmeta UUID suffixes, it does not need
any new 'extra' ID to be generated and stored in metadata.
Since introduce of new 'segtype' cache+CACHE_USES_CACHEVOL we can
safely assume 'new' cache with cachevol will now be created
without extra metadata_id and data_id in metadata.
For backward compatibility, code still reads them in case older
version of metadata have them - so it still should be able
to activate such volumes.
Bonus is lowered size of lv structure used to store info about LV
(noticable with big volume groups).
The first part of a cachevol LV is used for metadata,
and the rest of the space is used for data. The
division of space between metadata and data depends
on the total size of the cachevol.
The previous division gave more space than needed to
metadata, it was:
cachevol size 8M to 128M -> metadata size 16M *
cachevol size 128M to 1G -> metadata size 32M
cachevol size 1G and up -> metadata size 64M
(* if this resulted in over half the LV used as
metadata, then half the cachevol would be used
for metadata, and the other half for data.)
The division of space now gives less space to
metadata, it is:
cachevol size 8M to 16M -> metadata size 4M
cachevol size 16M to 4G -> metadata size 8M
cachevol size 4G to 16G -> metadata size 16M
cachevol size 16G to 32G -> metadata size 32M
cachevol size 32G and up -> metadata size 64M
When a VG contains some LVs with unknown segtypes, the user
should still be allowed to activate other LVs in the VG that
are understood.
$ lvs foo
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
LV VG Attr LSize
lvol0 foo -wi------- 4.00m
other foo vwi---u--- 48.00m
$ lvcreate -l1 foo
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
Cannot change VG foo with unknown segments in it!
Cannot process volume group foo
$ lvchange -ay foo/lvol0
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
$ lvchange -ay foo/other
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
Refusing activation of LV foo/other containing an unrecognised segment.
$ lvs foo
WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL.
WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL
LV VG Attr LSize
lvol0 foo -wi-a----- 4.00m
other foo vwi---u--- 48.00m
A cachevol LV had the CACHE_VOL status flag in metadata,
and the cache LV using it had no new flag. This caused
problems if the new metadata was used by an old version
of lvm. An old version of lvm would have two problems
processing the new metadata:
. The old lvm would return an error when reading the VG
metadata when it saw the unknown CACHE_VOL status flag.
. The old lvm would return an error when reading the VG
metadata because it would not find an expected cache pool
attached to the cache LV (since the cache LV had a
cachevol attached instead.)
Change the use of flags:
. Change the CACHE_VOL flag to be a COMPATIBLE flag (instead
of a STATUS flag) so that old versions will not fail when
they see it.
. When a cache LV is using a cachevol, the cache LV gets
a new SEGTYPE flag CACHE_USES_CACHEVOL. This flag is
appended to the segtype name, so that old lvm versions
will fail to use the LV because of an unknown segtype,
as opposed to failing to read the VG.
Enhance activation of cached devices using cachevol.
Correctly instatiace cachevol -cdata & -cmeta devices with
'-' in name (as they are only layered devices).
Code is also a bit more compacted (although still not ideal,
as the usage of extra UUIDs stored in metadata is troublesome
and will be repaired later).
NOTE: this patch my brink potentially minor incompatiblity for 'runtime' upgrade
Let vgck --updatemetadata repair cases where different mdas
hold indepedently valid but unmatching copies of the metadata,
i.e. different text metadata checksums or text metadata sizes.
vgck --updatemetadata would write the same correct
metadata to good mdas, and then to bad mdas, but the
sequence of vg_write/vg_commit calls betwen good and
bad mdas could cause a different description field to
be generated for good/bad mdas. (The description field
describing the command was recently included in the
ondisk copy of the metadata text.)
If a VG is forcibly changed from lock_type sanlock to
lock_type none, the internal lvmlock LV is left behind.
If that LV is not removed before vgremove is run on the
VG, then an internal check will be triggered by the
hidden lvmlock LV. So, check for and remove a left over
lvmlock LV during vgremove.
Add lots of vdo fields:
vdo_operating_mode - For vdo pools, its current operating mode.
vdo_compression_state - For vdo pools, whether compression is running.
vdo_index_state - For vdo pools, state of index for deduplication.
vdo_used_size - For vdo pools, currently used space.
vdo_saving_percent - For vdo pools, percentage of saved space.
vdo_compression - Set for compressed LV (vdopool).
vdo_deduplication - Set for deduplicated LV (vdopool).
vdo_use_metadata_hints - Use REQ_SYNC for writes (vdopool).
vdo_minimum_io_size - Minimum acceptable IO size (vdopool).
vdo_block_map_cache_size - Allocated caching size (vdopool).
vdo_block_map_era_length - Speed of cache writes (vdopool).
vdo_use_sparse_index - Sparse indexing (vdopool).
vdo_index_memory_size - Allocated indexing memory (vdopool).
vdo_slab_size - Increment size for growing (vdopool).
vdo_ack_threads - Acknowledging threads (vdopool).
vdo_bio_threads - IO submitting threads (vdopool).
vdo_bio_rotation - IO enqueue (vdopool).
vdo_cpu_threads - CPU threads for compression and hashing (vdopool).
vdo_hash_zone_threads - Threads for subdivide parts (vdopool).
vdo_logical_threads - Logical threads for subdivide parts (vdopool).
vdo_physical_threads - Physical threads for subdivide parts (vdopool).
vdo_max_discard - Maximum discard size volume can recieve (vdopool).
vdo_write_policy - Specified write policy (vdopool).
vdo_header_size - Header size at front of vdopool.
Previously only 'lvdisplay -m' was exposing them.
Since we now support activation of 'vdo' volume
without explicit activation of 'vdopool' it's now possible
to have active layer vdopool (-vpool) volume and
having vdopool itself inactive - yet still in this
case we can show available stats for this volume.
But we need to show correct activation status and other
standard info.
Avoid checking 'lv_is_active()' since special LV types does this
validation anyway what calling _percent() function and call it
ONLY when none of special types is queried.
This restores support for VDO resize (as with support for
separate VDO pool activation, plain query for lv_is_active()
is not working in this case).
If the linear mapping is lost (for whatever reason, i.e.
test suite forcible 'dmsetup remove' linear LV,
lvm2 had hard times figuring out how to deactivate such DM table.
So add function which is in case inactive VDO pool LV checks if
the pool is actually still active (-vpool device present) and
it has open count == 0. In this case deactivation is allowed
to continue and cleanup DM table.