IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Since we declare dev_name in lib/device/device.h
and pvs in commands.h
rename local dev_name to device_name
and pvs to pvs_list to prevent shadowing warning.
m
Switch remaining zero sized struct to flexible arrays to be C99
complient.
These simple rules should apply:
- The incomplete array type must be the last element within the structure.
- There cannot be an array of structures that contain a flexible array member.
- Structures that contain a flexible array member cannot be used as a member of another structure.
- The structure must contain at least one named member in addition to the flexible array member.
Although some of the code pieces should be still improved.
Allow the optional '--type raid1' to be included in the lvconvert
command when adding or removing raid images with integrity.
It does not change the meaning of the command (specifying a type
that matches the current type is redundant but generally allowed.)
When converting volume to pool LV use also wiping of other signatures.
For writecache & pool conversion support --yet and --force
to bypass prompting for signature wiping.
For writecache drop unneded zero_sectors.
Note: currently we have lvconvert doing convertion and prompting
for confirmation of conversion - and then again wipe_lv() prompts
for removing i.e. filesystem signature - we should unify this
prompting into 1 message - althought the 'filesystem' discovery
needs active volume - while the 1st. conversion prompt can
work without active converted volume.
To create a new cache or writecache LV with a single command:
lvcreate --type cache|writecache
-n Name -L Size --cachedevice PVfast VG [PVslow ...]
- A new main linear|striped LV is created as usual, using the
specified -n Name and -L Size, and using the optionally
specified PVslow devices.
- Then, a new cachevol LV is created internally, using PVfast
specified by the cachedevice option.
- Then, the cachevol is attached to the main LV, converting the
main LV to type cache|writecache.
Include --cachesize Size to specify the size of cache|writecache
to create from the specified --cachedevice PVs, otherwise the
entire cachedevice PV is used. The --cachedevice option can be
repeated to create the cache from multiple devices, or the
cachedevice option can contain a tag name specifying a set of PVs
to allocate the cache from.
To create a new cache or writecache LV with a single command
using an existing cachevol LV:
lvcreate --type cache|writecache
-n Name -L Size --cachevol LVfast VG [PVslow ...]
- A new main linear|striped LV is created as usual, using the
specified -n Name and -L Size, and using the optionally
specified PVslow devices.
- Then, the cachevol LVfast is attached to the main LV, converting
the main LV to type cache|writecache.
In cases where more advanced types (for the main LV or cachevol LV)
are needed, they should be created independently and then combined
with lvconvert.
Example
-------
user creates a new VG with one slow device and one fast device:
$ vgcreate vg /dev/slow1 /dev/fast1
user creates a new 8G main LV on /dev/slow1 that uses all of
/dev/fast1 as a writecache:
$ lvcreate --type writecache --cachedevice /dev/fast1
-n main -L 8G vg /dev/slow1
Example
-------
user creates a new VG with two slow devs and two fast devs:
$ vgcreate vg /dev/slow1 /dev/slow2 /dev/fast1 /dev/fast2
user creates a new 8G main LV on /dev/slow1 and /dev/slow2
that uses all of /dev/fast1 and /dev/fast2 as a writecache:
$ lvcreate --type writecache --cachedevice /dev/fast1 --cachedevice /dev/fast2
-n main -L 8G vg /dev/slow1 /dev/slow2
Example
-------
A user has several slow devices and several fast devices in their VG,
the slow devs have tag @slow, the fast devs have tag @fast.
user creates a new 8G main LV on the slow devs with a
2G writecache on the fast devs:
$ lvcreate --type writecache -n main -L 8G
--cachedevice @fast --cachesize 2G vg @slow
To add a cache or writecache to a main LV with a single command:
lvconvert --type cache|writecache --cachedevice /dev/ssd vg/main
A cachevol LV will be allocated from the specified cache device,
then attached to the main LV. Include --cachesize to specify the
size of cachevol to create, otherwise the entire cachedevice is
used. The cachedevice option can be repeated to create a cachevol
from multiple devices.
Example
-------
A user has an existing main LV that they want to speed up
using a new ssd.
user adds the new ssd to the VG:
$ vgextend vg /dev/ssd
user attaches the new ssd their main LV:
$ lvconvert --type writecache --cachedevice /dev/ssd vg/main
Example
-------
A user has two existing main LVs that they want to speed up
with a new ssd.
user adds the new 16G ssd to the VG:
$ vgextend vg /dev/ssd
user attaches some of the new ssd to the first main LV,
using half of the space:
$ lvconvert --type writecache --cachedevice /dev/ssd
--cachesize 8G vg/main1
user attaches some of the new ssd to the second main LV,
using the other half of the space:
$ lvconvert --type writecache --cachedevice /dev/ssd
--cachesize 8G vg/main2
Example
-------
A user has an existing main LV that they want to speed up using
two new ssds.
user adds the new two ssds the VG:
$ vgextend vg /dev/ssd1
$ vgextend vg /dev/ssd2
user attaches both ssds their main LV:
$ lvconvert --type writecache
--cachedevice /dev/ssd1 --cachedevice /dev/ssd2 vg/main
Use libblkid to detect sector/block size of the fs on the LV.
Use this to choose a compatible writecache block size.
Enable attaching writecache to an active LV.
When lvconvert is used to remove raid images, we can
skip calling lv_add_integrity_to_raid(), which finds
nothing to do, but the the blocksize validation would
be called unnecessarily and trigger spurious errors.
pvck --dump headers reads the metadata text area
to compute the text metadata checksum to compare
with the mda_header checksum.
The new header_only will skip reading the metadata
text and not validate the mda_header checksum.
dm-integrity stores checksums of the data written to an
LV, and returns an error if data read from the LV does
not match the previously saved checksum. When used on
raid images, dm-raid will correct the error by reading
the block from another image, and the device user sees
no error. The integrity metadata (checksums) are stored
on an internal LV allocated by lvm for each linear image.
The internal LV is allocated on the same PV as the image.
Create a raid LV with an integrity layer over each
raid image (for raid levels 1,4,5,6,10):
lvcreate --type raidN --raidintegrity y [options]
Add an integrity layer to images of an existing raid LV:
lvconvert --raidintegrity y LV
Remove the integrity layer from images of a raid LV:
lvconvert --raidintegrity n LV
Settings
Use --raidintegritymode journal|bitmap (journal is default)
to configure the method used by dm-integrity to ensure
crash consistency.
Initialization
When integrity is added to an LV, the kernel needs to
initialize the integrity metadata/checksums for all blocks
in the LV. The data corruption checking performed by
dm-integrity will only operate on areas of the LV that
are already initialized. The progress of integrity
initialization is reported by the "syncpercent" LV
reporting field (and under the Cpy%Sync lvs column.)
Example: create a raid1 LV with integrity:
$ lvcreate --type raid1 -m1 --raidintegrity y -n rr -L1G foo
Creating integrity metadata LV rr_rimage_0_imeta with size 12.00 MiB.
Logical volume "rr_rimage_0_imeta" created.
Creating integrity metadata LV rr_rimage_1_imeta with size 12.00 MiB.
Logical volume "rr_rimage_1_imeta" created.
Logical volume "rr" created.
$ lvs -a foo
LV VG Attr LSize Origin Cpy%Sync
rr foo rwi-a-r--- 1.00g 4.93
[rr_rimage_0] foo gwi-aor--- 1.00g [rr_rimage_0_iorig] 41.02
[rr_rimage_0_imeta] foo ewi-ao---- 12.00m
[rr_rimage_0_iorig] foo -wi-ao---- 1.00g
[rr_rimage_1] foo gwi-aor--- 1.00g [rr_rimage_1_iorig] 39.45
[rr_rimage_1_imeta] foo ewi-ao---- 12.00m
[rr_rimage_1_iorig] foo -wi-ao---- 1.00g
[rr_rmeta_0] foo ewi-aor--- 4.00m
[rr_rmeta_1] foo ewi-aor--- 4.00m
lvm2 supports thin-pool to be later used by other tools doing
virtual volumes themself (i.e. docker) - in this case we
shall not validate transaction Id - is this is used by
other tools and lvm2 keeps value 0 - so the transationId
validation need to be skipped in this case.
Prevent attaching writecache to an active LV until
we can determine the block size of the fs on the LV,
and use that to enforce an appropriate writecache
block size. Changing the block size under a mounted
fs can cause panic/corruption.
Currently the error messages are not clear. This very easy to
guide user to execute "--removemissing --force", it is dangerous
and will make the LVs to be destroied.
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Since VDO is also pool, the old if() case missed to know about this,
and executed unnecesserily initialization of cache pool variables.
This was usually harmless when using 'smaller' sizes of VDO pools,
but for big VDO pool size, we were reporting senseless messages
about big cache chunk sizes.
To write a new/repaired pv_header and label_header:
pvck --repairtype pv_header --file <file> <device>
This uses the metadata input file to find the PV UUID,
device size, and data offset.
To write new/repaired metadata text and mda_header:
pvck --repairtype metadata --file <file> <device>
This requires a good pv_header which points to one or two
metadata areas. Any metadata areas referenced by the
pv_header are updated with the specified metadata and
a new mda_header. "--settings mda_num=1|2" can be used
to select one mda to repair.
To combine all header and metadata repairs:
pvck --repair --file <file> <device>
It's best to use a raw metadata file as input, that was
extracted from another PV in the same VG (or from another
metadata area on the same PV.) pvck will also accept a
metadata backup file, but that will produce metadata that
is not identical to other metadata copies on other PVs
and other areas. So, when using a backup file, consider
using it to update metadata on all PVs/areas.
To get a raw metadata file to use for the repair, see
pvck --dump metadata|metadata_search.
List all instances of metadata from the metadata area:
pvck --dump metadata_search <device>
Save one instance of metadata at the given offset to
the specified file (this file can be used for repair):
pvck --dump metadata_search --file <file>
--settings "metadata_offset=<off>" <device>
using --settings:
mda_offset=<offset> mda_size=<size> can be used
in place of the offset/size that normally come
from headers.
metadata_offset=<offset> prints/saves one instance
of metadata text at the given offset, in
metadata_all or metadata_search.
This reverts commit 7474440d3b.
lvs can use the scanning optimization again since it has
been changed in:
"scanning: optimize by checking text offset and checksum"
The kernel MD runtime requires region size to be larger than stripe size
on striped raid layouts, thus the dm-raid target's constructor rejects
such request.
This causes e.g. an 'lvcreate --type raid10 -i3 -I4096 -R2048 -n lv vg' to fail.
Avoid failing late in the kernel by enforcing region size to be
larger or equal to stripe size.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1698225
When pvcreate/pvremove prompt the user, they first release
the global lock, then acquire it again after the prompt,
to avoid blocking other commands while waiting for a user
response. This release/reacquire changes the locking
order with respect to the hints flock (and potentially other
locks). So, to avoid deadlock, use a nonblocking request
when reacquiring the global lock.
The scanning optimization can produce warnings from
'lvs' when run concurrently with commands modifying LVs,
so disable the optimization until it can be improved.
Without the scanning optimization, lvs will always
read all PVs twice:
1. read metadata from all PVs, saving it in memory
2. for each VG
3. lock VG
4. reread metadata from all PVs in VG, replacing metadata
saved from step 1
5. run command on VG
6. unlock VG
The optimization would usually cause step 4 to be skipped,
and PVs would be read only once.
Running the command in step 5 using metadata that was not
read under the VG lock is usually fine, except for the
fact that lvs attempts to validate the metadata by comparing
it to current dm state. If other commands are modifying dm
state while lvs is running, lvs may see differences between
metadata from step 1 and dm state checked during step 5,
and print warnings.
(A better fix may be to detect the concurrent change and
fall back to rereading metadata in step 4 only when needed.)
Since we check for NULL pointers earlier we need
to be consistent across function - since the NULL
would applies across whole function.
When dropping 'mda' check - we are actually
already dereferencing it before - so it can't
be NULL at that places (and it's validated
before entering _read_mda_header_and_metadata).
When a cachevol LV is attached, have the LV keep it's lock
allocated. The lock on the cachevol won't be used while
it's attached. When the cachevol is split a new lock does
not need to be allocated. (Applies to cachevol usage by
both dm-cache and dm-writecache.)
When a user includes "--type foo" in a command, only
look at command definitions with matching type, as
opposed to using matching/mismatching --type as a
vote for/against a given command def. This means a
command with --type foo will prioritize a command def
with --type foo over other command defs that have
more matching options but an unmatching type. This
makes it more likely that a closely matching command
def will be recommended.
When LV gets cached and uses cache-pool - such cache-pool
will now get _cpool suffix automatically.
Thus 'Pool' column for cached LV will now show either _cvol
or _cpool LV.
Improve the implementation of extracting all text metadata
copies from the metadata area. Use this for the existing
metadata_all dump option.
Add a new metadata_search dump option which does not use
lvm headers to find metadata, but looks in standard
locations. This is useful if headers are damaged and
can't be used to locate metadata.
Adding '-v' to metadata_all or metadata_search will add
the description and creation_time to the printed list of
metadata instances that are found.
Before 'archive()' is called, lvm2 must not touch/modify metadata.
So move setting CACHE_VOL related flags past this point.
Also make sure reading of cache segtype always restores this
flag properly (even if compatible flag would be lost).
When an LV is used as a writecache cachevol, give
it the LV name a _cvol suffix. Remove the suffix
when the cachevol is detached, restoring the
original LV name.
A cachevol LV had the CACHE_VOL status flag in metadata,
and the cache LV using it had no new flag. This caused
problems if the new metadata was used by an old version
of lvm. An old version of lvm would have two problems
processing the new metadata:
. The old lvm would return an error when reading the VG
metadata when it saw the unknown CACHE_VOL status flag.
. The old lvm would return an error when reading the VG
metadata because it would not find an expected cache pool
attached to the cache LV (since the cache LV had a
cachevol attached instead.)
Change the use of flags:
. Change the CACHE_VOL flag to be a COMPATIBLE flag (instead
of a STATUS flag) so that old versions will not fail when
they see it.
. When a cache LV is using a cachevol, the cache LV gets
a new SEGTYPE flag CACHE_USES_CACHEVOL. This flag is
appended to the segtype name, so that old lvm versions
will fail to use the LV because of an unknown segtype,
as opposed to failing to read the VG.
Instead of using 'noflush' option, switch cache_mode into WRITETHROUGH
which does not require flushing, when user confirmed he does not
want flushing for WRITEBACK (because of (partially) missing caching PV)
For wiping we activate and clear 'regular' devices,
since in case of whole process interuption (i.e. kill -9)
we leave metadata & DM table and workable state all the time.
vgck --updatemetadata would write the same correct
metadata to good mdas, and then to bad mdas, but the
sequence of vg_write/vg_commit calls betwen good and
bad mdas could cause a different description field to
be generated for good/bad mdas. (The description field
describing the command was recently included in the
ondisk copy of the metadata text.)
When the PV device names in the VG metadata do not match the
current PV device names seen on the system, do not use the
optimized activation function (that avoids extra device scanning.)
When the device names do not match, it's a clue that there could
be duplicate PVs, in which case we want to scan all devicess to
find any duplicates and stop the activation if found.
This does not prevent autoactivating a VG from the incorrect
duplicate PV, because the incorrect duplicate may appear by itself
first. At that point its duplicate PV does not exist to be seen.
(A future enhancement could use the WWID to strengthen this
detection.)
- use internal CACHE_VOL flag on cachevol LV
- add suffixes to dm uuids for internal LVs
- display appropriate letters in the LV attr field
- display writecache's cachevol in lvs output
. For dm-cache in writethrough, always allow splitcache,
whether the cache is missing PVs or not.
. For dm-cache in writeback, if the cache is missing PVs,
allow splitcache with force and yes.
. For dm-writecache, if the cache is missing PVs,
allow splitcache with force and yes.
Enhance 'activation' experience for VDO pool to more closely match
what happens for thin-pools where we do use a 'fake' LV to keep pool
running even when no thinLVs are active. This gives user a choice
whether he want to keep thin-pool running (wihout possibly lenghty
activation/deactivation process)
As we do plan to support multple VDO LVs to be mapped into a single VDO,
we want to give user same experience and 'use-patter' as with thin-pools.
This patch gives option to activate VDO pool only without activating
VDO LV.
Also due to 'fake' layering LV we can protect usage of VDO pool from
command like 'mkfs' which do require exlusive access to the volume,
which is no longer possible.
Note: VDO pool contains 1024 initial sectors as 'empty' header - such
header is also exposed in layered LV (as read-only LV).
For blkid we are indentified as LV with UUID suffix - thus private DM
device of lvm2 - so we do not need to store any extra info in this
header space (aka zero is good enough).
When pvscan is used to activate a VG via an
asynchronous service (i.e. lvm2-pvscan), there
is no requirement that the command wait for
udev to create device nodes before returning.
It's possible that waiting for udev is slow
enough to cause the service running the command
to time out. So, allow the --noudevsync option
to be given to pvscan to skip waiting for udev.
(This commit is not changing the lvm2-pvscan
service itself to use --noudevsync.)
Still unknown is whether there are any complex
LV activation cases in which lvm itself requires
access to a device node, in which case the udev
wait could be needed by lvm itself.
(When running an activation command directly
from the command line, it's generally expected
that the activated LVs are ready to use when
the command is finished, so lvm waits for
udev to finish creating the dev nodes.)
When an online PV completed a VG, the standard
activation functions were used to activate the VG.
These functions use a full scan of all devs.
When many pvscans are run during startup and need
to activate many VGs, scanning all devs from all
the pvscans can take a long time.
Optimize VG activation in pvscan to scan only the
devs in the VG being activated. This makes use of
the online file info that was used to determine
the VG was complete.
The downside of this approach is that pvscan activation
will not detect duplicate PVs and block activation,
where a normal activation command (which scans all
devices) would.
New udev in rawhide seems to be 'dropping' udev rule operations for devices
that are no longer existing - while this is 'probably' a bug - it's
revealing moments in lvm2 that likely should not run in a single
transaction and we should wait for a cookie before submitting more work.
TODO: it seem more 'error' paths should always include synchronization
before starting deactivating 'just activated' devices.
We should probably figure out some 'automatic' solution for this instead
of placing sync_local_dev_name() all over the place...
Between 'resume' and 'remove' we need to wait for udev to synchronize,
otherwise udev may 'skip' resume event processing if the udev node
is already gone.
Usually md components are eliminated in label scan and/or
duplicate resolution, but they could sometimes get into
the vg_read stage, where set_pv_devices compares the
device to the PV.
If set_pv_devices runs an md component check and finds
one, vg_read should eliminate the components.
In set_pv_devices, run an md component check always
if the PV is smaller than the device (this is not
very common.) If the PV is larger than the device,
(more common), do the component check when the config
setting is "auto" (the default).
Avoid having PVs with different logical block sizes in the same VG.
This prevents LVs from having mixed block sizes, which can produce
file system errors.
The new config setting devices/allow_mixed_block_sizes (default 0)
can be changed to 1 to return to the unrestricted mode.
An active md device with an end superblock causes lvm to
enable full md component detection. This was being done
within the filter loop instead of before, so the full
filtering of some devs could be missed.
Also incorporate the recently added config setting that
controls the md component detection.
Fix commit 7836e7aa1c
"pvscan: ignore device with incorrect size"
which caused pvscan to not consider a PV online (for purposes
of event based activation) if the PV and device sizes differed.
This helped to avoid mistaking MD components for PVs, and is
replaced by triggering an md component check when PV and device
sizes differ (which happens in set_pv_device).
This allows the creation of a striped mirror leg(s) during upconvert
by adding lvconvert command line options --stripes/--stripesize
for 'mirror' to tools/command-lines.in.
In case multiple mirror legs are being added, all will have the
same requested striped layout.
Resolves: rhbz1720705
The cache repair utility does not yet work with a cachevol
(where metadata and data exist on the same LV.) So, warn
and prompt if writeback is specified with a cachevol.
The exported VG checking/enforcement was scattered and
inconsistent. This centralizes it and makes it consistent,
following the existing approach for foreign and shared
VGs/PVs, which are very similar to exported VGs/PVs.
The access policy that now applies to foreign/shared/exported
VGs/PVs, is that if a foreign/shared/exported VG/PV is named
on the command line (i.e. explicitly requested by the user),
and the command is not permitted to operate on it because it
is foreign/shared/exported, then an access error is reported
and the command exits with an error. But, if the command is
processing all VGs/PVs, and happens to come across a
foreign/shared/exported VG/PV (that is not explicitly named on
the command line), then the command silently skips it and does
not produce an error.
A command using tags or --select handles inaccessible VGs/PVs
the same way as a command processing all VGs/PVs, and will
not report/return errors if these inaccessible VGs/PVs exist.
The new policy fixes the exit codes on a somewhat random set of
commands that previously exited with an error if they were
looking at all VGs/PVs and an exported VG existed on the system.
There should be no change to which commands are allowed/disallowed
on exported VGs/PVs.
Certain LV commands (lvs/lvdisplay/lvscan) would previously not
display LVs from an exported VG (for unknown reasons). This has
not changed. The lvm fullreport command would previously report
info about an exported VG but not about the LVs in it. This
has changed to include all info from the exported VG.
When vg_read rescans devices with the intention of
writing the VG, the label rescan can open the devs
RW so they do not need to be closed and reopened
RW in dev_write_bytes.
When monitoring, skip exported VGs without causing a command
failure.
The lvm2-monitor service runs 'vgchange --monitor y', so
any exported VG on the system would cause the service to
fail.
The man page generation for pvchange/lvchange/vgchange was
incorrect (leaving out some option listings) as a result of
commit e225bf5 "fix command definition for pvchange -a"
The -a was being included in the set of "one or more"
options instead of an actual required option. Even
though the cmd def was not implementing the restrictions
correctly, the command internally was.
Adjust the cmd def code which did not support a command
with some real required options and a set of "one or more"
options.
The way that this command now uses the global lock
followed by a label scan, it can simply check if the
new VG name exists, and if not lock it and create it.