IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The pv resize code required that a lvm_vg_write be done
to commit the change. When the method to add the ability
to list all PVs, including ones that are not assocated with
a VG we had no way for the user to make the change persistent.
Thus additional resize code was move and now liblvm calls into
a resize function that does indeed write the changes out, thus
not requiring the user to explicitly write out he changes.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
As locks are held, you need to call the included function
to release the memory and locks when done transversing the
list of physical volumes.
V2: Rebase fix
V3: Prevent VGs from getting cached and then write protected.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Simplified version of lv resize.
v3: Rebase changes to make work. Needed to set sizeargs = 1
to indicate to resize that we are asking for a size based
resize.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
If "vgcreate/lvcreate --profile <profile_name>" is used, the profile
name is automatically stored in metadata for making it possible to
load it automatically next time the VG/LV is used.
Do not keep multiple archives for the executed command.
Reuse the ALLOCATABLE_PV from pv status for
ARCHIVED_VG vg status. Mark VG with the bit with the
first archivation.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs. The options are available for 'lvcreate' and
'lvchange' and are as follows:
--minrecoveryrate <Rate> [bBsSkKmMgG]
--maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device. If a suffix is not given,
kiB/sec/device is assumed. Setting the rate to 0 removes the preference.
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg Rwi---r-m 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-w 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-- 1 linear 4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg rwi---r-p 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-p 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-p 1 linear 4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r-- 512
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r--
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
The pv_by_path might be also dangerous to use as it does not
count with any other metadata areas but the ones found on the PV
itself. If metadata was not found on the PV referenced by the path,
it returned no PV though it might have been referenced by metadata
elsewhere (on other PVs...).
Before, the find_pv_by_name call always failed if the PV found was orphan.
However, we might use this function even for a PV that is not part of any VG.
This patch adds 'allow_orphan' arg to find_pv_by_name fn that allows that.
Keep the flag whether given thin pool argument has been given on command
line or it's been 'estimated'
Call of update_pool_params() must not change cmdline given args and
needs to know this info.
Since there is a need to move this update function into /lib, we cannot
use arg_count().
FIXME: we need some generic mechanism here.
The PV header extension information (PV header extension version, flags
and list of Embedding Area locations) is stored just beyond the PV header base.
When calculating the Embedding Area start value (ea_start), the same logic is
used as when calculating the pe_start value for Data Area - the value must
follow exactly the same alignment restrictions for its start value
(the alignment detected automatically or provided via command line using
the --dataalignment and --dataalignmentoffset arguments).
The Embedding Area is placed at the very start of the PV, starting at
ea_start. The Data Area starting at pe_start is placed next. The pe_start is
still properly aligned. Due to the pe_start alignment, it's possible that the
resulting Embedding Area size (ea_size) ends up bigger in size than requested
(but never less than requested).
PV header extension comes just beyond the existing PV header base:
PV header base (existing):
- uuid
- device size
- null-terminated list of Data Areas
- null-terminater list of MetaData Areas
PV header extension:
- extension version
- flags
- null-terminated list of Embedding Areas
This patch also adds "eas" (Embedding Areas) list to lvmcache (lvmcache_info)
and it also adds support for common operations on the list (just like for
already existing "das" - Data Areas list):
- lvmcache_add_ea
- lvmcache_update_eas
- lvmcache_foreach_ea
- lvmcache_del_eas
Also, add ea_start and ea_size to struct physical_volume for processing
PV Embedding Area location throughout the code (currently only one
Embedding Area is supported, though the definition on disk allows for
more if needed in the future...).
Also, define FMT_EAS format flag to mark that the format actually
supports Embedding Areas (currently format-text only).
Extract restorable PV creation parameters from struct pvcreate_params into
a separate struct pvcreate_restorable_params for clarity and also for better
maintainability when adding any new items later.
Currently it is impossible to remove a failed PV which has a RAID LV
on it. This patch fixes the issue by replacing the failed PV with an
'error' segment within the affected sub-LVs. Once there is no longer
a RAID LV using the PV, it can be removed.
Most often, it is better to replace a failed RAID device with a spare.
(You can use 'lvconvert --repair <vg>/<LV>' to accomplish that.)
However, if there are no spares in the volume group and none will be
added, it is useful to be able to removed the failed device.
Following patches address the ability to perform 'lvconvert' operations
on RAID LVs that contain sub-LVs composed of 'error' segments.
We have been using 'mirror_region_size' in lvm.conf as the default region
size for RAID logical volumes as well as mirror logical volumes. Since,
"raid" is more inclusive and representative than "mirror", I have changed
the name of this setting. We must still check for the old setting and warn
the user if we are overriding it with the new setting if both happen to be
present.
Target tells us its version, and we may allow different set of options
to be supported with different version of driver.
Idea is to provide individual feature flags and later be
able to query for them.
The 'copy_percent' function takes the 'extents_copied' field from each
segment in an LV to create the numerator for the ratio that is to
become the copy_percent. (Otherwise known as the 'sync' percent for
non-pvmove uses, like mirror LVs and RAID LVs.) This function safely
works on RAID - not just mirrors - so it is better to have it in
lv_manip.c rather than mirror.c.
There's a lot of different functions that do a lot of different things
in lv_manip.c, so I placed the function near a function in lv_manip.c
that it was close to in metadata-exported.h. Different placement in the
file or a different name for the function may be useful.
Add arg support for discard.
Add discard ignore, nopassdown, passdown (=default) support.
Flags could be set per pool.
lvcreate [--discard {ignore|no_passdown|passdown}] vg/thinlv
Define an 'activation_handler' that gets called automatically on
PV appearance/disappearance while processing the lvmetad_pv_found
and lvmetad_pv_gone functions that are supposed to update the
lvmetad state based on PV availability state. For now, the actual
support is for PV appearance only, leaving room for PV disappearance
support as well (which is a more complex problem to solve as this
needs to count with possible device stack).
Add a new activation change mode - CHANGE_AAY exposed as
'--activate ay/-aay' argument ('activate automatically').
Factor out the vgchange activation functionality for use in other
tools (like pvscan...).
We're refererring to 'activation' all over the code and we're talking
about 'LVs being activated' all the time so let's use 'activation/activate'
everywhere for clarity and consistency (still providing the old
'available' keyword as a synonym for backward compatibility with
existing environments).
Read lvm.conf setting for monitoring for each command. So we should not
activate monitoring if the default compilation is set to monitor during
lvconvert commnads.
Patch also removes check for clustered VG and allows to disable monitoring
for clustered VG with the assumption, the problem with monitoring and dmeventd
flag passing for INGNORE is already fixed.
RAID is not like traditional LVM mirroring. LVM mirroring required failed
devices to be removed or the logical volume would simply hang. RAID arrays can
keep on running with failed devices. In fact, for RAID types other than RAID1,
removing a device would mean substituting an error target or converting to a
lower level RAID (e.g. RAID6 -> RAID5, or RAID4/5 to RAID0). Therefore, rather
than removing a failed device unconditionally and potentially allocating a
replacement, RAID allows the user to "replace" a device with a new one. This
approach is a 1-step solution vs the current 2-step solution.
example> lvconvert --replace <dev_to_remove> vg/lv [possible_replacement_PVs]
'--replace' can be specified more than once.
example> lvconvert --replace /dev/sdb1 --replace /dev/sdc1 vg/lv
To ensure we properly handle LV cluster locking - explicitely do
not allow to change the availability of the thin pool that is in use
for some thin LV.
As soon as the thin volume is created the only way to activate pool
is via implicit dependency.
Ignore thinpool open count for lv/vgchange operations.
lvm part of messaging.
Each message is now stored it's own thin pool section:
message1 {
create = lv
}
Messages are queued to thin pool dm target when this target
is going to be resumed or used through some dependency.
Currently 'delete' message are purely queued and processed
with next thin pool resume operation (i.e. create_thin).
WARNING - thin provisioning support is developmental code.
Example:
~> lvconvert --type raid1 vg/mirror_lv
Steps to convert "mirror" to "raid1"
1) Allocate a RAID metadata LV for each mirror image from the same PVs
on which they are located.
2) Clear the metadata LVs. This involves writing LVM metadata, so we don't
change any aspects of the mirror LV before this so that the user can easily
remove LVs from the failed convert attempt while retaining the original
mirror.
3) Remove the mirror log, if it exists.
4) Add metadata LVs to mirror LV
5) Rename mirror sub-lvs (s/mimage/rimage/)
6) Change flags and segtype from mirror to raid1
Revert John patch, which fixed only 1 place where ~LVM_WRITE was in use and
convert ommited LVM_READ/WRITE flags to 64bit constants as well.
(Since both 'status' flags for LV and VG are 64bit.)
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option allows a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access. The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed. This
operation can be thought of as a partial split. The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap. The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed. The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
~> lvconvert --merge vg/lv_rimage_<n>
The internal mechanics of this are relatively simple. The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'. This is what will be used if a partial activation of an array
is ever required. (It would also be possible to use 'error' targets in
place of the '- -'.) If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold it's position in the array. So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only. To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
Users already have the ability to split an image from an LV of "mirror"
segtype. This patch extends that ability to LVs of "raid1" segtype.
This patch only allows a single image to be split off, however. (The
"mirror" segtype allows an arbitrary number of images to be split off.
e.g. 4-way => 3-way/linear, 2-way/2-way, linear,3-way)
Move the free_vg() to vg.c and replace free_vg with release_vg
and make the _free_vg internal.
Patch is needed for sharing VG in vginfo cache so the release_vg function name
is a better fit here.
Implementation described in doc/lvm2-raid.txt.
Basic support includes:
- ability to create RAID 1/4/5/6 arrays
- ability to delete RAID arrays
- ability to display RAID arrays
Notable missing features (not included in this patch):
- ability to clean-up/repair failures
- ability to convert RAID segment types
- ability to monitor RAID segment types
are affected by the move. (Currently it's possible for I/O to become
trapped between suspended devices amongst other problems.
The current fix was selected so as to minimise the testing surface. I
hope eventually to replace it with a cleaner one that extends the
deptree code.
Some lvconvert scenarios still suffer from related problems.
allows us to allocate all images of a mirror (or RAID array) at one
time during create.
The current mirror implementation still requires a separate allocation
for the log, however.
Since format instances will use own memory pool, it's necessary to properly
deallocate it. For now, only fid is deallocated. The PV structure itself
still uses cmd mempool mostly, but anytime we'd like to add a mempool
in the struct physical_volume, we can just rename this fn to free_pv and
add the code (like we have free_vg fn for VGs).
Format instances can be created anytime on demand and it contains
metadata area information mostly (at least for now, but in the future,
we may store more things here to update/edit in a PV/VG). In case we
have lots of metadata areas, memory consumption will rise. Using cmd
context mempool is not quite optimal here because it is destroyed too
late. So let's use a separate mempool for format instances.
Reference counting is used because fids could be shared, e.g. each PV
has either a PV-based fid or VG-based fid. If it's VG-based, each PV has
a shared fid with the VG - a reference to VG's fid.
We allow writing non-orphan PVs only for resize now. The "orphan PV" assert
in pv_write fn uses the "allow_non_orphan" parameter to control this assert.
However, we should find a more elaborate solution so we can remove this
restriction altogether (pv_write together with vg_write is not atomic, we
need to find a safe mechanism so there's an easy revert possible in case of
an error).
Fixing some const warnings - with API change in:
int vg_extend(struct volume_group *vg, int pv_count, const char *const *pv_names,
Change is needed - as lvm2api expects const behaviour here.
So vg_extend() is doing local strdup for unescaping.
skip_dev_dir return const char* from const char* vg_name.
Rest of the patch is cleanup of related warnings.
Also using dm_report_filed_string() API change to simplify
casting in _string_disp and _lvname_disp.
Add configurable option to define minimal size of
of block device usable as a PV.
pv_min_size() is added to lvm-globals and it's being
initialized through _process_config.
Macro PV_MIN_SIZE is unused and removed.
New define DEFAULT_PV_MIN_SIZE_KB is added to lvm-global
and unlike PV_MIN_SIZE it uses KB units.
Should help users with various slow devices attached to the system,
which cannot be easily filtered out (like FDD on /dev/sdX):
https://bugzilla.redhat.com/show_bug.cgi?id=644578