Normally, the lvm dumpconfig processes only the configuration tree
that is at the top of the cascade. Considering the cascade is:
CONFIG_STRING -> CONFIG_PROFILE -> CONFIG_MERGED_FILES/CONFIG_FILE
...then:
(dumpconfig of lvm.conf only)
raw/~ $ lvm dumpconfig allocation
allocation {
maximise_cling=1
mirror_logs_require_separate_pvs=0
thin_pool_metadata_require_separate_pvs=0
thin_pool_chunk_size=64
}
(dumpconfig of selected profile configuration only)
raw/~ $ lvm dumpconfig --profile test allocation
allocation {
thin_pool_chunk_size=8
thin_pool_discards="passdown"
thin_pool_zero=1
}
(dumpconfig of given --config configuration only)
raw/~ $ lvm dumpconfig --config 'allocation{thin_pool_chunk_size=16}' allocation
allocation {
thin_pool_chunk_size=16
}
The --mergedconfig option causes the configuration cascade to be
merged before processing it with dumpconfig:
(dumpconfig of merged selected profile and lvm.conf)
raw/~ $ lvm dumpconfig --profile test allocation --mergedconfig
allocation {
maximise_cling=1
thin_pool_zero=1
thin_pool_discards="passdown"
mirror_logs_require_separate_pvs=0
thin_pool_metadata_require_separate_pvs=0
thin_pool_chunk_size=8
}
(dumpconfig merged given --config and selected profile and lvm.conf)
raw/~ $ lvm dumpconfig --profile test --config 'allocation{thin_pool_chunk_size=16}' allocation --mergedconfig
allocation {
maximise_cling=1
thin_pool_zero=1
thin_pool_discards="passdown"
mirror_logs_require_separate_pvs=0
thin_pool_metadata_require_separate_pvs=0
thin_pool_chunk_size=16
}
Hence with the --mergedconfig, we are able to see the
configuration that is actually used when processing any
LVM command while using any combination of --config/--profile
options together with lvm.conf file.
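The lookup order described above can be sketched in C. This is only an illustration of the "first tree in the cascade wins" rule, not the actual lvm2 code: struct setting, struct tree, tree_find and cascade_find are hypothetical names, and values are reduced to plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real config tree types. */
struct setting {
	const char *path;	/* e.g. "allocation/thin_pool_chunk_size" */
	int value;
};

struct tree {
	const struct setting *items;
	size_t count;
	const struct tree *next;	/* next (lower-priority) tree in the cascade */
};

/* Look a setting up in one tree only; returns 1 and stores the value if found. */
static int tree_find(const struct tree *t, const char *path, int *value)
{
	for (size_t i = 0; i < t->count; i++) {
		if (strcmp(t->items[i].path, path) == 0) {
			*value = t->items[i].value;
			return 1;
		}
	}
	return 0;
}

/* Walk the cascade from the top: the first tree that defines the setting wins. */
static int cascade_find(const struct tree *top, const char *path, int *value)
{
	assert(path != NULL && value != NULL);
	for (const struct tree *t = top; t != NULL; t = t->next)
		if (tree_find(t, path, value))
			return 1;
	return 0;
}
```

With a cascade of CONFIG_STRING -> CONFIG_PROFILE -> CONFIG_FILE built from the values shown above, looking up allocation/thin_pool_chunk_size from the top yields 16, while starting the walk at the profile tree yields 8.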
When CFG_DEF_TREE_MISSING is created, it needs to know the status
of the check done on the tree used (the CFG_USED flag).
This bug was introduced with f1c292cc38
"make it possible to run several instances of configuration check at
once". This patch separated the CFG_USED and CFG_VALID flags in
a separate 'status' field in struct cft_check_handle.
However, when creating some trees, like CFG_DEF_TREE_MISSING,
we need this status to do a comparison with full config definition
to determine which items are missing and for which default values
were used. Otherwise, all items would be considered missing.
So, pass this status in a new field called 'check_status' in
struct config_def_tree_spec that defines how the (dumpconfig) tree
should be constructed (and this struct is passed to
config_def_create_tree fn then).
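A rough C sketch of why the status is needed when building the "missing" tree. The flag values and the collect_missing helper are assumptions made for illustration; only the CFG_USED and CFG_VALID names come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values are illustrative; only the names come from the text. */
#define CFG_USED  0x01
#define CFG_VALID 0x02

/*
 * Collect the IDs of settings that an earlier check did not find in the
 * tree (CFG_USED not set), i.e. the ones for which defaults are in effect.
 * 'status' holds one flag byte per definition.
 * Returns the number of IDs written to 'missing'.
 */
static size_t collect_missing(const uint8_t *status, size_t count, size_t *missing)
{
	size_t n = 0;

	assert(status != NULL && missing != NULL);
	for (size_t id = 0; id < count; id++)
		if (!(status[id] & CFG_USED))
			missing[n++] = id;
	return n;
}
```

If the status is not passed along (all flags zero), every ID ends up in the result, which matches the "all items would be considered missing" symptom.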
Start separating the validation from the action in the basic lvresize
code moved to the library.
Remove incorrect use of command line error codes from lvresize library
functions. Move errors.h to tools directory to reinforce this,
exporting public versions of the error codes in lvm2cmd.h for dmeventd
plugins to use.
The condition needs to check for a passed-in pool_metadata_lv_name,
which needs to be renamed to _tmeta; for !pool_metadata_lv_name
it's already created with the correct _tmeta name.
Fix and improve handling of SIGINT.
Always check for signal presence *before* calling the command,
so the command is not called when break was hit.
If the command has finished successfully, there is
no problem marking the command ok and not reporting the interrupt at all.
Fix a couple of related 'stack;' reports and assignments.
The pv resize code required that lvm_vg_write be called
to commit the change. When the method to list all PVs,
including ones that are not associated with a VG, was added,
the user had no way to make the change persistent.
Thus the additional resize code was moved, and liblvm now calls into
a resize function that does write the changes out, so the user
is not required to write out the changes explicitly.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Code move and changes to support calling code from
command line and from library interface.
V2 Change lock_vol call
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Extend the lv resize parameter structure to contain everything
the re-size functions need so that the command line does not
need to be present for lower level calls when we call from
library functions.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Add thin and thin pool lv creation support to lvm library
This is Mohan's thinp patch, re-worked to include suggestions
from Zdenek and Mohan.
V2: Remove const lvm_lv_params_create_thin
Add const lvm_lv_params_skip_zero_get
V3: Changed get/set to use generic functions like current
property
V4: Corrected macro in properties.c
V5: Fixed a bug in liblvm/lvm_lv.c function lvm_lv_create.
incorrectly used pool instead of lv_name when doing the
find_lv_in_vg call.
Based on work done by M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Tony Asleson <tasleson@redhat.com>
These settings are customizable by profiles:
allocation/thin_pool_zero
allocation/thin_pool_discards
allocation/thin_pool_chunk_size
activation/thin_pool_autoextend_threshold
activation/thin_pool_autoextend_percent
Before, the status of the configuration check (config_def_check fn call)
was saved directly in the global configuration definition array (as part
of the cfg_def_item_t/flags).
This patch introduces the "struct cft_check_handle" that defines
configuration check parameters as well as separate place to store
the status (status here means CFG_USED and CFG_VALID flags, formerly
saved in cfg_def_item_t/flags). This struct can hold config check
parameters as well as the status for each config tree separately,
thus making it possible to run several instances of config_def_check
without interference.
Just to make it more clear and also not to confuse
config_valid with the check against the config definition
(and its 'valid' flag within the config definition tree).
The command to change the profile for existing VG/LV:
"vgchange/lvchange --profile <profile_name>"
The command to detach any existing profile from VG/LV:
"vgchange/lvchange --detachprofile"
If "vgcreate/lvcreate --profile <profile_name>" is used, the profile
name is automatically stored in metadata for making it possible to
load it automatically next time the VG/LV is used.
When placing the profile in a configuration cascade, this sequence is
used exactly:
CONFIG_STRING -> CONFIG_PROFILE -> CONFIG_FILE/MERGED_FILES
So if the profile is used, it overloads the lvm.conf (and any
existing tag configs). However, if "--config" is used to define
a custom configuration on command line, this overloads even the
profile config!
This patch adds --profile arg to lvm cmds and adds config/profile_dir
configuration setting to select the directory where profiles are stored.
By default it's /etc/lvm/profile.
The profiles are added by using new "add_profile" fn and then loaded
using the "load_profile" fn. All profiles are stored in a cmd context
within the new "struct profile_params":
struct profile_params {
	const char *dir;
	struct profile *global_profile;
	struct dm_list profiles_to_load;
	struct dm_list profiles;
};
...where "dir" is the directory with profiles, "global_profile" is
the profile that is set globally via the --profile arg (IOW, not
set per VG/LV basis based on metadata record) and the "profiles"
is the list with loaded profiles.
A helper type that helps with identification of the configuration source
which makes handling the configuration cascade a bit easier, mainly
removing and adding configuration trees to cascade dynamically.
Currently, the possible types are:
CONFIG_UNDEFINED - configuration is not defined yet (not initialized)
CONFIG_FILE - one file configuration
CONFIG_MERGED_FILES - configuration that is a result of merging more files into one
CONFIG_STRING - configuration string typed on cmd line directly
CONFIG_PROFILE - profile configuration (the new type of configuration, patches will follow...)
Also, generalize existing "remove_overridden_config_tree" to work with
configuration type identification in a cascade. Before, it was just
the CONFIG_STRING we used. Now, we need some more to add in a
cascade (like the CONFIG_PROFILE). So, we have:
struct dm_config_tree *remove_config_tree_by_source(struct cmd_context *cmd, config_source_t source);
config_source_t config_get_source_type(struct dm_config_tree *cft);
... for removing the tree by its source type from the cascade and
simply getting the source type.
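Removing a tree by its source type could look roughly like the following C sketch. It uses a hypothetical struct cascade_node instead of struct dm_config_tree and does not take a cmd_context, so it only shows the unlinking step and not the real lvm2 implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
	CONFIG_UNDEFINED,
	CONFIG_FILE,
	CONFIG_MERGED_FILES,
	CONFIG_STRING,
	CONFIG_PROFILE
} config_source_t;

/* Hypothetical stand-in for struct dm_config_tree: only what the sketch needs. */
struct cascade_node {
	config_source_t source;
	struct cascade_node *next;	/* lower-priority tree */
};

/*
 * Unlink the first tree of the given source type from the cascade whose
 * top is *top. Returns the removed node (now owned by the caller) or NULL
 * if no tree of that type is present.
 */
static struct cascade_node *remove_by_source(struct cascade_node **top,
					     config_source_t source)
{
	assert(top != NULL);
	for (struct cascade_node **link = top; *link != NULL; link = &(*link)->next) {
		if ((*link)->source == source) {
			struct cascade_node *removed = *link;

			*link = removed->next;
			removed->next = NULL;
			return removed;
		}
	}
	return NULL;
}
```

Working on a pointer to the link rather than on the node itself means that removing the top of the cascade (for example the CONFIG_STRING tree) needs no special case.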
If the user would upconvert a linear LV to a mirror without specifying
the segment type ("--type mirror" vs "--type raid1"), the "mirror"
segment type would be chosen without consulting the 'default_mirror_segtype'
setting in lvm.conf. This is now used as the basis for determining
which should be used if left unspecified.
Support vgsplit for VGs with thin pools and thin volumes.
In case the thin data and thin metadata volumes are moved to a new VG,
move there also all related thin volumes and check that external origins
are also present in this new VG.
Fix the use case when only a PV list is specified.
With --poolmetadatasize the PV list is used for metadata extents.
Without --poolmetadatasize the PV list is used for 100% extension of the LV.
Handle the case when nothing could be resized (e.g. in dmeventd).
Support the 'classic' way of resizing the metadata LV.
Normally we do not allow working with internal 'invisible' devices.
But in this case we can make an exception: if the user has some
special needs for how to extend the thin pool metadata LV, support it.
After resize of metadata LV, the pool will be suspended and resumed,
to be notified of this change.
Add support for lvresize of thin pool metadata device.
lvresize --poolmetadatasize +20 vgname/thinpool_lv
or
lvresize -L +20 vgname/thinpool_lv_tmeta
Where the second one allows all the args for resize (striping...)
and the first option resizes according to the last metadata lv segment.
Previously, we have relied on UUIDs alone, and on lvmcache to make getting a
"new copy" of VG metadata fast. If the code which triggers the activation has
the correct VG metadata at hand (the version which is currently on disk), it can
now hand it to the activation code directly.
Merge duplicate code that was validating lvcreate args
for creation of thin and snapshot.
Keep most of thin checks in _check_thin_parameters().
Update a couple of error messages.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs. The options are available for 'lvcreate' and
'lvchange' and are as follows:
--minrecoveryrate <Rate> [bBsSkKmMgG]
--maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device. If a suffix is not given,
kiB/sec/device is assumed. Setting the rate to 0 removes the preference.
Instead of seeing weird overflows inside the lvm code
that give false error messages, stop the user's experiment at the beginning.
Who needs to use more than 16EiB with lvm2 and 64bit anyway...
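As a sketch of the kind of early check meant here (not the actual lvm2 argument parser; scale_size is a hypothetical helper), a size can be rejected before the multiplication by its unit factor wraps around 64 bits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply a user-supplied number by a unit factor, refusing results
 * that do not fit in 64 bits. Returns 1 on success, 0 on overflow.
 */
static int scale_size(uint64_t number, uint64_t factor, uint64_t *result)
{
	assert(result != NULL);
	if (factor != 0 && number > UINT64_MAX / factor)
		return 0;
	*result = number * factor;
	return 1;
}
```

Doing the division first avoids ever computing a wrapped product, so the caller can print one clear error instead of a misleading message later on.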
Check for mounted fs also for vgchange command, not just lvchange.
NOTE: Code is using lv_info() just like lvs_in_vg_opened().
It should be probably converted into lv_is_active_locally().
There are places where 'lv_is_active' was being used where it was
more correct to use 'lv_is_active_locally'. For example, when checking
for the existence of a kernel instance before asking for its status.
Most of the time these would work correctly. (RAID is only allowed on
non-clustered VGs at the moment, which means that 'lv_is_active' and
'lv_is_active_locally' would give the same result.) However, it is
more correct to use the proper variant and it helps with future
scenarios where targets might be allowed exclusively (or clustered) in
a cluster VG.
Accept --yes on all commands, even ones that don't today have prompts,
so that test scripts that don't care about interactive prompts no
longer need to deal with them.
But continue to mention --yes only in the command prototypes that
actually use it.
This is just a temporary fix to support allocation of -l%FREE.
The number of free extents serves to calculate the estimated metadata
size. This value is then subtracted twice to keep some
free space for recovery.
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
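The argument format can be illustrated with a small C parser. This is a sketch under the assumption that the action is exactly one character after a final ':'; the wm_action_t type and parse_writemostly function are hypothetical and not taken from the lvm2 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { WM_SET, WM_UNSET, WM_TOGGLE, WM_INVALID } wm_action_t;

/*
 * Split "<PV>[:{t|y|n}]" into the PV name and the requested action.
 * Without a trailing character the flag is set. 'pv' receives the
 * NUL-terminated PV name.
 */
static wm_action_t parse_writemostly(const char *arg, char *pv, size_t pv_size)
{
	size_t len;
	wm_action_t action = WM_SET;

	assert(arg != NULL && pv != NULL);
	len = strlen(arg);

	if (len >= 2 && arg[len - 2] == ':') {
		switch (arg[len - 1]) {
		case 'y': action = WM_SET; break;
		case 'n': action = WM_UNSET; break;
		case 't': action = WM_TOGGLE; break;
		default: return WM_INVALID;
		}
		len -= 2;	/* drop the ":x" suffix from the PV name */
	}

	if (len == 0 || len >= pv_size)
		return WM_INVALID;

	memcpy(pv, arg, len);
	pv[len] = '\0';
	return action;
}
```

For the example above, "/dev/sdb1:y" and plain "/dev/sdb1" both give WM_SET with the PV name "/dev/sdb1".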
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
  LV               VG Attr      #Str Type   SSize
  raid1            vg Rwi---r-m    2 raid1  500.00m
  [raid1_rimage_0] vg Iwi---r--    1 linear 500.00m
  [raid1_rimage_1] vg Iwi---r-w    1 linear 500.00m
  [raid1_rmeta_0]  vg ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]  vg ewi---r--    1 linear   4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
  LV               VG Attr      #Str Type   SSize
  raid1            vg rwi---r-p    2 raid1  500.00m
  [raid1_rimage_0] vg Iwi---r--    1 linear 500.00m
  [raid1_rimage_1] vg Iwi---r-p    1 linear 500.00m
  [raid1_rmeta_0]  vg ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]  vg ewi---r-p    1 linear   4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--     512
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--
This reverts commit 0396ade38b.
The original code also handled len==1, which the new code doesn't.
Press <TAB> in the lvm shell to get a list of the possible
flag completions for a single hyphen.
Move common code for changing activation state from
vgchange and lvchange to one function.
Fix the order of checks - so we always implicitly
activate snapshots and thin volumes in exclusive mode,
and we do not allow local deactivation for them.
New options to 'lvchange' allow users to scrub their RAID LVs.
Synopsis:
lvchange --syncaction {check|repair} vg/raid_lv
RAID scrubbing is the process of reading all the data and parity blocks in
an array and checking to see whether they are coherent. 'lvchange' can
now initiate the two scrubbing operations: "check" and "repair". "check"
will go over the array and record the number of discrepancies but not
repair them. "repair" will correct the discrepancies as it finds them.
'lvchange --syncaction repair vg/raid_lv' is not to be confused with
'lvconvert --repair vg/raid_lv'. The former initiates a background
synchronization operation on the array, while the latter is designed to
repair/replace failed devices in a mirror or RAID logical volume.
Additional reporting has been added for 'lvs' to support the new
operations. Two new printable fields (which are not printed by
default) have been added: "syncaction" and "mismatches". These
can be accessed using the '-o' option to 'lvs', like:
lvs -o +syncaction,mismatches vg/lv
"syncaction" will print the current synchronization operation that the
RAID volume is performing. It can be one of the following:
- idle: All sync operations complete (doing nothing)
- resync: Initializing an array or recovering after a machine failure
- recover: Replacing a device in the array
- check: Looking for array inconsistencies
- repair: Looking for and repairing inconsistencies
The "mismatches" field will print the number of discrepancies found during
a check or repair operation.
The 'Cpy%Sync' field already available to 'lvs' will print the progress
of any of the above syncactions, including check and repair.
Finally, the lv_attr field has changed to accommodate the scrubbing operations
as well. The role of the 'p'artial character in the lv_attr report field
has expanded. "Partial" is really an indicator for the health of a
logical volume and it makes sense to extend this to include other health
indicators as well, specifically:
'm'ismatches: Indicates that there are discrepancies in a RAID
LV. This character is shown after a scrubbing
operation has detected that portions of the RAID
are not coherent.
'r'efresh : Indicates that a device in a RAID array has suffered
a failure and the kernel regards it as failed -
even though LVM can read the device label and
considers the device to be ok. The LV should be
'r'efreshed to notify the kernel that the device is
now available, or the device should be 'r'eplaced
if it is suspected of failing.
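Choosing the single health character could be sketched in C as below. The struct lv_health type and health_char function are hypothetical. That 'p' takes priority over 'w' is stated in the write-mostly description; the relative order of 'r' and 'm' is an assumption made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of an LV's health state; field names are illustrative. */
struct lv_health {
	int partial;		/* a device backing the LV has failed or is missing */
	int refresh_needed;	/* kernel regards a device as failed, LVM can read it */
	int mismatches;		/* scrubbing found discrepancies */
	int writemostly;	/* WriteMostly flag is set on the device */
};

/* Return the last lv_attr character; the first matching condition wins. */
static char health_char(const struct lv_health *h)
{
	assert(h != NULL);
	if (h->partial)
		return 'p';
	if (h->refresh_needed)	/* assumed to rank above 'm' */
		return 'r';
	if (h->mismatches)
		return 'm';
	if (h->writemostly)
		return 'w';
	return '-';
}
```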
Attempting to up-convert an inactive mirror when there is insufficient
space leads to the following message:
Unable to allocate extents for mirror(s).
ABORTING: Failed to remove temporary mirror layer inactive_mimagetmp_3.
Manual cleanup with vgcfgrestore and dmsetup may be required.
This is caused by a failure to execute the 'deactivate_lv' function in
the error condition. The deactivate returns an error because the LV is
already inactive. This patch checks if the LV is active and calls
deactivate_lv only if it is. This allows the error cleanup code to work
properly in this condition.
It wasn't that big of a deal anyway, since there was no previous vg_commit
that needed to be reverted. IOW, no harm was done if the allocation failed.
The message was scary and useless.
...to not pollute the common and format-independent code in the
abstraction layer above.
The format1 pv_write has common code for writing metadata and
PV header by calling the "write_disks" fn and when rewriting
the header itself only (e.g. just for the purpose of changing
the PV UUID) during the pvchange operation, we had to tweak
this functionality for the format1 case and we had to assign
the PV the orphan state temporarily.
This patch removes the need for this format1 tweak and it calls
the write_disks with appropriate flag indicating whether this is
a PV write call or a VG write call, allowing for metadata update
for the latter one.
Also, a side effect of the former tweak was that it effectively
invalidated the cache (even for the non-format1 PVs) as we
assigned it the orphan state temporarily just for the format1
PV write to pass.
Also, that tweak made it difficult to directly detect whether
a PV was part of a VG or not because the state was incorrect.
Also, it's not necessary to backup and restore some PV fields
when doing a PV write:
orig_pe_size = pv_pe_size(pv);
orig_pe_start = pv_pe_start(pv);
orig_pe_count = pv_pe_count(pv);
...
pv_write(pv)
...
pv->pe_size = orig_pe_size;
pv->pe_start = orig_pe_start;
pv->pe_count = orig_pe_count;
...this is already done by the layer below itself (the _format1_pv_write fn).
So let's have this cleaned up so we don't need to be bothered
about any 'format1 special case for pv_write' anymore.
Before, the find_pv_by_name call always failed if the PV found was orphan.
However, we might use this function even for a PV that is not part of any VG.
This patch adds 'allow_orphan' arg to find_pv_by_name fn that allows that.
Usage of layer was not the best plan here - for proper devices stack
we have to keep correct reference in volume_group structure and
make the new thin pool LV appear as a new volume.
Keep a flag recording whether a given thin pool argument has been given on
the command line or has been 'estimated'.
Call of update_pool_params() must not change cmdline given args and
needs to know this info.
Since there is a need to move this update function into /lib, we cannot
use arg_count().
FIXME: we need some generic mechanism here.
This was a regression introduced with e33fd978a8
(libdm v1.02.68/lvm2 v2.02.89) with the introduction of new output
fields blkdevname and blkdevs_used for ls and deps dmsetup commands.
A new common '_process_options' fn was added with that commit, but the
fn was called prematurely which then broke processing of
'dmsetup splitname -o' which should implicitly use '-c' option
and this was failing after the commit:
alatyr/~ $ dmsetup splitname -o lv_name /dev/mapper/vg_data-test
Option not recognised: lv_name
Couldn't process command line.
The '-c' had to be used for correct operation:
alatyr/~ $ dmsetup splitname -c -o lv_name /dev/mapper/vg_data-test
LV
test
Now fixed to work as it did before:
alatyr/~ $ dmsetup splitname -o lv_name /dev/mapper/vg_data-test
LV
test
lvm dumpconfig [--ignoreadvanced] [--ignoreunsupported]
--ignoreadvanced causes the advanced configuration options to be left
out of the dumpconfig output
--ignoreunsupported causes the options that are not officially supported
to be left out of the dumpconfig output
lvm dumpconfig [--withcomments] [--withversions]
The --withcomments causes the comments to appear on output before each
config node (if they were defined in config_settings.h).
The --withversions causes a one line extra comment to appear on output
before each config node with the version information in which the
configuration setting first appeared.
lvm dumpconfig [--type {current|default|missing|new}] [--atversion] [--validate]
This patch adds above-mentioned args to lvm dumpconfig and it maps them
to creation and writing out a configuration tree of a specific type
(see also previous commit):
- current maps to CFG_TYPE_CURRENT
- default maps to CFG_TYPE_DEFAULT
- missing maps to CFG_TYPE_MISSING
- new maps to CFG_TYPE_NEW
If --type is not defined, dumpconfig defaults to "--type current"
which is the original behaviour of dumpconfig before all these changes.
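The mapping from the --type argument to a tree type, including the default, might be sketched in C like this. The cfg_tree_type_t name and the parse_dump_type helper are hypothetical; only the CFG_TYPE_* names and the four accepted strings come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
	CFG_TYPE_CURRENT,
	CFG_TYPE_DEFAULT,
	CFG_TYPE_MISSING,
	CFG_TYPE_NEW
} cfg_tree_type_t;	/* type name is hypothetical */

/*
 * Map a --type argument to a tree type. A NULL argument (option not given)
 * selects "current". Returns 1 on success, 0 for an unknown name.
 */
static int parse_dump_type(const char *name, cfg_tree_type_t *type)
{
	static const struct {
		const char *name;
		cfg_tree_type_t type;
	} map[] = {
		{ "current", CFG_TYPE_CURRENT },
		{ "default", CFG_TYPE_DEFAULT },
		{ "missing", CFG_TYPE_MISSING },
		{ "new",     CFG_TYPE_NEW },
	};

	assert(type != NULL);
	if (name == NULL) {
		*type = CFG_TYPE_CURRENT;
		return 1;
	}
	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(name, map[i].name) == 0) {
			*type = map[i].type;
			return 1;
		}
	}
	return 0;
}
```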
The --validate option just validates current configuration tree
(lvm.conf/--config) and it writes a simple status message:
"LVM configuration valid" or "LVM configuration invalid"
For example, the old call and reference:
find_config_tree_str(cmd, "devices/dir", DEFAULT_DEV_DIR)
...now becomes:
find_config_tree_str(cmd, devices_dir_CFG)
So we're referring to the named configuration ID instead
of passing the configuration path and the default value
is taken from central config definition in config_settings.h
automatically.
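The idea of a central definition table indexed by configuration ID can be shown with a miniature C sketch. It is not the contents of config_settings.h: the table layout, the find_str helper and the second ID are assumptions for illustration, and the listed defaults are only example values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a central settings table. */
enum { devices_dir_CFG, config_profile_dir_CFG, CFG_COUNT };

struct cfg_def {
	const char *path;		/* configuration path */
	const char *default_value;	/* used when the setting is not configured */
};

/* Entries are listed in the same order as the IDs above. */
static const struct cfg_def cfg_defs[CFG_COUNT] = {
	{ "devices/dir", "/dev" },
	{ "config/profile_dir", "/etc/lvm/profile" },
};

/*
 * 'configured' holds one entry per ID; a NULL entry means the setting is
 * not present in the configuration, so the central default applies.
 */
static const char *find_str(const char *const *configured, int id)
{
	assert(configured != NULL && id >= 0 && id < CFG_COUNT);
	return configured[id] ? configured[id] : cfg_defs[id].default_value;
}
```

The caller names only the ID; both the path and the fallback value live in one place, so they cannot drift apart between call sites.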
To create an Embedding Area during PV creation (pvcreate or as part of
the vgconvert operation), we need to define the Embedding Area size.
The Embedding Area start will be calculated automatically by the tools.
This patch adds --embeddingareasize argument to pvcreate and vgconvert.
The PV header extension information (PV header extension version, flags
and list of Embedding Area locations) is stored just beyond the PV header base.
When calculating the Embedding Area start value (ea_start), the same logic is
used as when calculating the pe_start value for Data Area - the value must
follow exactly the same alignment restrictions for its start value
(the alignment detected automatically or provided via command line using
the --dataalignment and --dataalignmentoffset arguments).
The Embedding Area is placed at the very start of the PV, starting at
ea_start. The Data Area starting at pe_start is placed next. The pe_start is
still properly aligned. Due to the pe_start alignment, it's possible that the
resulting Embedding Area size (ea_size) ends up bigger in size than requested
(but never less than requested).
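The effect of the alignment on the resulting sizes can be worked through in C. This sketch assumes a single alignment value in sectors and ignores --dataalignmentoffset; align_up, struct pv_layout and layout_pv are hypothetical names, and 'first_free' (the first sector usable for the Embedding Area) is an assumed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round 'value' up to the next multiple of 'alignment' (alignment > 0). */
static uint64_t align_up(uint64_t value, uint64_t alignment)
{
	uint64_t rem;

	assert(alignment > 0);
	rem = value % alignment;
	return rem ? value + (alignment - rem) : value;
}

struct pv_layout {
	uint64_t ea_start;
	uint64_t ea_size;
	uint64_t pe_start;
};

/*
 * Place the Embedding Area first and the Data Area right after it.
 * Both starts are aligned, so ea_size may grow beyond the request.
 * All values are in sectors.
 */
static struct pv_layout layout_pv(uint64_t first_free, uint64_t ea_requested,
				  uint64_t alignment)
{
	struct pv_layout l;

	l.ea_start = align_up(first_free, alignment);
	l.pe_start = align_up(l.ea_start + ea_requested, alignment);
	l.ea_size = l.pe_start - l.ea_start;
	return l;
}
```

With an alignment of 2048 sectors, a first free sector of 2048 and a request of 65000 sectors, pe_start lands on 67584 and ea_size becomes 65536, i.e. more than requested but never less.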
New tools with PV header extension support will read the extension
if it exists and it's not an error if it does not exist (so old PVs
will still work seamlessly with new tools).
Old tools without PV header extension support will just ignore any
extension.
As for the Embedding Area location information (its start and size),
there are actually two places where this is stored:
- PV header extension
- VG metadata
The VG metadata contains a copy of what's written in the PV header
extension about the Embedding Area location (NULL value is not copied):
physical_volumes {
pv0 {
id = "AkSSRf-difg-fCCZ-NjAN-qP49-1zzg-S0Fd4T"
device = "/dev/sda" # Hint only
status = ["ALLOCATABLE"]
flags = []
dev_size = 262144 # 128 Megabytes
pe_start = 67584
pe_count = 23 # 92 Megabytes
ea_start = 2048
ea_size = 65536 # 32 Megabytes
}
}
The new metadata fields are "ea_start" and "ea_size".
This is mostly useful when restoring the PV by using existing
metadata backups (e.g. pvcreate --restorefile ...).
New tools do not require these two fields to exist in VG metadata;
they're not compulsory. Therefore, reading old VG metadata which doesn't
contain any Embedding Area information will not end up with any kind
of error but only a debug message that the ea_start and ea_size values
were not found.
Old tools just ignore these extra fields in VG metadata.
Extract restorable PV creation parameters from struct pvcreate_params into
a separate struct pvcreate_restorable_params for clarity and also for better
maintainability when adding any new items later.
Add basic support for converting LV into an external origin volume.
Syntax:
lvconvert --thinpool vg/pool --originname renamed_origin -T origin
It will convert volume 'origin' into a thin volume, which will
use 'renamed_origin' as an external read-only origin.
All read/write into origin will go via 'pool'.
renamed_origin volume is read-only volume, that could be activated
only in read-only mode, and cannot be modified.
Do not allow conversion of external origin into writeable LV,
and prohibit changing the external origin size.
If the snapshot origin is also external origin, merge is prohibited.
When there are missing PVs in a volume group, most operations that alter
the LVM metadata are disallowed. It turns out that 'vgimport' is one of
those disallowed operations. This is bad because it creates a circular
dependency. 'vgimport' will complain that the VG is inconsistent and that
'vgreduce --removemissing' must be run. However, 'vgreduce' cannot be run
because it has not been imported. Therefore, 'vgimport' must be one of
the operations allowed to change the metadata when PVs are missing. The
'--force' option is the way to make 'vgimport' happen in spite of the
missing PVs.
If '--mirrors/-m' and '--stripes/-i' are used together when creating
a logical volume, mirrors-over-stripes is currently chosen. The user
can override this by using the '--type raid10' option on creation.
However, we want a place where we can set the default behavior to
'raid10' explicitly - similar to the "mirror" and "raid1" tunable,
mirror_segtype_default.
A follow-on patch should use this new setting to change the default
from "mirror" to "raid10", as this is the preferred segment type.
Currently it is impossible to remove a failed PV which has a RAID LV
on it. This patch fixes the issue by replacing the failed PV with an
'error' segment within the affected sub-LVs. Once there is no longer
a RAID LV using the PV, it can be removed.
Most often, it is better to replace a failed RAID device with a spare.
(You can use 'lvconvert --repair <vg>/<LV>' to accomplish that.)
However, if there are no spares in the volume group and none will be
added, it is useful to be able to remove the failed device.
Following patches address the ability to perform 'lvconvert' operations
on RAID LVs that contain sub-LVs composed of 'error' segments.
We have been using 'mirror_region_size' in lvm.conf as the default region
size for RAID logical volumes as well as mirror logical volumes. Since
"raid" is more inclusive and representative than "mirror", I have changed
the name of this setting. We must still check for the old setting and warn
the user if we are overriding it with the new setting when both happen to be
present.
Instead of check for lv_is_active() for thin pool LV,
query the whole pool via new pool_is_active().
Fixes a problem when we cannot change discards settings
for active pool device where the actual layer for pool
device was inactive, but thin volumes using thin pool
have been active.
Update the error path after problems with suspend_lv or vg_commit.
It's not exactly well defined what should happen, and this
code seems to appear in many different instances in the
whole source code tree - we should probably pick the best version.
Rename lvmetad_warning() to lvmetad_connect_or_warn().
Log all connection attempts on the client side, whether successful or not.
Reduce some nesting and remove a redundant assertion.
We need to call sync_local_dev_names directly as pvscan uses
VG_GLOBAL lock and this one *does not* cause the synchronization
(sync_dev_names) to be called on unlock (VG_GLOBAL is not a real VG):
#define unlock_vg(cmd, vol) \
	do { \
		if (is_real_vg(vol)) \
			sync_dev_names(cmd); \
		(void) lock_vol(cmd, vol, LCK_VG_UNLOCK); \
	} while (0)
Without this fix, we end up without udev synchronization for the
pvscan --cache (mainly for -aay that causes the VGs/LVs to be
autoactivated) and also udev synchronization cookies are then left
in the system since they're not managed properly (code before sets
up udev sync cookies, but we have to call dm_udev_wait at least once
after that to do the wait and cleanup).
If a RAID array is not in-sync, replacing devices should not be allowed
as a general rule. This is because the contents used to populate the
incoming device may be undefined because the devices being read were
not in-sync. Unless overridden, the kernel enforces this rule by not
allowing the creation of an array that is not in-sync and includes
devices that need to be rebuilt.
Since we cannot know the sync state of an LV if it is inactive, we must
also enforce the rule that an array must be active to replace devices.
That leaves us with the following conditions:
1) never allow replacement or repair of devices if the LV is inactive
2) never allow replacement if the LV is not in-sync
3) allow repair if the LV is not in-sync, but warn that contents may
not be recoverable.
In the case where a user is performing the repair on the command line via
'lvconvert --repair', the warning is printed before the user is prompted
if they would like to replace the device(s). If the repair is automated
(i.e. via dmeventd and policy is "allocate"), then the device is replaced
if possible and the warning is printed.
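The three conditions above can be written as a small decision function in C. This is a sketch with hypothetical names (struct raid_state, raid_decision_t, raid_change_allowed), not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
	RAID_DENY,
	RAID_ALLOW,
	RAID_ALLOW_WITH_WARNING
} raid_decision_t;

/* Hypothetical view of the LV state needed for the decision. */
struct raid_state {
	int active;	/* LV is active, so its sync state can be queried */
	int in_sync;	/* only meaningful when active */
};

/* 'is_repair' is non-zero for a repair and zero for a plain replacement. */
static raid_decision_t raid_change_allowed(const struct raid_state *s, int is_repair)
{
	assert(s != NULL);
	if (!s->active)
		return RAID_DENY;			/* rule 1 */
	if (s->in_sync)
		return RAID_ALLOW;
	if (is_repair)
		return RAID_ALLOW_WITH_WARNING;		/* rule 3 */
	return RAID_DENY;				/* rule 2 */
}
```

The activity check comes first because the sync state of an inactive LV cannot be known.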
We can also use this for conversion between different mirror segment
types. Each new segment type converter then needs to check itself
whether the --stripes is applicable.
The motivation to grab the global lock is to avoid a scan and metadata parsing
for each PV, but the cost of obtaining metadata is _mostly_ mitigated by having
lvmetad around. Not taking the global lock improves throughput when multiple pvs
or related commands are running in parallel, like in RHEV.
Calling pvscan --cache with -aay on a PV without an MDA would spuriously fail
with an internal error, because of an incorrect assumption that a parsed VG
structure was always available. This is not true and the autoactivation handler
needs to call vg_read to obtain metadata in cases where the PV had no MDAs to
parse. Therefore, we pass vgid into the handler instead of the (possibly NULL)
VG coming from the PV's MDA.
Remove the no longer needed warning about unsupported discards
for non-power-of-2 lvcreate commands.
(Missed from the patch for the same update in lvchange made
by commit dde5a6c52b)
Attempting pvmove on RAID LVs replaces the kernel RAID target with
a temporary pvmove target, ultimately destroying the RAID LV. pvmove
must be prevented on RAID LVs for now.
Use 'lvconvert --replace old_pv vg/lv new_pv' if you want to move
an image of the RAID LV.
Support swapping of the metadata device if the thin pool already
exists. This way it's easy to, e.g., resize the metadata or
repair it.
The user may create an empty LV and replace the existing metadata,
or dump and restore it into a bigger LV.
If udev synchronization is disabled by means of --noudevsync
option, we should disable just the synchronization and nothing else.
The udev fallback (verifying udev operations and fixing the
nodes/symlinks if found incorrect) is orthogonal and controlled
by a separate activation/verify_udev_operations configuration option.
Allow restoring metadata with thin pool volumes.
No validation is done for this case within vgcfgrestore tool -
thus incorrect metadata may lead to destruction of pool content.
Add configurable settings for thin pool creation, used
if they are not specified on the command line.
New supported lvm.conf options are:
allocation/thin_pool_chunk_size
allocation/thin_pool_discards
allocation/thin_pool_zero
Similar to the way the 'mirror', 'raid1' and 'raid10' segment types set
the number of mirrors to 2 ('-m 1') if the argument is not specified,
here we set the number of stripes to 2 if not given on the command line
when creating a RAID10 LV.
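A minimal sketch of the defaulting described above, in Python. The function
name and argument names are hypothetical; only the default values (-m 1 and
2 stripes) come from the text.

```python
def apply_segtype_defaults(segtype, mirrors=None, stripes=None):
    """Fill in unspecified mirror/stripe counts (hypothetical helper)."""
    # 'mirror', 'raid1' and 'raid10' default to '-m 1', i.e. two copies.
    if segtype in ("mirror", "raid1", "raid10") and mirrors is None:
        mirrors = 1
    # RAID10 additionally defaults to 2 stripes.
    if segtype == "raid10" and stripes is None:
        stripes = 2
    return mirrors, stripes
```

Values given explicitly are left untouched, so
apply_segtype_defaults("raid10", stripes=3) keeps the 3 stripes.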
Move common functions for lvcreate and lvconvert.
get_pool_params() - read thin pool args.
update_pool_params() - updates/validates some thin args.
The code is getting complicated and a few more things will be
implemented, so to avoid reimplementing things differently
in lvcreate and lvconvert, the code has been split
into 2 common functions that allow some future extension.
The target tells us its version, and we may allow a different set of options
to be supported with different versions of the driver.
The idea is to provide individual feature flags and later be
able to query for them.
This patch is intended to fix bug 825323 - FS turns read-only during a double
fault of a mirror leg and mirrored log's leg at the same time. It only
affects a 2-way mirror with a mirrored log. 3+-way mirrors and mirrors
without a mirrored log are not affected.
The problem resulted from the fact that the top level mirror was not
using 'noflush' when suspending before its "down-convert". When a
mirror image fails, the bios are queued until a suspend is received. If
it is a 'noflush' suspend, the bios can be safely requeued in the DM
core. If 'noflush' is not used, the bios must be pushed through the
target and if a device is failed for a mirror, that means issuing an
error. When an error is received by a file system, it results in it
turning read-only (depending on the FS).
Part of the problem is due to the nature of the stacking involved in
using a mirror as a mirror's log. When an image in each fails, the top
level mirror stalls because it is waiting for a log flush. The other
stalls waiting for corrective action. When the repair command is issued,
the entire stacked arrangement is collapsed to a linear LV. The log
flush then fails (somewhat uncleanly) and the top-level mirror is suspended
without 'noflush' because it is a linear device.
This patch allows the log to be repaired first, which in turn allows the
top-level mirror's log flush to complete cleanly. The top-level mirror
is then secondarily reduced to a linear device - at which time this mirror
is suspended properly with 'noflush'.
Use log_warn to print non-fatal warning messages.
Using log_error would confuse the checker that tests
whether a proper error has been reported for a real error.
When valgrind usage is desired by the user (--enable-valgrind-pool),
skip playing with/closing/reopening descriptors - it makes
valgrind useless.
Make sleep delay for clvmd start longer.
A while back, the behavior of LVM changed from allowing metadata changes
when PVs were missing to not allowing changes. Until recently, this
change was tolerated by HA-LVM by forcing a 'vgreduce --removemissing'
before trying (again) to add tags to an LV and then activate it. LVM
mirroring requires that failed devices are removed anyway, so this was
largely harmless. However, RAID LVs do not require devices to be removed
from the array in order to be activated. In fact, in an HA-LVM
environment this would be very undesirable. Device failures in such an
environment can often be transient and it would be much better to restore
the device to the array than synchronize an entirely new device.
There are two methods that can be used to setup an HA-LVM environment:
"clvm" or "tagging". For RAID LVs, "clvm" is out of the question because
RAID LVs are not supported in clustered VGs - not even in an exclusively
activated manner. That leaves "tagging". HA-LVM uses tagging - coupled
with 'volume_list' - to ensure that only one machine can have an LV active
at a time. If updates are not allowed when a PV is missing, it is
impossible to add or remove tags to allow for activation. This removes
one of the most basic functionalities of HA-LVM - site redundancy. If
mirroring or RAID is used to replicate the storage in two data centers
and one of them goes down, a server and a storage device are lost. When
the service fails-over to the alternate site, the VG will be "partial".
Unable to add a tag to the VG/LV, the RAID device will be unable to
activate.
The solution is to allow vgchange and lvchange to alter the LVM metadata
for a limited set of options - --[add|del]tag included. The set of
allowable options is limited to those that neither cause changes to the
DM kernel target (like --resync would) nor alter the structure of the LV
(like allocation or conversion).
Unlike names, UUIDs can't be changed once they are created
for a device. The 'mangle' command will just issue an error message
about a need for manual intervention in this case - reactivating the
device (remove + create) does the job as the default mangling mode
used is "auto" and that will assign a correct mangled form to the UUID.
For now this conversion is not supported, thus disabled.
The only supported conversion for now is to create mirrored thin pools
from mirrored devices.
Update code for lvconvert.
Change the lvconvert user interface a bit - now we require 2 specifiers:
--thinpool takes LV name for data device (and makes the name)
--poolmetadata takes LV name for metadata device.
Fix typo in thin help text -z -> -Z.
The new --discards flag for thin pools is also supported.
When reformatting the 'lvchange_resync' code in commit
05131f5853, a '!' should have been removed
from the condition that checks for the LV_NOTSYNCED flag on a corelog
mirror LV. The presence of this '!' caused the LV_NOTSYNCED flag to be
cleared when it wasn't present and left when it was present.
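The effect of the stray '!' can be illustrated with a small Python sketch.
The flag value and both function names are hypothetical and are not taken
from the LVM sources; only the inverted condition mirrors the description.

```python
LV_NOTSYNCED = 0x1  # hypothetical bit value, for illustration only


def clear_notsynced_buggy(status):
    """Inverted test: only 'clears' the flag when it is not set."""
    if not (status & LV_NOTSYNCED):
        status &= ~LV_NOTSYNCED  # no-op, the flag was never set
    return status


def clear_notsynced_fixed(status):
    """Correct test: clears the flag when it is set."""
    if status & LV_NOTSYNCED:
        status &= ~LV_NOTSYNCED
    return status
```

With the inverted test, an LV that carries LV_NOTSYNCED keeps it after the
resync request, which is the behaviour the fix removes.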
It is not allowed to add images to a 'mirror' or 'raid1' LV if the
LV_NOTSYNCED flag is set. We add some up-convert tests to ensure this
behavior is being enforced and that the LV_NOTSYNCED flag is being
properly cleared by 'lvchange --resync'.
(Not updating WHATS_NEW because this is intrarelease.)
Don't try to issue discards to a missing PV to avoid segfault.
Prevent lvremove from removing LVs that have any part missing.
https://bugzilla.redhat.com/857554
Using 'activation/auto_activation_volume_list = [ "vg/lvol1" ]'.
Before this patch:
3 logical volume(s) in volume group "vg" now active
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
lvol0 vg -wi----- 4.00m
lvol1 vg -wi-a--- 4.00m
lvol2 vg -wi-a--- 4.00m
lvol3 vg -wi-a--- 4.00m
(vg/lvol1 activated as it passes the list and all subsequent volumes too - wrong!)
With this patch:
1 logical volume(s) in volume group "vg" now active
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
lvol0 vg -wi----- 4.00m
lvol1 vg -wi-a--- 4.00m
lvol2 vg -wi----- 4.00m
lvol3 vg -wi----- 4.00m
(only vg/lvol1 activated as it passes the list and no other - correct!)
Issuing a 'lvchange --resync <VG>/<RAID_LV>' had no effect. This is
because the code to handle RAID LVs was not present. This patch adds
the code that will clear the metadata areas of RAID LVs - causing them
to resync upon activation.
When an LV is to be resynced, the metadata areas are cleared and the
LV is reactivated. This is true for mirroring and will also be true
for RAID LVs. We restructure the code in lvchange_resync() so that we
keep all the common steps necessary (validation of ability to resync,
deactivation, activation of meta/log devices, clearing of those devices,
etc) and place the code that will be divergent in separate functions:
detach_metadata_devices()
attach_metadata_devices()
The common steps will be processed on lists of metadata devices. Before
RAID capability is added, this will simply be the mirror log device (if
found).
This patch lays the ground-work for adding resync of RAID LVs.
By changing the conditional for resyncing mirrors with core-logs a
bit, we can short-circuit the rest of the function for that case
and reduce the amount of indenting in the rest of the function.
This cleanup will simplify future patches aimed at properly handling
the resync of RAID LVs.
When printing a message for the user and the lv_segment pointer is available,
use segtype->ops->name() instead of segtype->name. This gives a better
user-readable name for the segment. This is especially true for the
'striped' segment type, which prints "linear" if there is an area_count of
one.
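The difference between the two names can be sketched in Python as follows.
The class and method are hypothetical stand-ins for segtype->name and
segtype->ops->name(); only the striped/linear rule comes from the text.

```python
class SegType:
    """Hypothetical stand-in for a segment type with a raw name."""

    def __init__(self, name):
        self.name = name

    def display_name(self, area_count):
        """Return the user-readable name for a segment of this type."""
        # A 'striped' segment with a single area is reported as "linear".
        if self.name == "striped" and area_count == 1:
            return "linear"
        return self.name
```

Here SegType("striped").display_name(1) yields "linear", while the raw name
attribute would still read "striped".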
We should check whether the fd is opened before trying to reopen it.
For example, the stdin is closed in test/lib/harness.c causing the
test suite to fail.
Accept -q as the short form of --quiet.
Suppress non-essential standard output if -q is given twice.
Treat log/silent in lvm.conf as equivalent to -qq.
Review all log_print messages and change some to
log_print_unless_silent.
When silent, the following commands still produce output:
dumpconfig, lvdisplay, lvmdiskscan, lvs, pvck, pvdisplay,
pvs, version, vgcfgrestore -l, vgdisplay, vgs.
[Needs checking.]
Non-essential messages are shifted from log level 4 to log level 5
for syslog and lvm2_log_fn purposes.
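A minimal Python sketch of the quiet handling described above. Both function
names and the list used as an output sink are hypothetical; the rule that two
-q options or log/silent enable silent mode is taken from the text.

```python
def is_silent(quiet_count, conf_log_silent=False):
    """Return True when non-essential standard output is suppressed."""
    # -q given twice (-qq), or log/silent enabled in the configuration.
    return quiet_count >= 2 or bool(conf_log_silent)


def print_unless_silent(message, silent, sink):
    """Append a non-essential message to sink unless running silently."""
    if not silent:
        sink.append(message)
```

A single -q does not enable silent mode in this sketch: is_silent(1) is
False, whereas is_silent(2) and is_silent(0, conf_log_silent=True) are True.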
This patch adds support for RAID10. It is not the default at this
stage. The user needs to specify '--type raid10' if they would like
RAID10 instead of stacked mirror over stripe.
Add a couple of INTERNAL_ERROR reports for unwanted parameters:
Ensure the 'top' metadata node cannot be NULL for lvmetad.
Make obvious vginfo2 cannot be NULL.
Report internal error if handler and vg is undefined.
Check for handle in poll_vg().
Ensure seg is not NULL in dev_manager_transient().
Report missing read_ahead for _lv_read_ahead_single().
Check for report handler in dm_report_object().
Check missing VG in _vgreduce_single().
Remove the limit of 255 previously imposed on the major and minor number
arguments used when specifying persistent numbers via the
-My --major <major> --minor <minor> options. Follow the kernel limit
instead, which is 12 bits for the major and 20 bits for the minor number
(kernel >= 2.6 and LVM formats that do not have FMT_RESTRICTED_LVIDS -
so the old limit of 255 is still kept for the lvm1 format).
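The resulting limits can be worked out with a short Python sketch. The
function names and the restricted_format argument are hypothetical; the bit
widths and the legacy limit of 255 come from the text above.

```python
MAJOR_BITS = 12     # kernel limit for the major number
MINOR_BITS = 20     # kernel limit for the minor number
LEGACY_LIMIT = 255  # old limit, kept for restricted (lvm1) formats


def max_major_minor(restricted_format=False):
    """Return the (max_major, max_minor) pair that applies."""
    if restricted_format:
        return LEGACY_LIMIT, LEGACY_LIMIT
    return (1 << MAJOR_BITS) - 1, (1 << MINOR_BITS) - 1


def valid_persistent_numbers(major, minor, restricted_format=False):
    """Check a major/minor pair against the applicable limits."""
    max_major, max_minor = max_major_minor(restricted_format)
    return 0 <= major <= max_major and 0 <= minor <= max_minor
```

With 12 and 20 bits the largest accepted values are 4095 for the major and
1048575 for the minor number.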
Allowing people to add devices to a VG that has PVs missing helps
people avoid the inability to repair RAID LVs in certain cases.
For example, if a user creates a RAID 4/5/6 LV using all of the
available devices in a VG, there will be no spare devices to
repair the LV with if a device should fail. Further, because the
VG is missing a device, new devices cannot be added to allow the
repair. If 'vgreduce --removemissing' were attempted, the
"MISSING" PV could not be removed without also destroying the RAID
LV.
Allowing vgextend to operate solves the circular dependency.
When the PV is added by a vgextend operation, the sequence number is
incremented and the 'MISSING' flag is put on the PVs which are missing.
Update lvchange to allow change of 'zero' flag for thinpool.
Add support for changing discard handling.
N.B. changing from/to 'ignore' is possible only for an inactive pool.
Add arg support for discard.
Add discard ignore, nopassdown, passdown (=default) support.
Flags can be set per pool.
lvcreate [--discard {ignore|no_passdown|passdown}] vg/thinlv
When --sysinit -a ay is used with vg/lvchange and lvmetad is up and running,
we should skip manual activation as that would be a useless step - all volumes
are autoactivated once all the PVs for a VG are present.
If lvmetad is not active at the time of the vgchange --sysinit -a ay
call, the activation proceeds in standard 'manual' way.
This way, we can still have vg/lvchange --sysinit -a ay called
unconditionally in system initialization scripts no matter if lvmetad
is used or not.
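The decision described above can be sketched in Python. The function and its
arguments are hypothetical and only model the rule; they are not part of the
vgchange/lvchange code.

```python
def should_activate_manually(sysinit, activation_mode, lvmetad_active):
    """Decide whether vg/lvchange should activate volumes itself."""
    # With --sysinit -a ay and a running lvmetad, volumes are
    # autoactivated on PV appearance, so manual activation is skipped.
    if sysinit and activation_mode == "ay" and lvmetad_active:
        return False
    # Otherwise activation proceeds in the standard 'manual' way.
    return True
```

An init script can therefore issue the same call in both setups; only the
lvmetad_active input changes the outcome.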
Reducing a RAID 4/5/6 LV or extending it with a different number of
stripes is still not implemented. This patch covers the "simple" case
where the LV is extended with the same number of stripes as the original.
In process_each_pv() if we haven't yet scanned and the PV appears
to be an orphan, we must scan the other PVs looking for mdas that
reference it to find out what VG it is in.
1. If the PV has no mdas, we must scan.
2. If the PV has an mda that is not ignored we do not need to scan.
3. If the PV has an mda that is ignored, we do need to scan.
This patch fixes case 3.
> pvs -o +mda_count,vg_mda_count /dev/loop[0123]
PV VG Fmt Attr PSize PFree #PMda #VMda
/dev/loop0 vg3 lvm2 a- 96.00m 96.00m 0 1
/dev/loop1 vg3 lvm2 a- 96.00m 96.00m 1 1
/dev/loop2 vg2 lvm2 a- 96.00m 96.00m 1 2
/dev/loop3 vg2 lvm2 a- 28.00m 28.00m 1 2
Before:
> pvs /dev/loop2 /dev/loop3 /dev/loop0 /dev/loop1 --unbuffered
PV VG Fmt Attr PSize PFree
/dev/loop2 lvm2 a-- 100.00m 100.00m
/dev/loop3 vg2 lvm2 a-- 28.00m 28.00m
/dev/loop0 lvm2 a-- 100.00m 100.00m
/dev/loop1 vg3 lvm2 a-- 96.00m 96.00m
After:
> pvs /dev/loop2 /dev/loop3 /dev/loop0 /dev/loop1 --unbuffered
PV VG Fmt Attr PSize PFree
/dev/loop2 vg2 lvm2 a-- 96.00m 96.00m
/dev/loop3 vg2 lvm2 a-- 28.00m 28.00m
/dev/loop0 vg3 lvm2 a-- 96.00m 96.00m
/dev/loop1 vg3 lvm2 a-- 96.00m 96.00m
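The three scan cases listed for process_each_pv() can be sketched in Python.
The function name and its input (one boolean per MDA, True if that MDA is
ignored) are hypothetical. Treating a PV with both ignored and non-ignored
MDAs like case 2 is an assumption of this sketch.

```python
def orphan_needs_scan(mda_ignored_flags):
    """Return True if the other PVs must be scanned for this PV's VG.

    mda_ignored_flags holds one bool per MDA on the PV (True = ignored).
    """
    # Case 1: no MDAs at all            -> all([]) is True  -> scan.
    # Case 2: an MDA that is not ignored -> all(...) is False -> no scan.
    # Case 3: only ignored MDAs          -> all(...) is True  -> scan.
    return all(mda_ignored_flags)
```

Case 3 is the one fixed by the patch: a PV whose only MDA is ignored must
trigger a scan instead of being reported as an orphan.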
One can use "lvcreate -aay" to have the newly created volume
activated or not, based on the activation/auto_activation_volume_list
setting.
Note: -Z/--zero is not compatible with -aay, zeroing is not used in this case!
When using lvcreate -aay, a default warning message is also issued that zeroing
is not done.
Define auto_activation_handler that activates VGs/LVs automatically
based on the activation/auto_activation_volume_list (activating all
volumes by default if the list is not defined).
The autoactivation is done within the pvscan call in 69-dm-lvmetad.rules
that watches for udev events (device appearance/removal).
For now, this works for non-clustered and complete VGs only.
Normally, the 'vgchange -ay' activates all volume groups (that pass
the activation/volume_list filter if set).
This call can appear in two scenarios:
- system boot (so activation within a script in general)
- manual call on command line (so activation on user's direct request)
For the former one, we would like to select which VGs should be actually
activated. One can define the list of VGs directly to do that. But that
would require the same list to be provided in all the scripts.
The 'vgchange -aay' will check for the activation/auto_activation_volume_list
in addition and it will activate only those VGs/LVs that pass this
filter (assuming all to be activated if the list is not defined - the
same logic we already have for activation/volume_list).
Init/boot scripts should use this form of activation primarily
(which, anyway, becomes only a fallback now with autoactivation done
on PV appearance in tandem with lvmetad in place).
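The filter logic of 'vgchange -aay' can be sketched in Python. The function
name is hypothetical, and the sketch handles only plain "vg" and "vg/lv"
entries; tag-based entries are deliberately left out.

```python
def passes_auto_activation_list(vg_name, lv_name, volume_list=None):
    """Return True if vg_name/lv_name should be autoactivated."""
    # An undefined list means everything is activated.
    if volume_list is None:
        return True
    full_name = f"{vg_name}/{lv_name}"
    # An entry matches either the whole VG or one specific LV.
    return any(entry in (vg_name, full_name) for entry in volume_list)
```

With volume_list = ["vg/lvol1"], only lvol1 in VG "vg" passes; an empty list
(defined, but with no entries) activates nothing in this sketch.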
Define an 'activation_handler' that gets called automatically on
PV appearance/disappearance while processing the lvmetad_pv_found
and lvmetad_pv_gone functions that are supposed to update the
lvmetad state based on PV availability state. For now, the actual
support is for PV appearance only, leaving room for PV disappearance
support as well (which is a more complex problem to solve as this
needs to account for a possible device stack).
Add a new activation change mode - CHANGE_AAY exposed as
'--activate ay/-aay' argument ('activate automatically').
Factor out the vgchange activation functionality for use in other
tools (like pvscan...).
We're referring to 'activation' all over the code and we're talking
about 'LVs being activated' all the time so let's use 'activation/activate'
everywhere for clarity and consistency (still providing the old
'available' keyword as a synonym for backward compatibility with
existing environments).
With the latest changes in udev, some deprecated functions were removed
from libudev, among them the "udev_get_dev_path" function
we used to compare the device directory used in udev with the directory set
in libdevmapper. The "/dev" is hardcoded in udev now (udev version >= 183).
Amongst other changes and from packager's point of view, it's also
important to note that the libudev development library ("libudev-devel")
could now be a part of the systemd development library ("systemd-devel")
because of the udev + systemd merge.
Support has many limitations and lots of FIXMEs inside;
however, it already makes the initial task usable, where the user creates
a separate LV for thin pool data and one for thin metadata, so let's
enable it for testing.
Easiest API:
lvconvert --chunksize XX --thinpool data_lv metadata_lv
More functionality extensions will follow up.
TODO: Code needs some rework since a lot of the same code is getting copied.
Just to make it clearer since there is the "dmsetup info -c -o blkdevname"
as well that shows the "block device name for this mapping", having a
"BlkDevName" header on output.
It's a bit confusing then if the "dmsetup info -c -o devs_used,blkdevs_used"
is named with a plural "DevNames"/"BlkDevNames" but at the same time having
a totally different meaning than the singular form "BlkDevName".
DevNames --> DevNamesUsed
BlkDevNames --> BlkDevNamesUsed
...makes it much more comprehensible.
When resizing a thin pool, we need to use the stripe info from the _tdata volume.
In the future a more generic solution will be necessary once we start to support
lvconvert (resizing of stacked devices while staying properly aligned).
For now we just allow striped or linear LVs, so this code will work.
When lvresize is given a new size, round it upward for stripes, unless we
use % and we are at the border of free extents.
This patch is not a complete fix and a few more cases will need special care.
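The rounding rule can be sketched in Python. The function, its arguments and
the choice to round down at the border of free extents are assumptions made
for illustration; the text above only states that rounding upward is skipped
in that case.

```python
def round_size_for_stripes(new_extents, stripes, max_extents,
                           percent_request=False):
    """Round a requested LV size (in extents) to whole stripes.

    max_extents is the largest size reachable with the free extents
    (hypothetical parameter).
    """
    if stripes <= 1:
        return new_extents
    remainder = new_extents % stripes
    if remainder == 0:
        return new_extents
    rounded_up = new_extents + (stripes - remainder)
    # Assumption: a %-based request that would overshoot the available
    # extents is rounded down instead of up.
    if percent_request and rounded_up > max_extents:
        return new_extents - remainder
    return rounded_up
```

For example, a request for 10 extents on 3 stripes becomes 12 extents, while
a %-based request for 100 extents with only 100 available becomes 99.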
When vg_read fails, it internally unlocks VG if it's been locked,
so in error path we should skip unlock_vg for this case.
(user would see ugly internal warning)