IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This patch adds --profile arg to lvm cmds and adds config/profile_dir
configuration setting to select the directory where profiles are stored
By default it's /etc/lvm/profile.
The profiles are added by using new "add_profile" fn and then loaded
using the "load_profile" fn. All profiles are stored in a cmd context
within the new "struct profile_params":
struct profile_params {
const char *dir;
struct profile *global_profile;
struct dm_list profiles_to_load;
struct dm_list profiles;
};
...where "dir" is the directory with profiles, "global_profile" is
the profile that is set globally via the --profile arg (IOW, not
set per VG/LV basis based on metadata record) and the "profiles"
is the list with loaded profiles.
A helper type that helps with identification of the configuration source
which makes handling the configuration cascade a bit easier, mainly
removing and adding configuration trees to cascade dynamically.
Currently, the possible types are:
CONFIG_UNDEFINED - configuration is not defined yet (not initialized)
CONFIG_FILE - one file configuration
CONFIG_MERGED_FILES - configuration that is a result of merging more files into one
CONFIG_STRING - configuration string typed on cmd line directly
CONFIG_PROFILE - profile configuration (the new type of configuration, patches will follow...)
Also, generalize existing "remove_overridden_config_tree" to work with
configuration type identification in a cascade. Before, it was just
the CONFIG_STRING we used. Now, we need some more to add in a
cascade (like the CONFIG_PROFILE). So, we have:
struct dm_config_tree *remove_config_tree_by_source(struct cmd_context *cmd, config_source_t source);
config_source_t config_get_source_type(struct dm_config_tree *cft);
... for removing the tree by its source type from the cascade and
simply getting the source type.
If the user would upconvert a linear LV to a mirror without specifying
the segment type ("--type mirror" vs "--type raid1"), the "mirror"
segment type would be chosen without consulting the 'default_mirror_segtype'
setting in lvm.conf. This is now used as the basis for determining
which should be used if left unspecified.
Support vgsplit for VGs with thin pools and thin volumes.
In case the thin data and thin metadata volumes are moved to a new VG,
move there also all related thin volumes and check that external origins
are also present in this new VG.
Fix the usecase when only PV list is specified.
With --poolmetadatasize PV list is used for metadata extents.
Without --poolmetadatasize PV list is used for 100% extension of LV.
Handle the case, when nothing could be resized (i.e. in dmeventd)
Support 'clasic' way of resizing of metadata LV.
Normally we disallow to work with internal 'invisible' devices.
But in this case we can make an exception and if user has some
special needs how to extend thin pool metadata LV - support it.
After resize of metadata LV, the pool will be suspended and resumed,
to be notified of this change.
Add support for lvresize of thin pool metadata device.
lvresize --poolmetadatasize +20 vgname/thinpool_lv
or
lvresize -L +20 vgname/thinpool_lv_tmeta
Where the second one allows all the args for resize (striping...)
and the first option resizes accoding to the last metadata lv segment.
Previously, we have relied on UUIDs alone, and on lvmcache to make getting a
"new copy" of VG metadata fast. If the code which triggers the activation has
the correct VG metadata at hand (the version which is currently on disk), it can
now hand it to the activation code directly.
Merge duplicate code that was validating lvcreate args
for creation of thin and snapshot.
Keep most of thin checks in _check_thin_parameters().
Update couple error messages.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs. The options are available for 'lvcreate' and
'lvchange' and are as follows:
--minrecoveryrate <Rate> [bBsSkKmMgG]
--maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device. If a suffix is not given,
kiB/sec/device is assumed. Setting the rate to 0 removes the preference.
Instead of seeing wierd overflows inside the lvm code,
giving false error messages, kill the user experiment in the begining.
Who needs to use more then 16EiB with lvm2 and 64bit anyway...
Check for mounted fs also for vgchange command, not just lvchange.
NOTE: Code is using lv_info() just like lvs_in_vg_opened().
It should be probably converted into lv_is_active_locally().
There are places where 'lv_is_active' was being used where it was
more correct to use 'lv_is_active_locally'. For example, when checking
for the existance of a kernel instance before asking for its status.
Most of the time these would work correctly. (RAID is only allowed on
non-clustered VGs at the moment, which means that 'lv_is_active' and
'lv_is_active_locally' would give the same result.) However, it is
more correct to use the proper variant and it helps with future
scenarios where targets might be allowed exclusively (or clustered) in
a cluster VG.
Accept --yes on all commands, even ones that don't today have prompts,
so that test scripts that don't care about interactive prompts no
longer need to deal with them.
But continue to mention --yes only in the command prototypes that
actually use it.
This is just a temporary fix to support allocation of -l%FREE.
The number of free extent serves to calculate estimated metadata
size. This value is then substracted twice to keep some
free space for recover.
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg Rwi---r-m 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-w 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-- 1 linear 4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg rwi---r-p 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-p 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-p 1 linear 4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r-- 512
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r--
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
This reverts commit 0396ade38b.
The original code also handled len==1, which the new code doesn't.
Press <TAB> in the lvm shell to get a list of the possible
flag completions for a single hyphen.
Move common code for changing activation state from
vgchange and lvchange to one function.
Fix the order of checks - so we always implicitelly
activate snapshots and thin volumes in exclusive mode,
and we do not allow local deactivation for them.
New options to 'lvchange' allow users to scrub their RAID LVs.
Synopsis:
lvchange --syncaction {check|repair} vg/raid_lv
RAID scrubbing is the process of reading all the data and parity blocks in
an array and checking to see whether they are coherent. 'lvchange' can
now initaite the two scrubbing operations: "check" and "repair". "check"
will go over the array and recored the number of discrepancies but not
repair them. "repair" will correct the discrepancies as it finds them.
'lvchange --syncaction repair vg/raid_lv' is not to be confused with
'lvconvert --repair vg/raid_lv'. The former initiates a background
synchronization operation on the array, while the latter is designed to
repair/replace failed devices in a mirror or RAID logical volume.
Additional reporting has been added for 'lvs' to support the new
operations. Two new printable fields (which are not printed by
default) have been added: "syncaction" and "mismatches". These
can be accessed using the '-o' option to 'lvs', like:
lvs -o +syncaction,mismatches vg/lv
"syncaction" will print the current synchronization operation that the
RAID volume is performing. It can be one of the following:
- idle: All sync operations complete (doing nothing)
- resync: Initializing an array or recovering after a machine failure
- recover: Replacing a device in the array
- check: Looking for array inconsistencies
- repair: Looking for and repairing inconsistencies
The "mismatches" field with print the number of descrepancies found during
a check or repair operation.
The 'Cpy%Sync' field already available to 'lvs' will print the progress
of any of the above syncactions, including check and repair.
Finally, the lv_attr field has changed to accomadate the scrubbing operations
as well. The role of the 'p'artial character in the lv_attr report field
as expanded. "Partial" is really an indicator for the health of a
logical volume and it makes sense to extend this include other health
indicators as well, specifically:
'm'ismatches: Indicates that there are discrepancies in a RAID
LV. This character is shown after a scrubbing
operation has detected that portions of the RAID
are not coherent.
'r'efresh : Indicates that a device in a RAID array has suffered
a failure and the kernel regards it as failed -
even though LVM can read the device label and
considers the device to be ok. The LV should be
'r'efreshed to notify the kernel that the device is
now available, or the device should be 'r'eplaced
if it is suspected of failing.
Attempting to up-convert an inactive mirror when there is insufficient
space leads to the following message:
Unable to allocate extents for mirror(s).
ABORTING: Failed to remove temporary mirror layer inactive_mimagetmp_3.
Manual cleanup with vgcfgrestore and dmsetup may be required.
This is caused by a failure to execute the 'deactivate_lv' function in
the error condition. The deactivate returns an error because the LV is
already inactive. This patch checks if the LV is activate and calls
deactivate_lv only if it is. This allows the error cleanup code to work
properly in this condition.
It wasn't that big of a deal anyway, since there was no previous vg_commit
that needed to be reverted. IOW, no harm was done if the allocation failed.
The message was scary and useless.
...to not pollute the common and format-independent code in the
abstraction layer above.
The format1 pv_write has common code for writing metadata and
PV header by calling the "write_disks" fn and when rewriting
the header itself only (e.g. just for the purpose of changing
the PV UUID) during the pvchange operation, we had to tweak
this functionality for the format1 case and we had to assign
the PV the orphan state temporarily.
This patch removes the need for this format1 tweak and it calls
the write_disks with appropriate flag indicating whether this is
a PV write call or a VG write call, allowing for metatada update
for the latter one.
Also, a side effect of the former tweak was that it effectively
invalidated the cache (even for the non-format1 PVs) as we
assigned it the orphan state temporarily just for the format1
PV write to pass.
Also, that tweak made it difficult to directly detect whether
a PV was part of a VG or not because the state was incorrect.
Also, it's not necessary to backup and restore some PV fields
when doing a PV write:
orig_pe_size = pv_pe_size(pv);
orig_pe_start = pv_pe_start(pv);
orig_pe_count = pv_pe_count(pv);
...
pv_write(pv)
...
pv->pe_size = orig_pe_size;
pv->pe_start = orig_pe_start;
pv->pe_count = orig_pe_count;
...this is already done by the layer below itself (the _format1_pv_write fn).
So let's have this cleaned up so we don't need to be bothered
about any 'format1 special case for pv_write' anymore.
Before, the find_pv_by_name call always failed if the PV found was orphan.
However, we might use this function even for a PV that is not part of any VG.
This patch adds 'allow_orphan' arg to find_pv_by_name fn that allows that.
Usage of layer was not the best plan here - for proper devices stack
we have to keep correct reference in volume_group structure and
make the new thin pool LV appear as a new volume.
Keep the flag whether given thin pool argument has been given on command
line or it's been 'estimated'
Call of update_pool_params() must not change cmdline given args and
needs to know this info.
Since there is a need to move this update function into /lib, we cannot
use arg_count().
FIXME: we need some generic mechanism here.
This was a regression introduced with e33fd978a8
(libdm v1.02.68/lvm2 v2.02.89) with the introduction of new output
fields blkdevname and blkdevs_used for ls and deps dmsetup commands.
A new common '_process_options' fn was added with that commit, but the
fn was called prematurely which then broke processing of
'dmsetup splitname -o' which should implicitly use '-c' option
and this was failing after the commit:
alatyr/~ $ dmsetup splitname -o lv_name /dev/mapper/vg_data-test
Option not recognised: lv_name
Couldn't process command line.
The '-c' had to be used for correct operation:
alatyr/~ $ dmsetup splitname -c -o lv_name /dev/mapper/vg_data-test
LV
test
Now fixed to work as it did before:
alatyr/~ $ dmsetup splitname -o lv_name /dev/mapper/vg_data-test
LV
test
lvm dumpconfig [--ignoreadvanced] [--ignoreunsupported]
--ignoreadvanced causes the advanced configuration options to be left
out on dumpconfig output
--ignoreunsupported causes the options that are not officially supported
to be lef out on dumpconfig output
lvm dumpconfig [--withcomments] [--withversions]
The --withcomments causes the comments to appear on output before each
config node (if they were defined in config_settings.h).
The --withversions causes a one line extra comment to appear on output
before each config node with the version information in which the
configuration setting first appeared.
lvm dumpconfig [--type {current|default|missing|new}] [--atversion] [--validate]
This patch adds above-mentioned args to lvm dumpconfig and it maps them
to creation and writing out a configuration tree of a specific type
(see also previous commit):
- current maps to CFG_TYPE_CURRENT
- default maps to CFG_TYPE_DEFAULT
- missing maps to CFG_TYPE_MISSING
- new maps to CFG_TYPE_NEW
If --type is not defined, dumpconfig defaults to "--type current"
which is the original behaviour of dumpconfig before all these changes.
The --validate option just validates current configuration tree
(lvm.conf/--config) and it writes a simple status message:
"LVM configuration valid" or "LVM configuration invalid"
For example, the old call and reference:
find_config_tree_str(cmd, "devices/dir", DEFAULT_DEV_DIR)
...now becomes:
find_config_tree_str(cmd, devices_dir_CFG)
So we're referring to the named configuration ID instead
of passing the configuration path and the default value
is taken from central config definition in config_settings.h
automatically.
To create an Embedding Area during PV creation (pvcreate or as part of
the vgconvert operation), we need to define the Embedding Area size.
The Embedding Area start will be calculated automatically by the tools.
This patch adds --embeddingareasize argument to pvcreate and vgconvert.
The PV header extension information (PV header extension version, flags
and list of Embedding Area locations) is stored just beyond the PV header base.
When calculating the Embedding Area start value (ea_start), the same logic is
used as when calculating the pe_start value for Data Area - the value must
follow exactly the same alignment restrictions for its start value
(the alignment detected automatically or provided via command line using
the --dataalignment and --dataalignmentoffset arguments).
The Embedding Area is placed at the very start of the PV, starting at
ea_start. The Data Area starting at pe_start is placed next. The pe_start is
still properly aligned. Due to the pe_start alignment, it's possible that the
resulting Embedding Area size (ea_size) ends up bigger in size than requested
(but never less than requested).
New tools with PV header extension support will read the extension
if it exists and it's not an error if it does not exist (so old PVs
will still work seamlessly with new tools).
Old tools without PV header extension support will just ignore any
extension.
As for the Embedding Area location information (its start and size),
there are actually two places where this is stored:
- PV header extension
- VG metadata
The VG metadata contains a copy of what's written in the PV header
extension about the Embedding Area location (NULL value is not copied):
physical_volumes {
pv0 {
id = "AkSSRf-difg-fCCZ-NjAN-qP49-1zzg-S0Fd4T"
device = "/dev/sda" # Hint only
status = ["ALLOCATABLE"]
flags = []
dev_size = 262144 # 128 Megabytes
pe_start = 67584
pe_count = 23 # 92 Megabytes
ea_start = 2048
ea_size = 65536 # 32 Megabytes
}
}
The new metadata fields are "ea_start" and "ea_size".
This is mostly useful when restoring the PV by using existing
metadata backups (e.g. pvcreate --restorefile ...).
New tools does not require these two fields to exist in VG metadata,
they're not compulsory. Therefore, reading old VG metadata which doesn't
contain any Embedding Area information will not end up with any kind
of error but only a debug message that the ea_start and ea_size values
were not found.
Old tools just ignore these extra fields in VG metadata.
Extract restorable PV creation parameters from struct pvcreate_params into
a separate struct pvcreate_restorable_params for clarity and also for better
maintainability when adding any new items later.
Add basic support for converting LV into an external origin volume.
Syntax:
lvconvert --thinpool vg/pool --originname renamed_origin -T origin
It will convert volume 'origin' into a thin volume, which will
use 'renamed_origin' as an external read-only origin.
All read/write into origin will go via 'pool'.
renamed_origin volume is read-only volume, that could be activated
only in read-only mode, and cannot be modified.
Do not allow conversion of external origin into writeable LV,
and prohibit changing the external origin size.
If the snapshot origin is also external origin, merge is prohibited.
When there are missing PVs in a volume group, most operations that alter
the LVM metadata are disallowed. It turns out that 'vgimport' is one of
those disallowed operations. This is bad because it creates a circular
dependency. 'vgimport' will complain that the VG is inconsistent and that
'vgreduce --removemissing' must be run. However, 'vgreduce' cannot be run
because it has not been imported. Therefore, 'vgimport' must be one of
the operations allowed to change the metadata when PVs are missing. The
'--force' option is the way to make 'vgimport' happen in spite of the
missing PVs.
If '--mirrors/-m' and '--stripes/-i' are used together when creating
a logical volume, mirrors-over-stripes is currently chosen. The user
can override this by using the '--type raid10' option on creation.
However, we want a place where we can set the default behavior to
'raid10' explicitly - similar to the "mirror" and "raid1" tunable,
mirror_segtype_default.
A follow-on patch should use this new setting to change the default
from "mirror" to "raid10", as this is the preferred segment type.
Currently it is impossible to remove a failed PV which has a RAID LV
on it. This patch fixes the issue by replacing the failed PV with an
'error' segment within the affected sub-LVs. Once there is no longer
a RAID LV using the PV, it can be removed.
Most often, it is better to replace a failed RAID device with a spare.
(You can use 'lvconvert --repair <vg>/<LV>' to accomplish that.)
However, if there are no spares in the volume group and none will be
added, it is useful to be able to removed the failed device.
Following patches address the ability to perform 'lvconvert' operations
on RAID LVs that contain sub-LVs composed of 'error' segments.
We have been using 'mirror_region_size' in lvm.conf as the default region
size for RAID logical volumes as well as mirror logical volumes. Since,
"raid" is more inclusive and representative than "mirror", I have changed
the name of this setting. We must still check for the old setting and warn
the user if we are overriding it with the new setting if both happen to be
present.
Instead of check for lv_is_active() for thin pool LV,
query the whole pool via new pool_is_active().
Fixes a problem when we cannot change discards settings
for active pool device where the actual layer for pool
device was inactive, but thin volumes using thin pool
have been active.
Update the error path after problems with suspend_lv or vg_commit.
It's not exactly well defined what should happen, and this
code seems to appear in many different instancies<F2> in the
whole source code tree - we should probably pick the best version.
Rename lvmetad_warning() to lvmetad_connect_or_warn().
Log all connection attempts on the client side, whether successful or not.
Reduce some nesting and remove a redundant assertion.
We need to call sync_local_dev_names directly as pvscan uses
VG_GLOBAL lock and this one *does not* cause the synchronization
(sync_dev_names) to be called on unlock (VG_GLOBAL is not a real VG):
define unlock_vg(cmd, vol)
do { \
if (is_real_vg(vol)) \
sync_dev_names(cmd); \
(void) lock_vol(cmd, vol, LCK_VG_UNLOCK); \
} while (0)
Without this fix, we end up without udev synchronization for the
pvscan --cache (mainly for -aay that causes the VGs/LVs to be
autoactivated) and also udev synchronization cookies are then left
in the system since they're not managed properly (code before sets
up udev sync cookies, but we have to call dm_udev_wait at least once
after that to do the wait and cleanup).
If a RAID array is not in-sync, replacing devices should not be allowed
as a general rule. This is because the contents used to populate the
incoming device may be undefined because the devices being read where
not in-sync. The kernel enforces this rule unless overridden by not
allowing the creation of an array that is not in-sync and includes a
devices that needs to be rebuilt.
Since we cannot know the sync state of an LV if it is inactive, we must
also enforce the rule that an array must be active to replace devices.
That leaves us with the following conditions:
1) never allow replacement or repair of devices if the LV is in-active
2) never allow replacement if the LV is not in-sync
3) allow repair if the LV is not in-sync, but warn that contents may
not be recoverable.
In the case where a user is performing the repair on the command line via
'lvconvert --repair', the warning is printed before the user is prompted
if they would like to replace the device(s). If the repair is automated
(i.e. via dmeventd and policy is "allocate"), then the device is replaced
if possible and the warning is printed.
We can also use this for conversion between different mirror segment
types. Each new segment type converter then needs to check itself
whether the --stripes is applicable.