IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Error path in _lockd_retrive_vg_pv_list() has not zeroed released path
caussing possible double-free later in the code.
Fix it by using one single function freeing lock_pvs structure.
When cache creation fails on table reload path, implemen more
advanced revert solution, that tries to restore state of LVM
metadata into is look before actual caching started.
Loading invalid MQ/SMQ policy settings table line cause immediate
rejection - to prevent such failure, automatically filter valid
settins before it is being uploaded by lvm2.
For invalid setting issue a warnning informing user how to remove
them.
This solution is used to keep running cached LVs that might had
been created in the past with invalid settings that have been actually
unused due to another code bug.
New versions of kvdo module exposes statistics at new location:
/sys/block/dm-XXX/vdo/statistics/...
Enhance lvm2 to access this location first.
Also if the statistic info is missing - make it 'debug' level info,
so it is not failing 'lvs' command.
Since VDO is always returns 'zero' on unprovisioned read
and every provisioned block is always 'zeroed' on partial writes,
we can avoid 'zeroing' of such LVs.
Some device id types can only be used with specific device major
numbers, so use this restriction to avoid some comparisions.
This is more efficient, and can avoid some incorrect matches.
pvid and vgid are sometimes a null-terminated string, and
other times a 'struct id', and the two types were often
cast between each other. When a struct id was cast to a char
pointer, the resulting string would not necessarily be null
terminated. Casting a null-terminated string id to a
struct id is fine, but is still avoided when possible.
A struct id is: int8_t uuid[ID_LEN]
A string id is: char pvid[ID_LEN + 1]
A convention is introduced to help distinguish them:
- variables and struct fields named "pvid" or "vgid"
should be null-terminated strings.
- variables and struct fields named "pv_id" or "vg_id"
should be struct id's.
- examples:
char pvid[ID_LEN + 1];
char vgid[ID_LEN + 1];
struct id pv_id;
struct id vg_id;
Function names also attempt to follow this convention.
Avoid casting between the two types as much as possible,
with limited exceptions when known to be safe and clearly
commented.
Avoid using variations of strcpy and strcmp, and instead
use memcpy/memcmp with ID_LEN (with similar limited
exceptions possible.)
When splitting VG with thin/cache pool volume, handle pmspare during
such split and allocate new pmspare in new VG or extend existing pmspare
there and eventually drop pmspare in original VG if is no longer needed
there.
When check active componet of thinLV with external origin,
we need to check if the external origin isn't already active.
For this however we need to use layered check for -real device.
sysfs-based multipath component detection quit if a
device had multiple holders, and in this case would
fail to detect a device was an mpath component.
For the pv_min_size check, always use dev_get_size()
which is commonly used elsewhere, and don't bother
asking libudev for the device size when
external_device_info_source=udev.
related to config settings:
obtain_device_info_from_udev (controls if lvm gets
a list of devices from readdir /dev or from libudev)
external_device_info_source (controls if lvm asks
libudev for device information)
. Make the obtain_device_list_from_udev setting
affect only the choice of readdir /dev vs libudev.
The setting no longer controls if udev is used for
device type checks.
. Change obtain_device_list_from_udev default to 0.
This helps avoid boot timeouts due to slow libudev
queries, avoids reported failures from
udev_enumerate_scan_devices, and avoids delays from
"device not initialized in udev database" errors.
Even without errors, for a system booting with 1024 PVs,
lvm2-pvscan times improve from about 100 sec to 15 sec,
and the pvscan command from about 64 sec to about 4 sec.
. For external_device_info_source="none", remove all
libudev device info queries, and use only lvm
native device info.
. For external_device_info_source="udev", first check
lvm native device info, then check libudev info.
. Remove sleep/retry loop when attempting libudev
queries for device info. udev info will simply
be skipped if it's not immediately available.
. Only set up a libdev connection if it will be used by
obtain_device_list_from_udev/external_device_info_source.
. For native multipath component detection, use
/etc/multipath/wwids. If a device has a wwid
matching an entry in the wwids file, then it's
considered a multipath component. This is
necessary to natively detect multipath
components when the mpath device is not set up.
How to trigger:
```
~ # export LVM_SYSTEM_DIR=_
~ # pvscan
No matching physical volumes found
double free or corruption (!prev)
Aborted (core dumped)
```
when LVM_SYSTEM_DIR is empty, _load_config_file() won't be called.
when LVM_SYSTEM_DIR is not empty, cfl->cft links into cmd->config_files
by _load_config_file()@lib/commands/toolcontext.c
core dumped code: _destroy_config()@lib/commands/toolcontext.c
```
/* CONFIG_FILE/CONFIG_MERGED_FILES */
if ((cft = remove_config_tree_by_source(cmd, CONFIG_MERGED_FILES)))
config_destroy(cft);
else if ((cft = remove_config_tree_by_source(cmd, CONFIG_FILE)))
config_destroy(cft); <=== first free the cft
dm_list_iterate_items(cfl, &cmd->config_files)
config_destroy(cfl->cft); <=== double free the cft
```
Fixes: c43f2f8ae0
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
expands commit d5a06f9a7d
"pvscan: skip indexing devices used by LVs"
The dev cache index is expensive and slow, so limit it
to commands that are used to observe the state of lvm.
The index is only used to print warnings about incorrect
device use by active LVs, e.g. if an LV is using a
multipath component device instead of the multipath
device. Commands that continue to use the index and
print the warnings:
fullreport, lvmdiskscan, vgs, lvs, pvs,
vgdisplay, lvdisplay, pvdisplay,
vgscan, lvscan, pvscan (excluding --cache)
A couple other commands were borrowing the DEV_USED_FOR_LV
flag to just check if a device was actively in use by LVs.
These are converted to the new dev_is_used_by_active_lv().
dev_cache_index_devs() is taking a large amount of time
when there are many PVs. The index keeps track of
devices that are currently in use by active LVs. This
info is used to print warnings for users in some limited
cases.
The checks/warnings that are enabled by the index are not
needed by pvscan --cache, so disable it in this case.
This may be expanded to other cases in future commits.
dev_cache_index_devs should also be improved in another
commit to avoid the extreme delays with many devices.
There have been two separate checks for metadata
validity: first that the metadata text begins with
a valid VG name, and second the checksum of the
metadata text. These happen in different places,
which means there have been two separate error paths
for invalid metadata. This also causes large metadata
to be read in multiple parts, the first part is read
just to check the vgname, and then remaining parts are
read later when the full metadata is needed.
This patch moves the vg name verification so it's
done just before the checksum verification, which
results in a single error path for invalid metadata,
and causes the entire metadata to be read together
rather that in parts from different parts of the code.
If label_scan encounters bad vg metadata, invalidate
bcache data for the device and reread the mda_header
and metadata text back to back. With concurrent commands
modifying large metadata, it's possible that the entire
metadata area can be rewritten in the time between a
command reading the mda_header and reading the metadata
text that the header points to. Since the label_scan
is just assembling an initial overview of devices, it
doesn't use locking to serialize with other commands
that may be modifying the vg metadata at the same time.
Add profilable configurable setting for vdo pool header size, that is
used as 'extra' empty space at the front and end of vdo-pool device
to avoid having a disk in the system the may have same data is real
vdo LV.
For some conversion cases however we may need to allow using '0' header size.
TODO: in this case we may eventually avoid adding 'linear' mapping layer
in future - but this requires further modification over lvm code base.
When adding a device to the devices file with --adddev, lvm
by default chooses the best device ID type for the new device.
The new --deviceidtype option allows the user to override the
built in preference. This is useful if there's a problem with
the default type, or if a secondary type is preferrable.
If the specified deviceidtype does not produce a device ID,
then lvm falls back to the preference it would otherwise use.
Previously there have been necessary explicit call of backup (often
either forgotten or over-used). With this patch the necessity to
store backup is remember at vg_commit and once the VG is unlocked,
the committed metadata are automatically store in backup file.
This may possibly alter some printed messages from command when the
backup is now taken later.
Instead of calling explicit archive with command processing logic,
move this step towards 1st. vg_write() call, which will automatically
store archive of committed metadata.
This slightly changes some error path where the error in archiving
was detected earlier in the command, while now some on going command
'actions' might have been, but will be simply scratched in case
of error (since even new metadata would not have been even written).
So general effect should be only some command message ordering.
For shared VG or LV locking, IDM locking scheme needs to use the PV
list assocated with VG or LV for sending SCSI commands, thus it requires
to use some places to generate PV list.
In reviewing the flow for LVM commands, the best place to generate PV
list is in the locking lib. So this is why this patch parses PV list as
shown. It iterates over all the PV nodes one by one, and compare with
the VG name or LV prefix string. If any PV matches, then the PV is
added into the PV list. Finally the PV list is sent to lvmlockd daemon.
Here as mentioned, it compares LV prefix string with the format
"lv_name_", the reason is it needs to find out all relevant PVs, e.g.
for the thin pool, it has LVs for metadata, pool, error, and raw LV, so
we can use the prefix string to find out all PVs belonging to the thin
pool.
For the global lock, it's not covered in this patch. To avoid the egg
and chicken issue, we need to prepare the global lock ahead before any
locking can be used. So the global lock's PV list is established in
lvmlockd daemon by iterating all drives with partition labeled with
"propeller".
Signed-off-by: Leo Yan <leo.yan@linaro.org>
We can consider the drive firmware a server to handle the locking
request from nodes, this essentially is a client-server model.
DLM uses the kernel as a central place to manage locks, so it also
complies with client-server model for locking operations. This is
why IDM and DLM are similar with each other for their wrappers.
This patch largely works by generalizing the DLM code paths and then
providing degeneralized functions as wrappers for both IDM and DLM.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
error reading dev and no pvid on dev were both
returning 0. make it easier for callers to
know which, if they care.
return 1 if the device could be read, regardless
of whether a pvid was found or not.
set has_pvid=1 if a pvid is found and 0 if no
pvid is found.
This case happens when i.e. we convert LV to another type,
when we change existing LV into a different type - so change
to debug level and avoid confusing users with message about
Device path not match.
We may eventually enhnace caching code to drop cached info
after taking lock and reading VG.
With commit 0b18c25d93 there
was introduced 'zalloc()' for allocation of outdates pvs,
but no matching 'free()' is present.
Switch to use cmd mempool instead of adding free() code into
several places.
The autoactivation property can be specified in lvcreate
or vgcreate for new LVs/VGs, and the property can be changed
by lvchange or vgchange for existing LVs/VGs.
--setautoactivation y|n
enables|disables autoactivation of a VG or LV.
Autoactivation is enabled by default, which is consistent with
past behavior. The disabled state is stored as a new flag
in the VG metadata, and the absence of the flag allows
autoactivation.
If autoactivation is disabled for the VG, then no LVs in the VG
will be autoactivated (the LV autoactivation property will have
no effect.) When autoactivation is enabled for the VG, then
autoactivation can be controlled on individual LVs.
The state of this property can be reported for LVs/VGs using
the "-o autoactivation" option in lvs/vgs commands, which will
report "enabled", or "" for the disabled state.
Previous versions of lvm do not recognize this property. Since
autoactivation is enabled by default, the disabled setting will
have no effect in older lvm versions. If the VG is modified by
older lvm versions, the disabled state will also be dropped from
the metadata.
The autoactivation property is an alternative to using the lvm.conf
auto_activation_volume_list, which is still applied to to VGs/LVs
in addition to the new property.
If VG or LV autoactivation is disabled either in metadata or in
auto_activation_volume_list, it will not be autoactivated.
An autoactivation command will silently skip activating an LV
when the autoactivation property is disabled.
To determine the effective autoactivation behavior for a specific
LV, multiple settings would need to be checked:
the VG autoactivation property, the LV autoactivation property,
the auto_activation_volume_list. The "activation skip" property
would also be relevant, since it applies to both normal and auto
activation.
If we are signaled with SIGTERM it should be at least as good
as with SIGINT - as the command should stop ASAP.
So when lvm2 command allows signal handling we also
enable SIGTERM handling. If there are some other signals
we should handle equally - we could just extend array.
Not all libc (like musl, uclibc dietlibc) libraries support full symbol
version resolution in runtime like glibc.
Add support to not generate symbol versions when compiling against them.
Additionally libdevmapper.so was broken when compiled against
uclibc. Runtime linker loader caused calling dm_task_get_info_base()
function recursively, leading to segmentation fault.
Introduce --with-symvers=STYLE option, which allows to choose
between gnu and disabled symbol versioning. By default gnu symbol
versioning is used.
__GNUC__ check is replaced now with GNU_SYMVER.
Additionally ld version script is included only in
case of gnu option, which slightly reduces output size.
Providing --without-symvers to configure script when building against
uclibc library fixes segmentation fault error described above, due to
lack of several versions of the same symbol in libdevmapper.so
library.
Based on:
https://patchwork.kernel.org/project/dm-devel/patch/20180831144817.31207-1-m.niestroj@grinn-global.com/
Suggested-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Function is not having the best name since it does check
no just raid LVs to be in sync.
Restore the mirror percentage checking - although without retries,
since only raid target is currently known to need it - for other
types it would be ATM a bug to get inconsistent result.
Whiel waiting for raid to return consistent status,
use interruptible sleep - so command can break quickly.
Use lv_raid_status() to get percentage easily from status.
Enabled extension/mixing of stripes/linears, error and zero
segtype LVs with stripes/linear, error and zero segtypes.
It is not very useful in practice, as the user cannot store any real
data on error or zero segtypes, but it may get some uses in
some scenarios where i.e. some portion of the device should not be
readable. Mixing of types happens on 'extent_size' level:
lvcreate -L1 -n lv vg
lvextend --type error -L+1 vg/lv
lvextend --type zero -L+1 vg/lv
lvextend --type linear -L+1 vg/lv
lvextend --type striped -L+1 vg/lv
lvs -o+segtype,seg_size vg
Note: when the type is not specified, the last segment type is
automatically selected.
It's also a small 'can of worms' since we can't tell LVs if
the LV is linear/error/zero or their mixtures. So the meaning behind
them may need some updates.
We already have this types of LV created i.e by:
vgreduce --removemissing --force
where missing LV segments have been replaced by either
error or zero segtype (lvm.conf).
TODO: it might be worth adding a message while such device is activated.
Add some extra code to handle differently sized thin-pool
from thin-pool data volume.
ATM this can't really happen, but once we start to use multiple
commits while resizing stacked LV, we may actually get into
the position, where data LV has been already resized,
but thin-pool stayed with old size.
But for now - report difference as internal error.
Devices made only from 'error' target cannot be used,
but if the device is also combined from 'zero' target
the same rule can be applied as such device cannot be used.
Just like with other segtype use this function to get whole
raid status info available per a single ioctl call.
Also it nicely simplifies read of percentage info about
in_sync portion of raid volume.
TODO: drop use of other calls then lv_raid_status call,
since all such calls could already use status - so it just
adds unnecessary duplication.
Reduce ioctl count and avoid separate info check,
when we can get the same info from status ioctl.
When devmanager calls return 0, then the exists value 0
means the reason of failure is missing device in table.
In such case we avoid stack trace.
Swap the flush parameter for the vdo status function
to match thin pool status.
Reuse similar 'acceleration' as used for dependent volumes also
for snapshot - so when origin is being removed with all thick
snapshots, don't bother with individual 'COW' detachments
and write&commits, and when possible handle this all within
a single commit.
Move code for prompting about removed LV to a single function
and use it also to prompt for removal of origin and all its thick
snapshots and also when removing merging origin.
Function does handle postponed write_and_commit so there is
no 'in-flight' operation while waiting on [y|n] answer.
Compat code and handle unusual case, where
thin snapshot is also a 'thick snapshot origin' and such
snapshot gets merged into a thin origin.
However since now lv_is_visible() (which is complex function)
replaced &VISIBLE_LV check, the whole this check seems to be
no longer useful as sum of all 3 will always match??
Since cached LV is going to be removed together with its cache,
there is not much to gain if we try to flush cache first.
User may use 'vgcfgrestore' to get back origin + cache.
Assuming user is not using issue_discards.
When data are discarded after remove there is nothing to restore!
This change allows to futher reduce number of commits
during lvremove/vgremove.
When lvremove/vgremove removes thin volumes with its thin-pool as well,
try to skip any updates of such thin-pool, so when everything properly
deactivates, there is no message send to this thin-pool and whole
thin-pool is removed with a single commit.
Returning NULL for lv_committed is basically instant crash,
so instead try with passed LV instead.
It shouldn't matter as this is internall error path anyway,
but coverity should be happier.
Another step towards better automatic handling of backup,
and automatically setup needs_backup after commit.
In some next step we should reduce number of backups and takem
then only at the command finish with vg_committed content.
With commit b44db5d1a7
needs to check allocated pointer for failed malloc().
Existing check was actually no checking anything so failing
malloc here would result in segfault (although with very
low chance to ever happen).
Match VG uuid just once per list of all LVs in VG.
TODO: maybe some more efficeint tree or hash could be better here,
but since it's used not so often, the total benefit is not so great,
so ATM just reducing amount of checked bytes.