IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The approach to duplicate VGIDs has been that it is not possible
or not allowed, so the behavior has been undefined. The actual
result was unpredictable and/or broken, and generally unhelpful.
Improve this by recognizing the problem, displaying the VGs,
and printing a warning to fix the problem. Beyond this,
using VGs with duplicate VGIDs remains undefined, but should
work well enough to correct the problem with vgchange -u.
It's possible to create this condition without too much difficulty
by cloning PVs, followed by an incomplete attempt at making the two
VGs unique (vgrename and pvchange -u, but missing vgchange -u.)
Since we check for present DM devices - cache result for
futher use of checking presence of such device.
lvm2 uses cache result for label scan, but also when
it tries to activate or deactivate LV - however only simple
target 'striped' is reasonably supported.
Use disable_dm_devs to be able to control when lv_info()
get cache or uncached results.
TODO: support more type, however this is getting very complicated.
Description stolen from linux d/b/rbd.c L3:
rbd.c -- Export ceph rados objects as a Linux block device
16 partitions seem to make sense according to L90:
#define RBD_SINGLE_MAJOR_PART_SHIFT 4
Running *scan -vvvvvvdddddd yields
#filters/filter-type.c:28 /dev/rbd1p5: Skipping: Unrecognised LVM device type 252
#filters/filter-persistent.c:131 filter caching bad /dev/rbd1p5
right now, and adding
types = ["rbd", 252]
to /e/l/lvm.conf (with the matching "252 rbd" in /p/devices) works as a
per-machine fix:
rbd1 252:16 0 1T 1 disk
|-rbd1p1 252:17 0 243M 1 part
|-rbd1p2 252:18 0 1K 1 part
`-rbd1p5 252:21 0 1023.8G 1 part
`-dev01--vg-root 253:0 0 1023.8G 0 lvm
but rbd is supported by upstream so it'd be nice to have it work OOB
Improve handling of md components that get through the
filter, like the previous improvement for multipath.
If md components get through the filter and trigger
duplicate PV code, then eliminate any devs entirely
that are not an md device.
If multipath component devices get through the filter and
cause lvm to see duplicate PVs, then check the wwid of the
devs and drop the component devices as if they had been
filtered. If a dm mpath device was found among the duplicates
then use that as the PV, otherwise do not use any of the
components as the PV.
"duplicate PVs" associated with multipath configs will no
longer stop commands from working.
Remove the searched_devnames file in a couple more places:
. When hints need refreshing it's possible that a missing
devices file entry could be found by searching devices
again.
. When a devices file entry devname is first found to be
incorrect, a new search for missing entries may be
useful.
When devnames are used as device ids and devnames change,
then new devices need to be located for the PVs. If the old
devname is now used by a filtered device, this was preventing
the code from searching for the new device, so the PV was
reported as missing.
If the optimized label scan fails (using online files),
then clear the device state prior to falling back to the
standard label_scan. This avoids printing output about
unexpected state.
Copy another optimization from pvscan -aay to vgchange -aay.
When using the optimized label scan for only one VG, acquire the
VG lock prior to the scan. This allows vg_read to then skip the
repeated label scan that normally happens after locking the vg.
Include the device name in the /run/lvm/pvs_online/pvid files.
Commands using the pvid file can use the devname to more quickly
find the correct device, vs finding the device using the
major:minor number. If the devname in the pvid file is missing
or incorrect, fall back to using the devno.
For completeness and consistency, adjust the behavior
for some variations of:
vgchange -aay --autoactivation event [vgname]
The current standard use is with a VG name arg, and the
command is only called when all pvs_online files exist.
This is the optimal case, in which only pvs_online devs
are read. This remains the same.
Clean up behaviors for some other unexpected uses of the
command:
. With no VG name arg, the command activates any VGs
that are complete according to pvs_online. If no
pvs_online files exist, it does nothing.
. If a VG name is used but no PVs online files exist for
the VG, or the PVs online files are incomplete, then
consider there could be a problem with the pvs_online
files, and fall back to a full label scan prior to
attempting the activation.
Part of the optimization to avoid a full dev_cache_scan requires
translating major:minor numbers to a device name. If this devno
translation fails, then fall back to doing a full dev_cache_scan
which is slower but certain to provide the info. This preserves
the most important part of the label scanning optimization in the
vgchange aay (avoiding dev_cache_scan is a relatively small part
of the optimized activation compared to label scanning.)
Port another optimization from pvscan -aay to vgchange -aay:
"pvscan: only add device args to dev cache"
This optimization avoids doing a full dev_cache_scan, and
instead populates dev-cache with only the devices in the
VG being activated.
This involves shifting the use of pvs_online files from
the hints interface up to the higher level label_scan
interface. This specialized label_scan is structured
around creating a list of devices from the pvs_online
files. Previously, a list of all devices was created
first, and then reduced based on the pvs_online files.
The initial step of listing all devices was slow when
thousands of devices are present on the system.
This optimization extends the previous optimization that
used pvs_online files to limit the devices that were
actually scanned (i.e. reading to identify the device):
"vgchange -aay: optimize device scan using pvs_online files"
The information in /run/lvm/pvs_online/<pvid> files can
be used to build a list of devices for a given VG.
The pvscan -aay command has long used this information to
activate a VG while scanning only devices in that VG, which
is an important optimization for autoactivation.
This patch implements the same thing through the existing
device hints interface, so that the optimization can be
applied elsewhere. A future patch will take advantage of
this optimization in vgchange -aay, which is now used in
place of pvscan -aay for event activation.
When a device id is set for a device, using an idtype other
than devname, it means that sysfs has been used with the device
to match the device id. So, checking for a sysfs entry for the
device in filter-sysfs is redundant. For any other cases not
covered by this (e.g. devname ids), have filter-sysfs simply
stat /sys/dev/block/major:minor to test if the device exists
in sysfs.
The extensive processing done by filter-sysfs init is removed.
It was taking an immense amount of time with many devices, e.g.
. 1024 PVs in 520 VGs
. 520 concurrent vgchange -ay <vgname> commands
. vgchange scans only PVs in the named VG (based on pvs_online
files from a pending patch)
A large number of the vgchange commands were taking over 1 min,
and nearly half of that time was used by filter-sysfs init.
With this patch, the vgchange commands take about half the time.
When scanning configured /dev dir, avoid entring
directories with different filesystem.
This minimizes risk we will block on i.e. entring
directory with mount point.
Resolve event_activation configure option just once.
Do not print debug_devs about 'bad' filtering, when
actually filter already printed reason for skipping
Do not trace more then once about backup being disabled.
No debug when unlinked file does not exists in pvscan.
Reporting non-PVs / "all devices" is only done by
pvs -a or pvdisplay -a, so avoid the work managing
a list of all devices in process_each_pv.
In the case when it's needed, use the results of
label_scan which already determines which devs
are not PVs.
Just setting lvm.conf level=N should not send messages to
syslog (now the journal by default.)
Sending messages to syslog should require setting lvm.conf
log { syslog=1 level=N }.
Configure via lvm.conf log/journal or command line --journal.
Possible values:
"command" records command information.
"output" records default command output.
"debug" records full command debugging.
Multiple values can be set in lvm.conf as an array.
One value can be set in --journal which is added to
values set in lvm.conf
pvscan --cache <dev>
. read only dev
. create online file for dev
pvscan --listvg <dev>
. read only dev
. list VG using dev
pvscan --listlvs <dev>
. read only dev
. list VG using dev
. list LVs using dev
pvscan --cache --listvg [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. [check online files and report if VG is complete]
pvscan --cache --listlvs [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. list LVs using dev
. [check online files and report if VG is complete]
. [check online files and report if LVs are complete]
[--vgonline]
can be used with --checkcomplete, to enable use of a vg online
file. This results in only the first pvscan command to see
the complete VG to report 'VG complete', and others will report
'VG finished'. This allows the caller to easily run a single
activation of the VG.
[--udevoutput]
can be used with --cache --listvg --checkcomplete, to enable
an output mode that prints LVM_VG_NAME_COMPLETE='vgname' that
a udev rule can import, and prevents other output from the
command (other output causes udev to ignore the command.)
The list of complete LVs is meant to be passed to lvchange -aay,
or the complete VG used with vgchange -aay.
When --checkcomplete is used, lvm assumes that that the output
will be used to trigger event-based autoactivation, so the pvscan
does nothing if event_activation=0 and --checkcomplete is used.
Example of listlvs
------------------
$ lvs -a vg -olvname,devices
LV Devices
lv_a /dev/loop0(0)
lv_ab /dev/loop0(1),/dev/loop1(1)
lv_abc /dev/loop0(3),/dev/loop1(3),/dev/loop2(1)
lv_b /dev/loop1(0)
lv_c /dev/loop2(0)
$ pvscan --cache --listlvs --checkcomplete /dev/loop0
pvscan[35680] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
LV vg/lv_a complete
LV vg/lv_ab incomplete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop1
pvscan[35681] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
LV vg/lv_b complete
LV vg/lv_ab complete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop2
pvscan[35682] PV /dev/loop2 online, VG vg is complete.
VG vg complete
LV vg/lv_c complete
LV vg/lv_abc complete
Example of listvg
-----------------
$ pvscan --cache --listvg --checkcomplete /dev/loop0
pvscan[35684] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop1
pvscan[35685] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop2
pvscan[35686] PV /dev/loop2 online, VG vg is complete.
VG vg complete
The new system_id_source="appmachineid" will cause
lvm to use an lvm-specific derivation of the machine-id,
instead of the machine-id directly. This is now
recommended in place of using machineid.
Do not store full path with each archived name reduces memory usage if
the directory has thousands of entries and just add 'dir' path when
needed.
Also emit info print message to a user if the total size of archived
files for a VG is more then 128MiB or 8192 files.
TODO: Consider wheather adding a new 'lvm.conf archive{option}' to support
trimming these wild archive sizes can make situation better.
We already support retain_min && retain_days - but if user is
generating too many and too large archives with minutes - maybe archiving
should be disabled by a user - as it's not producing anything largely usable
and just slows-down command ??
If we add 'retain_max & retain_max_size' the condition will go against
each other and we need to chose priorities.
mm
Consider missing config tree from vg read to be an internal error
since we do not want to 'regenerate' this one in expesive parsing way.
Also if there is any failure on recreating committed VG, make it also
a 'vg_write' error.
Corrupt metadata text (with good mda header) was being handled
in the label_scan phase, but not in the vg_read phase. This
was sufficient because metadata areas would always be read and
checksummed during label_scan (metadata parsing was skipped
previously as an optimization.)
This changed with the optimization in
commit 61a6f9905e
"metadata: optimize reading metadata copies in scan"
Now, some metadata areas will not be read and checksummed
at all during the label_scan phase, only during the vg_read
phase. This means that bad metadata text may first be detected
in the vg_read phase. So, add equivalent bad metadata handling
to the vg_read path to match the label_scan path.
While being in lockless scanning phase, we can avoid reading and checking
matching metadata copies if we already know them from other PV
and just rely on matching metadata header information.
These copies will be examined later during locked metadata read/write
access.
This patch may postpone discovering some read failures to locked phase.
When creating lvm2 metadata for VG, lvm2 allocate some buffer,
and if buffer is not big enough, the buffer is 'reallocated' bigger,
and whole metadata creation is repeated until metadata fits.
We can try to use 'previous' metadata size as hint to reduce looping
here.
Preserve computed crc32 check from first written PV, just like we
preserve generated metadata.
Also there is no need to call crc32 twice on wrapping buffer with 2 calcs,
result must be always the same as with single crc32 checking.
sysconf() may also return -1 although rather theoretically.
Default to 4K when such case would happen.
Also in function call it just once and keep as static variable.
Ensure only nonNULL 'du' pointer is dereference altough the comment
to the last assign 'du' pointer already suggest 'NULL' case should not happen.
So just being explicit.
mer du
Error path in _lockd_retrive_vg_pv_list() has not zeroed released path
caussing possible double-free later in the code.
Fix it by using one single function freeing lock_pvs structure.
When cache creation fails on table reload path, implemen more
advanced revert solution, that tries to restore state of LVM
metadata into is look before actual caching started.
Loading invalid MQ/SMQ policy settings table line cause immediate
rejection - to prevent such failure, automatically filter valid
settins before it is being uploaded by lvm2.
For invalid setting issue a warnning informing user how to remove
them.
This solution is used to keep running cached LVs that might had
been created in the past with invalid settings that have been actually
unused due to another code bug.
New versions of kvdo module exposes statistics at new location:
/sys/block/dm-XXX/vdo/statistics/...
Enhance lvm2 to access this location first.
Also if the statistic info is missing - make it 'debug' level info,
so it is not failing 'lvs' command.
Since VDO is always returns 'zero' on unprovisioned read
and every provisioned block is always 'zeroed' on partial writes,
we can avoid 'zeroing' of such LVs.
Some device id types can only be used with specific device major
numbers, so use this restriction to avoid some comparisions.
This is more efficient, and can avoid some incorrect matches.
pvid and vgid are sometimes a null-terminated string, and
other times a 'struct id', and the two types were often
cast between each other. When a struct id was cast to a char
pointer, the resulting string would not necessarily be null
terminated. Casting a null-terminated string id to a
struct id is fine, but is still avoided when possible.
A struct id is: int8_t uuid[ID_LEN]
A string id is: char pvid[ID_LEN + 1]
A convention is introduced to help distinguish them:
- variables and struct fields named "pvid" or "vgid"
should be null-terminated strings.
- variables and struct fields named "pv_id" or "vg_id"
should be struct id's.
- examples:
char pvid[ID_LEN + 1];
char vgid[ID_LEN + 1];
struct id pv_id;
struct id vg_id;
Function names also attempt to follow this convention.
Avoid casting between the two types as much as possible,
with limited exceptions when known to be safe and clearly
commented.
Avoid using variations of strcpy and strcmp, and instead
use memcpy/memcmp with ID_LEN (with similar limited
exceptions possible.)
When splitting VG with thin/cache pool volume, handle pmspare during
such split and allocate new pmspare in new VG or extend existing pmspare
there and eventually drop pmspare in original VG if is no longer needed
there.
When check active componet of thinLV with external origin,
we need to check if the external origin isn't already active.
For this however we need to use layered check for -real device.
sysfs-based multipath component detection quit if a
device had multiple holders, and in this case would
fail to detect a device was an mpath component.
For the pv_min_size check, always use dev_get_size()
which is commonly used elsewhere, and don't bother
asking libudev for the device size when
external_device_info_source=udev.
related to config settings:
obtain_device_info_from_udev (controls if lvm gets
a list of devices from readdir /dev or from libudev)
external_device_info_source (controls if lvm asks
libudev for device information)
. Make the obtain_device_list_from_udev setting
affect only the choice of readdir /dev vs libudev.
The setting no longer controls if udev is used for
device type checks.
. Change obtain_device_list_from_udev default to 0.
This helps avoid boot timeouts due to slow libudev
queries, avoids reported failures from
udev_enumerate_scan_devices, and avoids delays from
"device not initialized in udev database" errors.
Even without errors, for a system booting with 1024 PVs,
lvm2-pvscan times improve from about 100 sec to 15 sec,
and the pvscan command from about 64 sec to about 4 sec.
. For external_device_info_source="none", remove all
libudev device info queries, and use only lvm
native device info.
. For external_device_info_source="udev", first check
lvm native device info, then check libudev info.
. Remove sleep/retry loop when attempting libudev
queries for device info. udev info will simply
be skipped if it's not immediately available.
. Only set up a libdev connection if it will be used by
obtain_device_list_from_udev/external_device_info_source.
. For native multipath component detection, use
/etc/multipath/wwids. If a device has a wwid
matching an entry in the wwids file, then it's
considered a multipath component. This is
necessary to natively detect multipath
components when the mpath device is not set up.
How to trigger:
```
~ # export LVM_SYSTEM_DIR=_
~ # pvscan
No matching physical volumes found
double free or corruption (!prev)
Aborted (core dumped)
```
when LVM_SYSTEM_DIR is empty, _load_config_file() won't be called.
when LVM_SYSTEM_DIR is not empty, cfl->cft links into cmd->config_files
by _load_config_file()@lib/commands/toolcontext.c
core dumped code: _destroy_config()@lib/commands/toolcontext.c
```
/* CONFIG_FILE/CONFIG_MERGED_FILES */
if ((cft = remove_config_tree_by_source(cmd, CONFIG_MERGED_FILES)))
config_destroy(cft);
else if ((cft = remove_config_tree_by_source(cmd, CONFIG_FILE)))
config_destroy(cft); <=== first free the cft
dm_list_iterate_items(cfl, &cmd->config_files)
config_destroy(cfl->cft); <=== double free the cft
```
Fixes: c43f2f8ae0
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
expands commit d5a06f9a7d
"pvscan: skip indexing devices used by LVs"
The dev cache index is expensive and slow, so limit it
to commands that are used to observe the state of lvm.
The index is only used to print warnings about incorrect
device use by active LVs, e.g. if an LV is using a
multipath component device instead of the multipath
device. Commands that continue to use the index and
print the warnings:
fullreport, lvmdiskscan, vgs, lvs, pvs,
vgdisplay, lvdisplay, pvdisplay,
vgscan, lvscan, pvscan (excluding --cache)
A couple other commands were borrowing the DEV_USED_FOR_LV
flag to just check if a device was actively in use by LVs.
These are converted to the new dev_is_used_by_active_lv().
dev_cache_index_devs() is taking a large amount of time
when there are many PVs. The index keeps track of
devices that are currently in use by active LVs. This
info is used to print warnings for users in some limited
cases.
The checks/warnings that are enabled by the index are not
needed by pvscan --cache, so disable it in this case.
This may be expanded to other cases in future commits.
dev_cache_index_devs should also be improved in another
commit to avoid the extreme delays with many devices.
There have been two separate checks for metadata
validity: first that the metadata text begins with
a valid VG name, and second the checksum of the
metadata text. These happen in different places,
which means there have been two separate error paths
for invalid metadata. This also causes large metadata
to be read in multiple parts, the first part is read
just to check the vgname, and then remaining parts are
read later when the full metadata is needed.
This patch moves the vg name verification so it's
done just before the checksum verification, which
results in a single error path for invalid metadata,
and causes the entire metadata to be read together
rather that in parts from different parts of the code.
If label_scan encounters bad vg metadata, invalidate
bcache data for the device and reread the mda_header
and metadata text back to back. With concurrent commands
modifying large metadata, it's possible that the entire
metadata area can be rewritten in the time between a
command reading the mda_header and reading the metadata
text that the header points to. Since the label_scan
is just assembling an initial overview of devices, it
doesn't use locking to serialize with other commands
that may be modifying the vg metadata at the same time.
Add profilable configurable setting for vdo pool header size, that is
used as 'extra' empty space at the front and end of vdo-pool device
to avoid having a disk in the system the may have same data is real
vdo LV.
For some conversion cases however we may need to allow using '0' header size.
TODO: in this case we may eventually avoid adding 'linear' mapping layer
in future - but this requires further modification over lvm code base.