IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Use dev_cache_get_existing() in a few common, high level
locations where it's obvious that only existing dev-cache
entries are wanted. This can be expanded and used in more
locations (or dev_cache_get can stop creating new entries.)
along with some basic checks for cases when a device
has no aliases.
lvm itself creates many situations where a struct device
has no valid paths, when it activates and opens an LV,
does something with it, e.g. zeroing, and then closes
and deactivates it. (dev-cache is intended for PVs, and
the use of LVs should be moved out of dev-cache in a
future patch.)
In a certain disconnected state, a block device is present on
the system, can be opened, reports a valid size, reports the
correct device id (wwid), and matches a devices file entry.
But, reading the device can still fail. In this case,
device_ids_validate() was misinterpreting the read error as
the device having no data/label on it (and no PVID).
The validate function would then clear the PVID from the
devices file entry for the device, thinking that it was
fixing the devices file (making it consistent with the on disk
state.) Fix this by not attempting to check and correct a
devices file entry that cannot be read. Also make this case
explicit in the hints validation code (which was doing the
right thing but indirectly.)
Removes some incorrect and unnecessary checks for other entries
when adding a new devices. The removed checks and corrections were
mostly redundant with what is already done by device id matching.
Other checking is reworked so the warnings are a bit different.
When a device has a wwid (from sysfs), but lvm ignores the wwid,
e.g. because it contains an unreliable "QEMU" value, then lvm
falls back to using IDTYPE=devname (the device name) as the
device id. If the device name changes after reboot, then lvm
automatically searches for the PV on other devices to find the
new device name and correct system.devices. When searching for
the PV, lvm skips looking at devices that would use other id types,
e.g. if a device would use a wwid and not a devname, then it
skips checking it. However, it failed to account for the fact
that a device may have wwid that was ignored, in which case it
should be checked.
. error exit means that lvmdevices --update would make a change.
. remove check of PART field from --check because it isn't used.
. unlink searched_devnames file to ensure check|update will search
Description stolen from linux d/b/rbd.c L3:
rbd.c -- Export ceph rados objects as a Linux block device
16 partitions seem to make sense according to L90:
#define RBD_SINGLE_MAJOR_PART_SHIFT 4
Running *scan -vvvvvvdddddd yields
#filters/filter-type.c:28 /dev/rbd1p5: Skipping: Unrecognised LVM device type 252
#filters/filter-persistent.c:131 filter caching bad /dev/rbd1p5
right now, and adding
types = ["rbd", 252]
to /e/l/lvm.conf (with the matching "252 rbd" in /p/devices) works as a
per-machine fix:
rbd1 252:16 0 1T 1 disk
|-rbd1p1 252:17 0 243M 1 part
|-rbd1p2 252:18 0 1K 1 part
`-rbd1p5 252:21 0 1023.8G 1 part
`-dev01--vg-root 253:0 0 1023.8G 0 lvm
but rbd is supported by upstream so it'd be nice to have it work OOB
If multipath component devices get through the filter and
cause lvm to see duplicate PVs, then check the wwid of the
devs and drop the component devices as if they had been
filtered. If a dm mpath device was found among the duplicates
then use that as the PV, otherwise do not use any of the
components as the PV.
"duplicate PVs" associated with multipath configs will no
longer stop commands from working.
Remove the searched_devnames file in a couple more places:
. When hints need refreshing it's possible that a missing
devices file entry could be found by searching devices
again.
. When a devices file entry devname is first found to be
incorrect, a new search for missing entries may be
useful.
When devnames are used as device ids and devnames change,
then new devices need to be located for the PVs. If the old
devname is now used by a filtered device, this was preventing
the code from searching for the new device, so the PV was
reported as missing.
Include the device name in the /run/lvm/pvs_online/pvid files.
Commands using the pvid file can use the devname to more quickly
find the correct device, vs finding the device using the
major:minor number. If the devname in the pvid file is missing
or incorrect, fall back to using the devno.
For completeness and consistency, adjust the behavior
for some variations of:
vgchange -aay --autoactivation event [vgname]
The current standard use is with a VG name arg, and the
command is only called when all pvs_online files exist.
This is the optimal case, in which only pvs_online devs
are read. This remains the same.
Clean up behaviors for some other unexpected uses of the
command:
. With no VG name arg, the command activates any VGs
that are complete according to pvs_online. If no
pvs_online files exist, it does nothing.
. If a VG name is used but no PVs online files exist for
the VG, or the PVs online files are incomplete, then
consider there could be a problem with the pvs_online
files, and fall back to a full label scan prior to
attempting the activation.
Port another optimization from pvscan -aay to vgchange -aay:
"pvscan: only add device args to dev cache"
This optimization avoids doing a full dev_cache_scan, and
instead populates dev-cache with only the devices in the
VG being activated.
This involves shifting the use of pvs_online files from
the hints interface up to the higher level label_scan
interface. This specialized label_scan is structured
around creating a list of devices from the pvs_online
files. Previously, a list of all devices was created
first, and then reduced based on the pvs_online files.
The initial step of listing all devices was slow when
thousands of devices are present on the system.
This optimization extends the previous optimization that
used pvs_online files to limit the devices that were
actually scanned (i.e. reading to identify the device):
"vgchange -aay: optimize device scan using pvs_online files"
When scanning configured /dev dir, avoid entring
directories with different filesystem.
This minimizes risk we will block on i.e. entring
directory with mount point.
Reporting non-PVs / "all devices" is only done by
pvs -a or pvdisplay -a, so avoid the work managing
a list of all devices in process_each_pv.
In the case when it's needed, use the results of
label_scan which already determines which devs
are not PVs.
sysconf() may also return -1 although rather theoretically.
Default to 4K when such case would happen.
Also in function call it just once and keep as static variable.
Ensure only nonNULL 'du' pointer is dereference altough the comment
to the last assign 'du' pointer already suggest 'NULL' case should not happen.
So just being explicit.
mer du
Some device id types can only be used with specific device major
numbers, so use this restriction to avoid some comparisions.
This is more efficient, and can avoid some incorrect matches.
pvid and vgid are sometimes a null-terminated string, and
other times a 'struct id', and the two types were often
cast between each other. When a struct id was cast to a char
pointer, the resulting string would not necessarily be null
terminated. Casting a null-terminated string id to a
struct id is fine, but is still avoided when possible.
A struct id is: int8_t uuid[ID_LEN]
A string id is: char pvid[ID_LEN + 1]
A convention is introduced to help distinguish them:
- variables and struct fields named "pvid" or "vgid"
should be null-terminated strings.
- variables and struct fields named "pv_id" or "vg_id"
should be struct id's.
- examples:
char pvid[ID_LEN + 1];
char vgid[ID_LEN + 1];
struct id pv_id;
struct id vg_id;
Function names also attempt to follow this convention.
Avoid casting between the two types as much as possible,
with limited exceptions when known to be safe and clearly
commented.
Avoid using variations of strcpy and strcmp, and instead
use memcpy/memcmp with ID_LEN (with similar limited
exceptions possible.)
sysfs-based multipath component detection quit if a
device had multiple holders, and in this case would
fail to detect a device was an mpath component.
related to config settings:
obtain_device_info_from_udev (controls if lvm gets
a list of devices from readdir /dev or from libudev)
external_device_info_source (controls if lvm asks
libudev for device information)
. Make the obtain_device_list_from_udev setting
affect only the choice of readdir /dev vs libudev.
The setting no longer controls if udev is used for
device type checks.
. Change obtain_device_list_from_udev default to 0.
This helps avoid boot timeouts due to slow libudev
queries, avoids reported failures from
udev_enumerate_scan_devices, and avoids delays from
"device not initialized in udev database" errors.
Even without errors, for a system booting with 1024 PVs,
lvm2-pvscan times improve from about 100 sec to 15 sec,
and the pvscan command from about 64 sec to about 4 sec.
. For external_device_info_source="none", remove all
libudev device info queries, and use only lvm
native device info.
. For external_device_info_source="udev", first check
lvm native device info, then check libudev info.
. Remove sleep/retry loop when attempting libudev
queries for device info. udev info will simply
be skipped if it's not immediately available.
. Only set up a libdev connection if it will be used by
obtain_device_list_from_udev/external_device_info_source.
. For native multipath component detection, use
/etc/multipath/wwids. If a device has a wwid
matching an entry in the wwids file, then it's
considered a multipath component. This is
necessary to natively detect multipath
components when the mpath device is not set up.
expands commit d5a06f9a7d
"pvscan: skip indexing devices used by LVs"
The dev cache index is expensive and slow, so limit it
to commands that are used to observe the state of lvm.
The index is only used to print warnings about incorrect
device use by active LVs, e.g. if an LV is using a
multipath component device instead of the multipath
device. Commands that continue to use the index and
print the warnings:
fullreport, lvmdiskscan, vgs, lvs, pvs,
vgdisplay, lvdisplay, pvdisplay,
vgscan, lvscan, pvscan (excluding --cache)
A couple other commands were borrowing the DEV_USED_FOR_LV
flag to just check if a device was actively in use by LVs.
These are converted to the new dev_is_used_by_active_lv().
dev_cache_index_devs() is taking a large amount of time
when there are many PVs. The index keeps track of
devices that are currently in use by active LVs. This
info is used to print warnings for users in some limited
cases.
The checks/warnings that are enabled by the index are not
needed by pvscan --cache, so disable it in this case.
This may be expanded to other cases in future commits.
dev_cache_index_devs should also be improved in another
commit to avoid the extreme delays with many devices.
When adding a device to the devices file with --adddev, lvm
by default chooses the best device ID type for the new device.
The new --deviceidtype option allows the user to override the
built in preference. This is useful if there's a problem with
the default type, or if a secondary type is preferrable.
If the specified deviceidtype does not produce a device ID,
then lvm falls back to the preference it would otherwise use.
error reading dev and no pvid on dev were both
returning 0. make it easier for callers to
know which, if they care.
return 1 if the device could be read, regardless
of whether a pvid was found or not.
set has_pvid=1 if a pvid is found and 0 if no
pvid is found.
This case happens when i.e. we convert LV to another type,
when we change existing LV into a different type - so change
to debug level and avoid confusing users with message about
Device path not match.
We may eventually enhnace caching code to drop cached info
after taking lock and reading VG.
With commit b44db5d1a7
needs to check allocated pointer for failed malloc().
Existing check was actually no checking anything so failing
malloc here would result in segfault (although with very
low chance to ever happen).
Use different 'hint' size for dm_hash_create() call - so
when debug info about hash is printed we can recognize which
hash was in use.
This patch doesn't change actual used size since that is always
rounded to be power of 2 and >=16 - so as such is only a
help to developer.
We could eventually use 'name' arg, but since this would have changed
API and this patchset will be routed to libdm & stable - we will
just use this small trick.
Use 'C' for alphasort - there is no need to use localized and slower
sorting for internal directory scanning.
Ensure on all code paths allocated dirent entries are released.
Optimize full path construction.
The LVM devices file lists devices that lvm can use. The default
file is /etc/lvm/devices/system.devices, and the lvmdevices(8)
command is used to add or remove device entries. If the file
does not exist, or if lvm.conf includes use_devicesfile=0, then
lvm will not use a devices file. When the devices file is in use,
the regex filter is not used, and the filter settings in lvm.conf
or on the command line are ignored.
LVM records devices in the devices file using hardware-specific
IDs, such as the WWID, and attempts to use subsystem-specific
IDs for virtual device types. These device IDs are also written
in the VG metadata. When no hardware or virtual ID is available,
lvm falls back using the unstable device name as the device ID.
When devnames are used, lvm performs extra scanning to find
devices if their devname changes, e.g. after reboot.
When proper device IDs are used, an lvm command will not look
at devices outside the devices file, but when devnames are used
as a fallback, lvm will scan devices outside the devices file
to locate PVs on renamed devices. A config setting
search_for_devnames can be used to control the scanning for
renamed devname entries.
Related to the devices file, the new command option
--devices <devnames> allows a list of devices to be specified for
the command to use, overriding the devices file. The listed
devices act as a sort of devices file in terms of limiting which
devices lvm will see and use. Devices that are not listed will
appear to be missing to the lvm command.
Multiple devices files can be kept in /etc/lvm/devices, which
allows lvm to be used with different sets of devices, e.g.
system devices do not need to be exposed to a specific application,
and the application can use lvm on its own set of devices that are
not exposed to the system. The option --devicesfile <filename> is
used to select the devices file to use with the command. Without
the option set, the default system devices file is used.
Setting --devicesfile "" causes lvm to not use a devices file.
An existing, empty devices file means lvm will see no devices.
The new command vgimportdevices adds PVs from a VG to the devices
file and updates the VG metadata to include the device IDs.
vgimportdevices -a will import all VGs into the system devices file.
LVM commands run by dmeventd not use a devices file by default,
and will look at all devices on the system. A devices file can
be created for dmeventd (/etc/lvm/devices/dmeventd.devices) If
this file exists, lvm commands run by dmeventd will use it.
Internal implementaion:
- device_ids_read - read the devices file
. add struct dev_use (du) to cmd->use_devices for each devices file entry
- dev_cache_scan - get /dev entries
. add struct device (dev) to dev_cache for each device on the system
- device_ids_match - match devices file entries to /dev entries
. match each du on cmd->use_devices to a dev in dev_cache, using device ID
. on match, set du->dev, dev->id, dev->flags MATCHED_USE_ID
- label_scan - read lvm headers and metadata from devices
. filters are applied, those that do not need data from the device
. filter-deviceid skips devs without MATCHED_USE_ID, i.e.
skips /dev entries that are not listed in the devices file
. read lvm label from dev
. filters are applied, those that use data from the device
. read lvm metadata from dev
. add info/vginfo structs for PVs/VGs (info is "lvmcache")
- device_ids_find_renamed_devs - handle devices with unstable devname ID
where devname changed
. this step only needed when devs do not have proper device IDs,
and their dev names change, e.g. after reboot sdb becomes sdc.
. detect incorrect match because PVID in the devices file entry
does not match the PVID found when the device was read above
. undo incorrect match between du and dev above
. search system devices for new location of PVID
. update devices file with new devnames for PVIDs on renamed devices
. label_scan the renamed devs
- continue with command processing
Since lvm2 normally block signals during protected
phase where it does not want to be interrupted.
Support interruptible processing when allowed
in section between sigint_allow() ... sigint_restore())
and let the 'io_getenvents()' finish with EINTR.
When bcache tries to write data to a faulty device,
it may get out of caching blocks and then just busy-loops
on a CPU - so this check protects this by checking
if there is already max_io (~64) errored blocks.
Call _wait_all() which does check whether there is still
some pending IO before sleep. Otherwise it may happen
our submitted IO operations have been already dispatched
and this call then endlessly waits for IO which are all done.
This can be reproduced when device returns quickly errors
on write requests.
Add a "device index" (di) for each device, and use this
in the bcache api to the rest of lvm. This replaces the
file descriptor (fd) in the api. The rest of lvm uses
new functions bcache_set_fd(), bcache_clear_fd(), and
bcache_change_fd() to control which fd bcache uses for
io to a particular device.
. lvm opens a dev and gets and fd.
fd = open(dev);
. lvm passes fd to the bcache layer and gets a di
to use in the bcache api for the dev.
di = bcache_set_fd(fd);
. lvm uses bcache functions, passing di for the dev.
bcache_write_bytes(di, ...), etc.
. bcache translates di to fd to do io.
. lvm closes the device and clears the di/fd bcache state.
close(fd);
bcache_clear_fd(di);
In the bcache layer, a di-to-fd translation table
(int *_fd_table) is added. When bcache needs to
perform io on a di, it uses _fd_table[di].
In the following commit, lvm will make use of the new
bcache_change_fd() function to change the fd that
bcache uses for the dev, without dropping cached blocks.
Switch remaining zero sized struct to flexible arrays to be C99
complient.
These simple rules should apply:
- The incomplete array type must be the last element within the structure.
- There cannot be an array of structures that contain a flexible array member.
- Structures that contain a flexible array member cannot be used as a member of another structure.
- The structure must contain at least one named member in addition to the flexible array member.
Although some of the code pieces should be still improved.
When initiated larger write request, it may have happened, bcache
got out of free chunks - fix the loop, that is supposed to wait
until next free chunk becomes avain available.
It's possible for a dev-cache entry to remain after all
paths for it have been removed, and other parts of the
code expect that a dev always has a name. A better fix
may be to remove a device from dev-cache after all paths
to it have been removed.
dm-integrity stores checksums of the data written to an
LV, and returns an error if data read from the LV does
not match the previously saved checksum. When used on
raid images, dm-raid will correct the error by reading
the block from another image, and the device user sees
no error. The integrity metadata (checksums) are stored
on an internal LV allocated by lvm for each linear image.
The internal LV is allocated on the same PV as the image.
Create a raid LV with an integrity layer over each
raid image (for raid levels 1,4,5,6,10):
lvcreate --type raidN --raidintegrity y [options]
Add an integrity layer to images of an existing raid LV:
lvconvert --raidintegrity y LV
Remove the integrity layer from images of a raid LV:
lvconvert --raidintegrity n LV
Settings
Use --raidintegritymode journal|bitmap (journal is default)
to configure the method used by dm-integrity to ensure
crash consistency.
Initialization
When integrity is added to an LV, the kernel needs to
initialize the integrity metadata/checksums for all blocks
in the LV. The data corruption checking performed by
dm-integrity will only operate on areas of the LV that
are already initialized. The progress of integrity
initialization is reported by the "syncpercent" LV
reporting field (and under the Cpy%Sync lvs column.)
Example: create a raid1 LV with integrity:
$ lvcreate --type raid1 -m1 --raidintegrity y -n rr -L1G foo
Creating integrity metadata LV rr_rimage_0_imeta with size 12.00 MiB.
Logical volume "rr_rimage_0_imeta" created.
Creating integrity metadata LV rr_rimage_1_imeta with size 12.00 MiB.
Logical volume "rr_rimage_1_imeta" created.
Logical volume "rr" created.
$ lvs -a foo
LV VG Attr LSize Origin Cpy%Sync
rr foo rwi-a-r--- 1.00g 4.93
[rr_rimage_0] foo gwi-aor--- 1.00g [rr_rimage_0_iorig] 41.02
[rr_rimage_0_imeta] foo ewi-ao---- 12.00m
[rr_rimage_0_iorig] foo -wi-ao---- 1.00g
[rr_rimage_1] foo gwi-aor--- 1.00g [rr_rimage_1_iorig] 39.45
[rr_rimage_1_imeta] foo ewi-ao---- 12.00m
[rr_rimage_1_iorig] foo -wi-ao---- 1.00g
[rr_rmeta_0] foo ewi-aor--- 4.00m
[rr_rmeta_1] foo ewi-aor--- 4.00m
Do this at two levels, although one would be enough to
fix the problem seen recently:
- Ignore any reported sector size other than 512 of 4096.
If either sector size (physical or logical) is reported
as 512, then use 512. If neither are reported as 512,
and one or the other is reported as 4096, then use 4096.
If neither is reported as either 512 or 4096, then use 512.
- When rounding up a limited write in bcache to be a multiple
of the sector size, check that the resulting write size is
not larger than the bcache block itself. (This shouldn't
happen if the sector size is 512 or 4096.)
An active md device with an end superblock causes lvm to
enable full md component detection. This was being done
within the filter loop instead of before, so the full
filtering of some devs could be missed.
Also incorporate the recently added config setting that
controls the md component detection.
If udev info is missing for a device, (which would indicate
if it's an MD component), then do an end-of-device read to
check if a PV is an MD component. (This is skipped when
using hints since we already know devs in hints are good.)
A new config setting md_component_checks can be used to
disable the additional end-of-device MD checks, or to
always enable end-of-device MD checks.
When both hints and udev info are disabled/unavailable,
the end of PVs will now be scanned by default. If md
devices with end-of-device superblocks are not being
used, the extra I/O overhead can be avoided by setting
md_component_checks="start".
udev_dev_is_md_component and udev_dev_is_mpath_component
are not used for obtaining the device list, but they still
use libudev for device info. When there are problems with
udev, these functions can get stuck. So, use the existing
obtain_device_list_from_udev config setting to also control
whether these "is component" functions are used, which gives
us a way to avoid using libudev entirely when it's causing
problems.
Save the list of PVs in /run/lvm/hints. These hints
are used to reduce scanning in a number of commands
to only the PVs on the system, or only the PVs in a
requested VG (rather than all devices on the system.)
Ensure configure.h is always 1st. included header.
Maybe we could eventually introduce gcc -include option, but for now
this better uses dependency tracking.
Also move _REENTRANT and _GNU_SOURCE into configure.h so it
doesn't need to be present in various source files.
This ensures consistent compilation of headers like stdio.h since
it may produce different declaration.
commit de28637
scan: use full md filter when md 1.0 devices are present
missed the fact that md superblock version 0.90 also puts
metadata at the end of the device, so the full md filter
needs to be used when either 0.90 or 1.0 is present.