IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Take the devices file lock before creating a new devices file.
(Was missed by the change to preemptively create the devices
file prior to setup_devices(), which was done to improve the
error path.)
Names matching internal code layout.
Functionc in thin_manip.c uses thin_pool in its name.
Keep 'pool' only for function working for both cache and thin pools.
No change of functionality.
If we failed or logged anything before we actually execute given command
in lvm shell, we couldn't report the log using lastlog command after.
This patch adds specific 'pre-cmd' log report object type to identify
such log messages and enables lastlog to report even this log.
pvcreate with --uuid would segfault if a devices file entry matched
the specified pvid, but the devices file entry had no device_id, which
could happen if the entry has a devname idtype.
When the volume size is extended, there is no need to flush
IO operations (nothing can be targeting new space yet).
VDO target is supported as target that can safely work with
this condition.
Such support is also needed, when extending VDOPOOL size
while the pool is reaching its capacity - since this allows
to continue working without reaching 'out-of-space' condition
due to flushing of all in flight IO.
When we wanted to insert '#' before a config line (to comment it out),
we used dm_pool_strndup to temporarily copy the space prefix first so
we can assemble the final line with:
"<space_prefix># <key>=<value>":
out of original:
"<space_prefix><key>=<value>".
The space_prefix copy is not necessary, we can just use fprintf's
precision modifier "%.*s" to print the exact part if we alrady
know space_prefix length.
The new --valuesonly option causes the lvmconfig output to contain only
values without keys for each config node. This is practical mainly in
case where we use lvmconfig in scripts and we want to assign the value
to a different custom key or simply output the value itself without the
key.
For example:
# lvmconfig --type full activation/raid_fault_policy
raid_fault_policy="warn"
# lvmconfig --type full activation/raid_fault_policy --valuesonly
"warn"
# my_var=$(lvmconfig --type full activation/raid_fault_policy --valuesonly)
# echo $my_var
"warn"
Internally, NUM and BIN fields are marked as DM_REPORT_FIELD_TYPE_NUM_NUMBER
through libdevmapper API. The new 'json_std' format mandates that the report
string representing such a value must be a number, not an arbitrary string.
This is because numeric values in 'json_std' format do not have double quotes
around them. This practically means, we can't use string synonyms
("named reserved values") for such values and the report string must always
represent a proper number.
With 'json' and 'basic' formats, this is not an issue because 'basic' format
doesn't have any structure or typing at all and 'json' format puts all values
in quotes, including numeric ones.
When we tested lvm2, the kernel injected various random faults.
(gdb) bt
...
(gdb) p vg
$1 = (struct volume_group *) 0x0
(gdb) p use_previous_vg
$2 = (unsigned int *) 0x0
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
When lvcreate is makeing VDO pool and user has not specified -V size,
ATM we actually run 'vdoformat' twice to get properly 'extent' aligned
size matching lvm2 properties - so the 2nd. run of vdoformat actually
can stay with 'log_verbose()' so the standard printed result
is not showing confusing info (which is now also correctly using
print_unless_silent)
The 'pe_start' column was incorrectly marked as being of type NUM.
This was not correct as pe_start is actually of type SIZ, which means
it can have a size suffix and hence it's not a pure numeric value.
Proper column type is important for selection to work correctly, so we
can also do comparisons while using suffixes.
This is also important for new "json_std" output format which does not
put double quotes around pure numeric values. With pe_start incorrectly
marked as NUM instead of SIZ, this produced invalid JSON output
like '"pe_start" = 1.00m' because it contained the 'm' (or other)
size suffix. If properly marked as SIZ, this is then put in double
quotes like '"pe_start" = "1.00m"'.
multipath_component_detection=0 has always applied to the filter-based
component detection. Also apply this setting to the duplicate-PV
handling which also eliminates multipath components (based on duplicate
PVs having the same wwid.)
When creating VDO pool based of % values, lvm2 is now more clever
and avoids to create 'unsupportable' sizes of physical backend
volumes as 16TiB is maximum size supported by VDO target
(and also limited by maximum supportable slabs (8192) based on slab
size.
If the requested virtual size is approaching max supported size 4PiB,
switch header size to 0.
Newer VDO kernel target require to have matching virtual size - this
however cause incompatiblity when lvcreate is let to format VDO data
device and read the usable size from vdoformat.
Altough this is a kernel regression and will likely get fixed,
lvm2 can actually reformat VDO device to use properly aligned VDO LV
size to make this problem disappear.
Add function to check for avaialble memory for particular VDO
configuration - to avoid unnecessary machine swapping for configs
that will not fit into memory (possibly in locked section).
Formula tries to estimate RAM size machine can use also with
swapping for kernel target - but still leaving some amount of
usable RAM.
Estimation is based on documented RAM usage of VDO target.
If the /proc/meminfo would be theoretically unavailable, try to use
'sysinfo()' function, however this is giving only free RAM without
the knowledge about how much RAM could be eventually swapped.
TODO: move _get_memory_info() into generic lvm2 API function used
by other targets with non-trivial memory requirements.
Keep single source for most of values printed in lvm.conf
(still needs some conversion)
Correct max for logical threads to 60
(we may refuse some older configuration which might eventually
user higher numbers - but so far let's assume no user have ever set this
as it's been non-trivial and if would complicate code unnecessarily.)
Accept maximum of 4PiB for virtual size of VDO LV
(lvm2 will drop 'header borders to 0 for this case').
to compare with wwids in /etc/multipath/wwids when
excluding multipath components. The wwid printed
from the sysfs wwid file may not be the wwid used
in multipath wwids. Save the wwids found for each
device on dev->wwids to avoid repeating reading
and parsing the sysfs files.
Fixes commit 494372b4ee
"filter-mpath: use multipath blacklist"
to handle wwids with initial type digits 1 and 2 used
for t10 and eui ids. Originally recognized type 3 naa.
A typo of the filename after --devicesfile should result in a
command error rather than the command falling back to using no
devices file at all. Exception is vgcreate|pvcreate which
create a new devices file if the file name doesn't exist.
Explicit wwid's from these sections control whether the
same wwid in /etc/multipath/wwids is recognized as a
multipath component. Other non-wwid keywords are not
used, and may require disabling the use of the multipath
wwids file in lvm.conf.
Change messages that refer to devices being "excluded by filters"
to say just "excluded". This will avoid mistaking the word
"filters" with the lvm.conf filter setting.
Warn if a scsi device is listed in the devices file that
is used by a multipath device that is not listed. This
will happen if a scsi device is listed in the devices
file and then an mpath device is set up to use it.
The way to correct this would be to remove the devices
file entry for the component device and add a new entry
for the multipath device.
When thin-pool had queued some delete message on extension operation
such message has been 'lost' and thin-pool kernel metadata has been
left with a thin volume that no longer existed for lvm2 metadata.
dev_name(dev) returns "[unknown]" if there are no names
on dev->aliases. It's meant mainly for log messages.
Many places assume a valid path name is returned, and
use it directly. A caller that wants to use the path
from dev_name() must first check if the dev has any
paths with dm_list_empty(&dev->aliases).
Use dev_cache_get_existing() in a few common, high level
locations where it's obvious that only existing dev-cache
entries are wanted. This can be expanded and used in more
locations (or dev_cache_get can stop creating new entries.)
along with some basic checks for cases when a device
has no aliases.
lvm itself creates many situations where a struct device
has no valid paths, when it activates and opens an LV,
does something with it, e.g. zeroing, and then closes
and deactivates it. (dev-cache is intended for PVs, and
the use of LVs should be moved out of dev-cache in a
future patch.)
Different target type for LV it's not an internal error.
i.e. when target type is replaced with 'error' type - it should be
reported as regular warning and not cause interruption of command with
internall error.
With very old version of DM target driver we have to avoid
trying to use newuuid setting - otherwise we get error
during ioctl preparation phase.
Patch is fixing regression from commit:
988ea0e94c
In a certain disconnected state, a block device is present on
the system, can be opened, reports a valid size, reports the
correct device id (wwid), and matches a devices file entry.
But, reading the device can still fail. In this case,
device_ids_validate() was misinterpreting the read error as
the device having no data/label on it (and no PVID).
The validate function would then clear the PVID from the
devices file entry for the device, thinking that it was
fixing the devices file (making it consistent with the on disk
state.) Fix this by not attempting to check and correct a
devices file entry that cannot be read. Also make this case
explicit in the hints validation code (which was doing the
right thing but indirectly.)
When compiled and used with:
CFLAGS="-fsanitize=address -g -O0"
ASAN_OPTIONS=strict_string_checks=1:detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1
we have few reported issue - they where not normally spotted, since
we were still accessing our own memory - but ouf of buffer-range.
TODO: there is still something to enhance with handling of #orphan vgids
event based autoactivation is now the only method that lvm
provides for autoactivation.
Setting lvm.conf event_activation=0 can still be used to disable
event based autoactivation commands, but doing so will no longer
enable static autoactivation.
Removes some incorrect and unnecessary checks for other entries
when adding a new devices. The removed checks and corrections were
mostly redundant with what is already done by device id matching.
Other checking is reworked so the warnings are a bit different.
When a device has a wwid (from sysfs), but lvm ignores the wwid,
e.g. because it contains an unreliable "QEMU" value, then lvm
falls back to using IDTYPE=devname (the device name) as the
device id. If the device name changes after reboot, then lvm
automatically searches for the PV on other devices to find the
new device name and correct system.devices. When searching for
the PV, lvm skips looking at devices that would use other id types,
e.g. if a device would use a wwid and not a devname, then it
skips checking it. However, it failed to account for the fact
that a device may have wwid that was ignored, in which case it
should be checked.
. error exit means that lvmdevices --update would make a change.
. remove check of PART field from --check because it isn't used.
. unlink searched_devnames file to ensure check|update will search
The approach to duplicate VGIDs has been that it is not possible
or not allowed, so the behavior has been undefined. The actual
result was unpredictable and/or broken, and generally unhelpful.
Improve this by recognizing the problem, displaying the VGs,
and printing a warning to fix the problem. Beyond this,
using VGs with duplicate VGIDs remains undefined, but should
work well enough to correct the problem with vgchange -u.
It's possible to create this condition without too much difficulty
by cloning PVs, followed by an incomplete attempt at making the two
VGs unique (vgrename and pvchange -u, but missing vgchange -u.)
After a vg_write, this function was used to attempt to
make lvmcache data match the new state written to disk.
It was not updated correctly in a many or most cases,
and the resulting lvmcache is not actually used after
vg_write, making the update unnecessary.
This reverts commit bd2baeaaa6.
This commit broke vgrename because vgrename relies on old bugs
in lvmcache_update_vg_from_write and lvmcache_update_vgname
which need to be fixed first.
The approach to duplicate VGIDs has been that it is not possible
or not allowed, so the behavior has been undefined. The actual
result was unpredictable and/or broken, and generally unhelpful.
Improve this by recognizing the problem, displaying the VGs,
and printing a warning to fix the problem. Beyond this,
using VGs with duplicate VGIDs remains undefined, but should
work well enough to correct the problem with vgchange -u.
It's possible to create this condition without too much difficulty
by cloning PVs, followed by an incomplete attempt at making the two
VGs unique (vgrename and pvchange -u, but missing vgchange -u.)
Since we check for present DM devices - cache result for
futher use of checking presence of such device.
lvm2 uses cache result for label scan, but also when
it tries to activate or deactivate LV - however only simple
target 'striped' is reasonably supported.
Use disable_dm_devs to be able to control when lv_info()
get cache or uncached results.
TODO: support more type, however this is getting very complicated.
Description stolen from linux d/b/rbd.c L3:
rbd.c -- Export ceph rados objects as a Linux block device
16 partitions seem to make sense according to L90:
#define RBD_SINGLE_MAJOR_PART_SHIFT 4
Running *scan -vvvvvvdddddd yields
#filters/filter-type.c:28 /dev/rbd1p5: Skipping: Unrecognised LVM device type 252
#filters/filter-persistent.c:131 filter caching bad /dev/rbd1p5
right now, and adding
types = ["rbd", 252]
to /e/l/lvm.conf (with the matching "252 rbd" in /p/devices) works as a
per-machine fix:
rbd1 252:16 0 1T 1 disk
|-rbd1p1 252:17 0 243M 1 part
|-rbd1p2 252:18 0 1K 1 part
`-rbd1p5 252:21 0 1023.8G 1 part
`-dev01--vg-root 253:0 0 1023.8G 0 lvm
but rbd is supported by upstream so it'd be nice to have it work OOB
Improve handling of md components that get through the
filter, like the previous improvement for multipath.
If md components get through the filter and trigger
duplicate PV code, then eliminate any devs entirely
that are not an md device.
If multipath component devices get through the filter and
cause lvm to see duplicate PVs, then check the wwid of the
devs and drop the component devices as if they had been
filtered. If a dm mpath device was found among the duplicates
then use that as the PV, otherwise do not use any of the
components as the PV.
"duplicate PVs" associated with multipath configs will no
longer stop commands from working.
Remove the searched_devnames file in a couple more places:
. When hints need refreshing it's possible that a missing
devices file entry could be found by searching devices
again.
. When a devices file entry devname is first found to be
incorrect, a new search for missing entries may be
useful.
When devnames are used as device ids and devnames change,
then new devices need to be located for the PVs. If the old
devname is now used by a filtered device, this was preventing
the code from searching for the new device, so the PV was
reported as missing.
If the optimized label scan fails (using online files),
then clear the device state prior to falling back to the
standard label_scan. This avoids printing output about
unexpected state.
Copy another optimization from pvscan -aay to vgchange -aay.
When using the optimized label scan for only one VG, acquire the
VG lock prior to the scan. This allows vg_read to then skip the
repeated label scan that normally happens after locking the vg.
Include the device name in the /run/lvm/pvs_online/pvid files.
Commands using the pvid file can use the devname to more quickly
find the correct device, vs finding the device using the
major:minor number. If the devname in the pvid file is missing
or incorrect, fall back to using the devno.
For completeness and consistency, adjust the behavior
for some variations of:
vgchange -aay --autoactivation event [vgname]
The current standard use is with a VG name arg, and the
command is only called when all pvs_online files exist.
This is the optimal case, in which only pvs_online devs
are read. This remains the same.
Clean up behaviors for some other unexpected uses of the
command:
. With no VG name arg, the command activates any VGs
that are complete according to pvs_online. If no
pvs_online files exist, it does nothing.
. If a VG name is used but no PVs online files exist for
the VG, or the PVs online files are incomplete, then
consider there could be a problem with the pvs_online
files, and fall back to a full label scan prior to
attempting the activation.
Part of the optimization to avoid a full dev_cache_scan requires
translating major:minor numbers to a device name. If this devno
translation fails, then fall back to doing a full dev_cache_scan
which is slower but certain to provide the info. This preserves
the most important part of the label scanning optimization in the
vgchange aay (avoiding dev_cache_scan is a relatively small part
of the optimized activation compared to label scanning.)
Port another optimization from pvscan -aay to vgchange -aay:
"pvscan: only add device args to dev cache"
This optimization avoids doing a full dev_cache_scan, and
instead populates dev-cache with only the devices in the
VG being activated.
This involves shifting the use of pvs_online files from
the hints interface up to the higher level label_scan
interface. This specialized label_scan is structured
around creating a list of devices from the pvs_online
files. Previously, a list of all devices was created
first, and then reduced based on the pvs_online files.
The initial step of listing all devices was slow when
thousands of devices are present on the system.
This optimization extends the previous optimization that
used pvs_online files to limit the devices that were
actually scanned (i.e. reading to identify the device):
"vgchange -aay: optimize device scan using pvs_online files"
The information in /run/lvm/pvs_online/<pvid> files can
be used to build a list of devices for a given VG.
The pvscan -aay command has long used this information to
activate a VG while scanning only devices in that VG, which
is an important optimization for autoactivation.
This patch implements the same thing through the existing
device hints interface, so that the optimization can be
applied elsewhere. A future patch will take advantage of
this optimization in vgchange -aay, which is now used in
place of pvscan -aay for event activation.
When a device id is set for a device, using an idtype other
than devname, it means that sysfs has been used with the device
to match the device id. So, checking for a sysfs entry for the
device in filter-sysfs is redundant. For any other cases not
covered by this (e.g. devname ids), have filter-sysfs simply
stat /sys/dev/block/major:minor to test if the device exists
in sysfs.
The extensive processing done by filter-sysfs init is removed.
It was taking an immense amount of time with many devices, e.g.
. 1024 PVs in 520 VGs
. 520 concurrent vgchange -ay <vgname> commands
. vgchange scans only PVs in the named VG (based on pvs_online
files from a pending patch)
A large number of the vgchange commands were taking over 1 min,
and nearly half of that time was used by filter-sysfs init.
With this patch, the vgchange commands take about half the time.
When scanning configured /dev dir, avoid entring
directories with different filesystem.
This minimizes risk we will block on i.e. entring
directory with mount point.
Resolve event_activation configure option just once.
Do not print debug_devs about 'bad' filtering, when
actually filter already printed reason for skipping
Do not trace more then once about backup being disabled.
No debug when unlinked file does not exists in pvscan.
Reporting non-PVs / "all devices" is only done by
pvs -a or pvdisplay -a, so avoid the work managing
a list of all devices in process_each_pv.
In the case when it's needed, use the results of
label_scan which already determines which devs
are not PVs.
Just setting lvm.conf level=N should not send messages to
syslog (now the journal by default.)
Sending messages to syslog should require setting lvm.conf
log { syslog=1 level=N }.
Configure via lvm.conf log/journal or command line --journal.
Possible values:
"command" records command information.
"output" records default command output.
"debug" records full command debugging.
Multiple values can be set in lvm.conf as an array.
One value can be set in --journal which is added to
values set in lvm.conf
pvscan --cache <dev>
. read only dev
. create online file for dev
pvscan --listvg <dev>
. read only dev
. list VG using dev
pvscan --listlvs <dev>
. read only dev
. list VG using dev
. list LVs using dev
pvscan --cache --listvg [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. [check online files and report if VG is complete]
pvscan --cache --listlvs [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. list LVs using dev
. [check online files and report if VG is complete]
. [check online files and report if LVs are complete]
[--vgonline]
can be used with --checkcomplete, to enable use of a vg online
file. This results in only the first pvscan command to see
the complete VG to report 'VG complete', and others will report
'VG finished'. This allows the caller to easily run a single
activation of the VG.
[--udevoutput]
can be used with --cache --listvg --checkcomplete, to enable
an output mode that prints LVM_VG_NAME_COMPLETE='vgname' that
a udev rule can import, and prevents other output from the
command (other output causes udev to ignore the command.)
The list of complete LVs is meant to be passed to lvchange -aay,
or the complete VG used with vgchange -aay.
When --checkcomplete is used, lvm assumes that that the output
will be used to trigger event-based autoactivation, so the pvscan
does nothing if event_activation=0 and --checkcomplete is used.
Example of listlvs
------------------
$ lvs -a vg -olvname,devices
LV Devices
lv_a /dev/loop0(0)
lv_ab /dev/loop0(1),/dev/loop1(1)
lv_abc /dev/loop0(3),/dev/loop1(3),/dev/loop2(1)
lv_b /dev/loop1(0)
lv_c /dev/loop2(0)
$ pvscan --cache --listlvs --checkcomplete /dev/loop0
pvscan[35680] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
LV vg/lv_a complete
LV vg/lv_ab incomplete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop1
pvscan[35681] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
LV vg/lv_b complete
LV vg/lv_ab complete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop2
pvscan[35682] PV /dev/loop2 online, VG vg is complete.
VG vg complete
LV vg/lv_c complete
LV vg/lv_abc complete
Example of listvg
-----------------
$ pvscan --cache --listvg --checkcomplete /dev/loop0
pvscan[35684] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop1
pvscan[35685] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop2
pvscan[35686] PV /dev/loop2 online, VG vg is complete.
VG vg complete
The new system_id_source="appmachineid" will cause
lvm to use an lvm-specific derivation of the machine-id,
instead of the machine-id directly. This is now
recommended in place of using machineid.
Do not store full path with each archived name reduces memory usage if
the directory has thousands of entries and just add 'dir' path when
needed.
Also emit info print message to a user if the total size of archived
files for a VG is more then 128MiB or 8192 files.
TODO: Consider wheather adding a new 'lvm.conf archive{option}' to support
trimming these wild archive sizes can make situation better.
We already support retain_min && retain_days - but if user is
generating too many and too large archives with minutes - maybe archiving
should be disabled by a user - as it's not producing anything largely usable
and just slows-down command ??
If we add 'retain_max & retain_max_size' the condition will go against
each other and we need to chose priorities.
mm
Consider missing config tree from vg read to be an internal error
since we do not want to 'regenerate' this one in expesive parsing way.
Also if there is any failure on recreating committed VG, make it also
a 'vg_write' error.
Corrupt metadata text (with good mda header) was being handled
in the label_scan phase, but not in the vg_read phase. This
was sufficient because metadata areas would always be read and
checksummed during label_scan (metadata parsing was skipped
previously as an optimization.)
This changed with the optimization in
commit 61a6f9905e
"metadata: optimize reading metadata copies in scan"
Now, some metadata areas will not be read and checksummed
at all during the label_scan phase, only during the vg_read
phase. This means that bad metadata text may first be detected
in the vg_read phase. So, add equivalent bad metadata handling
to the vg_read path to match the label_scan path.
While being in lockless scanning phase, we can avoid reading and checking
matching metadata copies if we already know them from other PV
and just rely on matching metadata header information.
These copies will be examined later during locked metadata read/write
access.
This patch may postpone discovering some read failures to locked phase.
When creating lvm2 metadata for VG, lvm2 allocate some buffer,
and if buffer is not big enough, the buffer is 'reallocated' bigger,
and whole metadata creation is repeated until metadata fits.
We can try to use 'previous' metadata size as hint to reduce looping
here.
Preserve computed crc32 check from first written PV, just like we
preserve generated metadata.
Also there is no need to call crc32 twice on wrapping buffer with 2 calcs,
result must be always the same as with single crc32 checking.
sysconf() may also return -1 although rather theoretically.
Default to 4K when such case would happen.
Also in function call it just once and keep as static variable.
Ensure only nonNULL 'du' pointer is dereference altough the comment
to the last assign 'du' pointer already suggest 'NULL' case should not happen.
So just being explicit.
mer du
Error path in _lockd_retrive_vg_pv_list() has not zeroed released path
caussing possible double-free later in the code.
Fix it by using one single function freeing lock_pvs structure.
When cache creation fails on table reload path, implemen more
advanced revert solution, that tries to restore state of LVM
metadata into is look before actual caching started.
Loading invalid MQ/SMQ policy settings table line cause immediate
rejection - to prevent such failure, automatically filter valid
settins before it is being uploaded by lvm2.
For invalid setting issue a warnning informing user how to remove
them.
This solution is used to keep running cached LVs that might had
been created in the past with invalid settings that have been actually
unused due to another code bug.
New versions of kvdo module exposes statistics at new location:
/sys/block/dm-XXX/vdo/statistics/...
Enhance lvm2 to access this location first.
Also if the statistic info is missing - make it 'debug' level info,
so it is not failing 'lvs' command.
Since VDO is always returns 'zero' on unprovisioned read
and every provisioned block is always 'zeroed' on partial writes,
we can avoid 'zeroing' of such LVs.
Some device id types can only be used with specific device major
numbers, so use this restriction to avoid some comparisions.
This is more efficient, and can avoid some incorrect matches.
pvid and vgid are sometimes a null-terminated string, and
other times a 'struct id', and the two types were often
cast between each other. When a struct id was cast to a char
pointer, the resulting string would not necessarily be null
terminated. Casting a null-terminated string id to a
struct id is fine, but is still avoided when possible.
A struct id is: int8_t uuid[ID_LEN]
A string id is: char pvid[ID_LEN + 1]
A convention is introduced to help distinguish them:
- variables and struct fields named "pvid" or "vgid"
should be null-terminated strings.
- variables and struct fields named "pv_id" or "vg_id"
should be struct id's.
- examples:
char pvid[ID_LEN + 1];
char vgid[ID_LEN + 1];
struct id pv_id;
struct id vg_id;
Function names also attempt to follow this convention.
Avoid casting between the two types as much as possible,
with limited exceptions when known to be safe and clearly
commented.
Avoid using variations of strcpy and strcmp, and instead
use memcpy/memcmp with ID_LEN (with similar limited
exceptions possible.)
When splitting VG with thin/cache pool volume, handle pmspare during
such split and allocate new pmspare in new VG or extend existing pmspare
there and eventually drop pmspare in original VG if is no longer needed
there.
When check active componet of thinLV with external origin,
we need to check if the external origin isn't already active.
For this however we need to use layered check for -real device.
sysfs-based multipath component detection quit if a
device had multiple holders, and in this case would
fail to detect a device was an mpath component.
For the pv_min_size check, always use dev_get_size()
which is commonly used elsewhere, and don't bother
asking libudev for the device size when
external_device_info_source=udev.
related to config settings:
obtain_device_info_from_udev (controls if lvm gets
a list of devices from readdir /dev or from libudev)
external_device_info_source (controls if lvm asks
libudev for device information)
. Make the obtain_device_list_from_udev setting
affect only the choice of readdir /dev vs libudev.
The setting no longer controls if udev is used for
device type checks.
. Change obtain_device_list_from_udev default to 0.
This helps avoid boot timeouts due to slow libudev
queries, avoids reported failures from
udev_enumerate_scan_devices, and avoids delays from
"device not initialized in udev database" errors.
Even without errors, for a system booting with 1024 PVs,
lvm2-pvscan times improve from about 100 sec to 15 sec,
and the pvscan command from about 64 sec to about 4 sec.
. For external_device_info_source="none", remove all
libudev device info queries, and use only lvm
native device info.
. For external_device_info_source="udev", first check
lvm native device info, then check libudev info.
. Remove sleep/retry loop when attempting libudev
queries for device info. udev info will simply
be skipped if it's not immediately available.
. Only set up a libdev connection if it will be used by
obtain_device_list_from_udev/external_device_info_source.
. For native multipath component detection, use
/etc/multipath/wwids. If a device has a wwid
matching an entry in the wwids file, then it's
considered a multipath component. This is
necessary to natively detect multipath
components when the mpath device is not set up.
How to trigger:
```
~ # export LVM_SYSTEM_DIR=_
~ # pvscan
No matching physical volumes found
double free or corruption (!prev)
Aborted (core dumped)
```
when LVM_SYSTEM_DIR is empty, _load_config_file() won't be called.
when LVM_SYSTEM_DIR is not empty, cfl->cft links into cmd->config_files
by _load_config_file()@lib/commands/toolcontext.c
core dumped code: _destroy_config()@lib/commands/toolcontext.c
```
/* CONFIG_FILE/CONFIG_MERGED_FILES */
if ((cft = remove_config_tree_by_source(cmd, CONFIG_MERGED_FILES)))
config_destroy(cft);
else if ((cft = remove_config_tree_by_source(cmd, CONFIG_FILE)))
config_destroy(cft); <=== first free the cft
dm_list_iterate_items(cfl, &cmd->config_files)
config_destroy(cfl->cft); <=== double free the cft
```
Fixes: c43f2f8ae0
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
expands commit d5a06f9a7d
"pvscan: skip indexing devices used by LVs"
The dev cache index is expensive and slow, so limit it
to commands that are used to observe the state of lvm.
The index is only used to print warnings about incorrect
device use by active LVs, e.g. if an LV is using a
multipath component device instead of the multipath
device. Commands that continue to use the index and
print the warnings:
fullreport, lvmdiskscan, vgs, lvs, pvs,
vgdisplay, lvdisplay, pvdisplay,
vgscan, lvscan, pvscan (excluding --cache)
A couple other commands were borrowing the DEV_USED_FOR_LV
flag to just check if a device was actively in use by LVs.
These are converted to the new dev_is_used_by_active_lv().
dev_cache_index_devs() is taking a large amount of time
when there are many PVs. The index keeps track of
devices that are currently in use by active LVs. This
info is used to print warnings for users in some limited
cases.
The checks/warnings that are enabled by the index are not
needed by pvscan --cache, so disable it in this case.
This may be expanded to other cases in future commits.
dev_cache_index_devs should also be improved in another
commit to avoid the extreme delays with many devices.
There have been two separate checks for metadata
validity: first that the metadata text begins with
a valid VG name, and second the checksum of the
metadata text. These happen in different places,
which means there have been two separate error paths
for invalid metadata. This also causes large metadata
to be read in multiple parts, the first part is read
just to check the vgname, and then remaining parts are
read later when the full metadata is needed.
This patch moves the vg name verification so it's
done just before the checksum verification, which
results in a single error path for invalid metadata,
and causes the entire metadata to be read together
rather that in parts from different parts of the code.
If label_scan encounters bad vg metadata, invalidate
bcache data for the device and reread the mda_header
and metadata text back to back. With concurrent commands
modifying large metadata, it's possible that the entire
metadata area can be rewritten in the time between a
command reading the mda_header and reading the metadata
text that the header points to. Since the label_scan
is just assembling an initial overview of devices, it
doesn't use locking to serialize with other commands
that may be modifying the vg metadata at the same time.
Add profilable configurable setting for vdo pool header size, that is
used as 'extra' empty space at the front and end of vdo-pool device
to avoid having a disk in the system the may have same data is real
vdo LV.
For some conversion cases however we may need to allow using '0' header size.
TODO: in this case we may eventually avoid adding 'linear' mapping layer
in future - but this requires further modification over lvm code base.
When adding a device to the devices file with --adddev, lvm
by default chooses the best device ID type for the new device.
The new --deviceidtype option allows the user to override the
built in preference. This is useful if there's a problem with
the default type, or if a secondary type is preferrable.
If the specified deviceidtype does not produce a device ID,
then lvm falls back to the preference it would otherwise use.
Previously there have been necessary explicit call of backup (often
either forgotten or over-used). With this patch the necessity to
store backup is remember at vg_commit and once the VG is unlocked,
the committed metadata are automatically store in backup file.
This may possibly alter some printed messages from command when the
backup is now taken later.
Instead of calling explicit archive with command processing logic,
move this step towards 1st. vg_write() call, which will automatically
store archive of committed metadata.
This slightly changes some error path where the error in archiving
was detected earlier in the command, while now some on going command
'actions' might have been, but will be simply scratched in case
of error (since even new metadata would not have been even written).
So general effect should be only some command message ordering.
For shared VG or LV locking, IDM locking scheme needs to use the PV
list assocated with VG or LV for sending SCSI commands, thus it requires
to use some places to generate PV list.
In reviewing the flow for LVM commands, the best place to generate PV
list is in the locking lib. So this is why this patch parses PV list as
shown. It iterates over all the PV nodes one by one, and compare with
the VG name or LV prefix string. If any PV matches, then the PV is
added into the PV list. Finally the PV list is sent to lvmlockd daemon.
Here as mentioned, it compares LV prefix string with the format
"lv_name_", the reason is it needs to find out all relevant PVs, e.g.
for the thin pool, it has LVs for metadata, pool, error, and raw LV, so
we can use the prefix string to find out all PVs belonging to the thin
pool.
For the global lock, it's not covered in this patch. To avoid the egg
and chicken issue, we need to prepare the global lock ahead before any
locking can be used. So the global lock's PV list is established in
lvmlockd daemon by iterating all drives with partition labeled with
"propeller".
Signed-off-by: Leo Yan <leo.yan@linaro.org>
We can consider the drive firmware a server to handle the locking
request from nodes, this essentially is a client-server model.
DLM uses the kernel as a central place to manage locks, so it also
complies with client-server model for locking operations. This is
why IDM and DLM are similar with each other for their wrappers.
This patch largely works by generalizing the DLM code paths and then
providing degeneralized functions as wrappers for both IDM and DLM.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
error reading dev and no pvid on dev were both
returning 0. make it easier for callers to
know which, if they care.
return 1 if the device could be read, regardless
of whether a pvid was found or not.
set has_pvid=1 if a pvid is found and 0 if no
pvid is found.
This case happens when i.e. we convert LV to another type,
when we change existing LV into a different type - so change
to debug level and avoid confusing users with message about
Device path not match.
We may eventually enhnace caching code to drop cached info
after taking lock and reading VG.
With commit 0b18c25d93 there
was introduced 'zalloc()' for allocation of outdates pvs,
but no matching 'free()' is present.
Switch to use cmd mempool instead of adding free() code into
several places.
The autoactivation property can be specified in lvcreate
or vgcreate for new LVs/VGs, and the property can be changed
by lvchange or vgchange for existing LVs/VGs.
--setautoactivation y|n
enables|disables autoactivation of a VG or LV.
Autoactivation is enabled by default, which is consistent with
past behavior. The disabled state is stored as a new flag
in the VG metadata, and the absence of the flag allows
autoactivation.
If autoactivation is disabled for the VG, then no LVs in the VG
will be autoactivated (the LV autoactivation property will have
no effect.) When autoactivation is enabled for the VG, then
autoactivation can be controlled on individual LVs.
The state of this property can be reported for LVs/VGs using
the "-o autoactivation" option in lvs/vgs commands, which will
report "enabled", or "" for the disabled state.
Previous versions of lvm do not recognize this property. Since
autoactivation is enabled by default, the disabled setting will
have no effect in older lvm versions. If the VG is modified by
older lvm versions, the disabled state will also be dropped from
the metadata.
The autoactivation property is an alternative to using the lvm.conf
auto_activation_volume_list, which is still applied to to VGs/LVs
in addition to the new property.
If VG or LV autoactivation is disabled either in metadata or in
auto_activation_volume_list, it will not be autoactivated.
An autoactivation command will silently skip activating an LV
when the autoactivation property is disabled.
To determine the effective autoactivation behavior for a specific
LV, multiple settings would need to be checked:
the VG autoactivation property, the LV autoactivation property,
the auto_activation_volume_list. The "activation skip" property
would also be relevant, since it applies to both normal and auto
activation.