shaba/lvm2 - lvm2 - Gitea: Git with a cup of tea

mirror of git://sourceware.org/git/lvm2.git synced 2024-12-22 17:35:59 +03:00

Author	SHA1	Message	Date
David Teigland	14b68ea313	vgchange -aay: fall back to dev_cache_scan if optimization fails Part of the optimization to avoid a full dev_cache_scan requires translating major:minor numbers to a device name. If this devno translation fails, then fall back to doing a full dev_cache_scan, which is slower but certain to provide the info. This preserves the most important part of the label scanning optimization in vgchange -aay (avoiding dev_cache_scan is a relatively small part of the optimized activation compared to label scanning.)	2021-11-05 17:07:13 -05:00
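The fallback in 14b68ea313 works as a resolve-or-bail step. Every major:minor listed for the VG is translated to a name. If any translation fails, the short list is abandoned and the caller runs the full dev_cache_scan. The sketch below is illustrative only. The structs and the in-memory devno-to-name table are hypothetical stand-ins, not lvm2 code; lvm2 gets this information from the system.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical devno -> name entry. lvm2 gets this from the system;
 * the table only keeps the sketch self-contained. */
struct devno_name {
    unsigned int major, minor;
    const char *name;
};

/* Hypothetical record for one PV listed as online for the VG. */
struct online_pv {
    unsigned int major, minor;
    const char *devname;    /* filled in by resolve_online_pvs() */
};

static const char *lookup_devno(const struct devno_name *map, size_t map_len,
                                unsigned int major, unsigned int minor)
{
    for (size_t i = 0; i < map_len; i++)
        if (map[i].major == major && map[i].minor == minor)
            return map[i].name;
    return NULL;    /* translation failed */
}

/*
 * Returns true if every listed PV was translated to a name, so only those
 * devices need to go into the device cache. Returns false as soon as one
 * translation fails; the caller should then fall back to a full scan.
 */
static bool resolve_online_pvs(struct online_pv *pvs, size_t count,
                               const struct devno_name *map, size_t map_len)
{
    for (size_t i = 0; i < count; i++) {
        pvs[i].devname = lookup_devno(map, map_len, pvs[i].major, pvs[i].minor);
        if (!pvs[i].devname)
            return false;
    }
    return true;
}
```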
David Teigland	62533ae3fa	vgchange -aay: optimize device list using pvs_online files Port another optimization from pvscan -aay to vgchange -aay: "pvscan: only add device args to dev cache" This optimization avoids doing a full dev_cache_scan, and instead populates dev-cache with only the devices in the VG being activated. This involves shifting the use of pvs_online files from the hints interface up to the higher level label_scan interface. This specialized label_scan is structured around creating a list of devices from the pvs_online files. Previously, a list of all devices was created first, and then reduced based on the pvs_online files. The initial step of listing all devices was slow when thousands of devices are present on the system. This optimization extends the previous optimization that used pvs_online files to limit the devices that were actually scanned (i.e. reading to identify the device): "vgchange -aay: optimize device scan using pvs_online files"	2021-11-05 12:19:35 -05:00
David Teigland	f40fd88374	move code from pvscan.c to online.c related to managing files in /run/lvm/pvs_online and /run/lvm/vgs_online	2021-11-04 11:09:29 -05:00
David Teigland	726dd25969	add hints interface to the pvs_online file information The information in /run/lvm/pvs_online/<pvid> files can be used to build a list of devices for a given VG. The pvscan -aay command has long used this information to activate a VG while scanning only devices in that VG, which is an important optimization for autoactivation. This patch implements the same thing through the existing device hints interface, so that the optimization can be applied elsewhere. A future patch will take advantage of this optimization in vgchange -aay, which is now used in place of pvscan -aay for event activation.	2021-11-04 10:58:16 -05:00
David Teigland	6ea8d975b2	lvmdevices: increase open file limit	2021-11-03 08:50:57 -05:00
David Teigland	5d0964d127	hints: remove the cmd hints list which is no longer used after commit "toollib: remove all devices list from process_each_pv"	2021-11-01 16:01:45 -05:00
David Teigland	c38473548e	fix segfault handling duplicate PVs The cmd arg was missing when switching to use an alternative duplicate dev.	2021-10-14 14:02:59 -05:00
David Teigland	6fb497ef42	toollib: remove all devices list from process_each_pv Reporting non-PVs / "all devices" is only done by pvs -a or pvdisplay -a, so avoid the work managing a list of all devices in process_each_pv. In the case when it's needed, use the results of label_scan which already determines which devs are not PVs.	2021-10-13 17:29:32 -05:00
Zdenek Kabelac	9eafd44734	gcc: use more zero length arrays Define last array struct member with zero size.	2021-09-22 17:18:50 +02:00
Zdenek Kabelac	8c44597820	gcc-fanalyzer: zallocate memory for clean buffer	2021-09-21 21:03:47 +02:00
Zdenek Kabelac	63930f576a	cov: add some initializers	2021-09-13 12:34:41 +02:00
David Teigland	96b777167c	cov: clean up pvid and vgid usage pvid and vgid are sometimes a null-terminated string, and other times a 'struct id', and the two types were often cast between each other. When a struct id was cast to a char pointer, the resulting string would not necessarily be null terminated. Casting a null-terminated string id to a struct id is fine, but is still avoided when possible. A struct id is: int8_t uuid[ID_LEN] A string id is: char pvid[ID_LEN + 1] A convention is introduced to help distinguish them: - variables and struct fields named "pvid" or "vgid" should be null-terminated strings. - variables and struct fields named "pv_id" or "vg_id" should be struct id's. - examples: char pvid[ID_LEN + 1]; char vgid[ID_LEN + 1]; struct id pv_id; struct id vg_id; Function names also attempt to follow this convention. Avoid casting between the two types as much as possible, with limited exceptions when known to be safe and clearly commented. Avoid using variations of strcpy and strcmp, and instead use memcpy/memcmp with ID_LEN (with similar limited exceptions possible.)	2021-08-16 11:31:15 -05:00
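This is a minimal sketch of the pvid / pv_id convention from 96b777167c. ID_LEN is defined locally as 32, assumed to match lvm2's value, so the fragment compiles on its own. The two helper names are hypothetical. Conversions copy exactly ID_LEN bytes with memcpy and add the terminator explicitly. Comparisons use memcmp instead of casting between the types.

```c
#include <stdint.h>
#include <string.h>

#define ID_LEN 32   /* assumed to match lvm2; redefined here for self-containment */

struct id {
    int8_t uuid[ID_LEN];    /* "pv_id"/"vg_id": raw bytes, NOT null-terminated */
};

/* struct id -> "pvid" string: copy exactly ID_LEN bytes, then terminate. */
static void pv_id_to_pvid(const struct id *pv_id, char pvid[ID_LEN + 1])
{
    memcpy(pvid, pv_id->uuid, ID_LEN);
    pvid[ID_LEN] = '\0';
}

/* Compare a string pvid with a struct id without casting either one.
 * pvid must hold at least ID_LEN bytes. */
static int pvid_matches_pv_id(const char *pvid, const struct id *pv_id)
{
    return memcmp(pvid, pv_id->uuid, ID_LEN) == 0;
}
```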
David Teigland	e035e32350	scan: retry reading metadata on error If label_scan encounters bad vg metadata, invalidate bcache data for the device and reread the mda_header and metadata text back to back. With concurrent commands modifying large metadata, it's possible that the entire metadata area can be rewritten in the time between a command reading the mda_header and reading the metadata text that the header points to. Since the label_scan is just assembling an initial overview of devices, it doesn't use locking to serialize with other commands that may be modifying the vg metadata at the same time.	2021-07-06 10:10:23 -05:00
David Teigland	d89942d157	scan: don't hold bcache block during scan This allows data from the bcache block to be invalidated and reread if needed.	2021-07-06 10:10:23 -05:00
David Teigland	4dc5d4ac7e	label_read_pvid: separate error and no-pvid An error reading the dev and no pvid on the dev were both returning 0. Make it easier for callers to know which, if they care. Return 1 if the device could be read, regardless of whether a pvid was found or not. Set has_pvid=1 if a pvid is found and 0 if no pvid is found.	2021-04-23 17:37:08 -05:00
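The return convention from 4dc5d4ac7e is sketched here with a hypothetical helper. The "device" is a memory buffer, with NULL meaning the read failed. The layout (an 8-byte "LABELONE" marker followed by a 32-byte pvid) is a simplification assumed for illustration, not the real lvm2 on-disk label format.

```c
#include <stddef.h>
#include <string.h>

#define ID_LEN 32

/*
 * Hypothetical reader. Return value answers "could the device be read?";
 * *has_pvid answers "is it a PV?". The buffer layout is a simplified
 * stand-in, not lvm2's real label layout.
 */
static int read_pvid_from_buf(const char *data, size_t len,
                              char pvid[ID_LEN + 1], int *has_pvid)
{
    *has_pvid = 0;
    if (!data)
        return 0;                       /* error: device could not be read */
    if (len >= 8 + ID_LEN && !memcmp(data, "LABELONE", 8)) {
        memcpy(pvid, data + 8, ID_LEN);
        pvid[ID_LEN] = '\0';
        *has_pvid = 1;                  /* read OK and a pvid was found */
    }
    return 1;                           /* read OK, with or without a pvid */
}
```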
Zdenek Kabelac	05eb90db68	cleanup: indent	2021-04-23 23:00:55 +02:00
Zdenek Kabelac	395ce6c2bb	cov: explicitly ignore return value	2021-04-23 23:00:55 +02:00
Zdenek Kabelac	bc1bc4cffc	debug: drop stack from regular code flow	2021-03-15 11:13:24 +01:00
Zdenek Kabelac	75037bee5d	debug: more tracing Check result of device_ids_write() and at least provide stack;	2021-03-10 01:27:13 +01:00
Zdenek Kabelac	ca12dae32b	hints: keep strings aligned in structure Prefer aligned string access.	2021-03-08 15:33:15 +01:00
Zdenek Kabelac	00531186fc	label: check only with active device for rescan When 'lv_info()' is called with &info structure, the presence of node has to be checked from this structure. Without this we were needlessly trying to look up the 0:0 device.	2021-03-02 22:54:40 +01:00
David Teigland	83fe6e720f	device usage based on devices file The LVM devices file lists devices that lvm can use. The default file is /etc/lvm/devices/system.devices, and the lvmdevices(8) command is used to add or remove device entries. If the file does not exist, or if lvm.conf includes use_devicesfile=0, then lvm will not use a devices file. When the devices file is in use, the regex filter is not used, and the filter settings in lvm.conf or on the command line are ignored. LVM records devices in the devices file using hardware-specific IDs, such as the WWID, and attempts to use subsystem-specific IDs for virtual device types. These device IDs are also written in the VG metadata. When no hardware or virtual ID is available, lvm falls back to using the unstable device name as the device ID. When devnames are used, lvm performs extra scanning to find devices if their devname changes, e.g. after reboot. When proper device IDs are used, an lvm command will not look at devices outside the devices file, but when devnames are used as a fallback, lvm will scan devices outside the devices file to locate PVs on renamed devices. A config setting search_for_devnames can be used to control the scanning for renamed devname entries. Related to the devices file, the new command option --devices <devnames> allows a list of devices to be specified for the command to use, overriding the devices file. The listed devices act as a sort of devices file in terms of limiting which devices lvm will see and use. Devices that are not listed will appear to be missing to the lvm command. Multiple devices files can be kept in /etc/lvm/devices, which allows lvm to be used with different sets of devices, e.g. system devices do not need to be exposed to a specific application, and the application can use lvm on its own set of devices that are not exposed to the system. The option --devicesfile <filename> is used to select the devices file to use with the command.
Without the option set, the default system devices file is used. Setting --devicesfile "" causes lvm to not use a devices file. An existing, empty devices file means lvm will see no devices. The new command vgimportdevices adds PVs from a VG to the devices file and updates the VG metadata to include the device IDs. vgimportdevices -a will import all VGs into the system devices file. LVM commands run by dmeventd do not use a devices file by default, and will look at all devices on the system. A devices file can be created for dmeventd (/etc/lvm/devices/dmeventd.devices). If this file exists, lvm commands run by dmeventd will use it. Internal implementation: - device_ids_read - read the devices file . add struct dev_use (du) to cmd->use_devices for each devices file entry - dev_cache_scan - get /dev entries . add struct device (dev) to dev_cache for each device on the system - device_ids_match - match devices file entries to /dev entries . match each du on cmd->use_devices to a dev in dev_cache, using device ID . on match, set du->dev, dev->id, dev->flags MATCHED_USE_ID - label_scan - read lvm headers and metadata from devices . filters are applied, those that do not need data from the device . filter-deviceid skips devs without MATCHED_USE_ID, i.e. skips /dev entries that are not listed in the devices file . read lvm label from dev . filters are applied, those that use data from the device . read lvm metadata from dev . add info/vginfo structs for PVs/VGs (info is "lvmcache") - device_ids_find_renamed_devs - handle devices with unstable devname ID where devname changed . this step is only needed when devs do not have proper device IDs, and their dev names change, e.g. after reboot sdb becomes sdc. . detect incorrect match because PVID in the devices file entry does not match the PVID found when the device was read above . undo incorrect match between du and dev above . search system devices for new location of PVID . 
update devices file with new devnames for PVIDs on renamed devices . label_scan the renamed devs - continue with command processing	2021-02-23 16:43:32 -06:00
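The device_ids_match and filter-deviceid steps above can be reduced to a hypothetical sketch. Each devices file entry (du) is matched to a /dev entry by device ID, matched devices are flagged, and the filter rejects anything unflagged. The sketch has these limits: only a WWID is compared, the flag value is made up, dev->id is not modeled, and the real lvm2 structs carry far more state.

```c
#include <stddef.h>
#include <string.h>

#define DEV_MATCHED_USE_ID 0x1  /* illustrative flag value, not lvm2's */

struct device {             /* simplified /dev entry from dev_cache_scan */
    const char *name;
    const char *wwid;       /* hardware ID, NULL if none */
    unsigned flags;
};

struct dev_use {            /* simplified devices file entry */
    const char *idname;     /* device ID recorded in the devices file */
    struct device *dev;     /* set on match */
};

/* Match each du to a dev by device ID; flag every dev that matched. */
static void match_device_ids(struct dev_use *dus, size_t ndus,
                             struct device *devs, size_t ndevs)
{
    for (size_t i = 0; i < ndus; i++) {
        dus[i].dev = NULL;
        for (size_t j = 0; j < ndevs; j++) {
            if (devs[j].wwid && !strcmp(devs[j].wwid, dus[i].idname)) {
                dus[i].dev = &devs[j];
                devs[j].flags |= DEV_MATCHED_USE_ID;
                break;
            }
        }
    }
}

/* filter-deviceid analogue: devs not listed in the devices file are skipped. */
static int passes_deviceid_filter(const struct device *dev)
{
    return (dev->flags & DEV_MATCHED_USE_ID) != 0;
}
```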
David Teigland	c94d78f068	scan: wipe filters when dropping scanned data Fix clearing persistent filter state when clearing all the state from a label_scan. label_scan reads devs and saves info in bcache, lvmcache, and in the persistent filter. In some uncommon cases, an lvm command wants to clear all info from a prior label_scan, and repeat label_scan from scratch. In these cases, info in lvmcache, bcache and the persistent filter all need to be cleared before repeating label_scan. Because the persistent filter was not wiped, outdated persistent filter info from a prior label_scan could cause lvm to incorrectly filter devices that change between polling intervals (i.e. if the device changes in such a way that the filtering results change.) A case where lvm wants to do multiple label_scans is a polling command (like lvconvert --merge), when lvmpolld has been disabled, so that the command itself needs to do repeated polling checks.	2021-02-10 15:34:45 -06:00
Zdenek Kabelac	a383586177	label: avoid rescanning unusable DM devices	2021-02-10 15:39:03 +01:00
David Teigland	87ee401eea	md component detection changes Move extra md component detection into the label scan phase. It had been in set_pv_devices which was deep within the vg_read phase, which wasn't a good place (better to detect that earlier.) Now that pv metadata info is available in the scan phase, the pv details (size and device_hint) can be used for extra md checking. Use the device_hint from the pv metadata to trigger a full md component check if the device_hint begins with /dev/md. Stop triggering full md component checks based on missing udev info for a dev. Change tests to reflect that the code now detects md components in some test cases where it did not before.	2021-02-05 16:23:51 -06:00
David Teigland	2ec29d0677	label_scan: fix missing free of filtered_devs missing free of devl entries on filtered_devs list in commit `2c9bb67604`	2021-01-18 16:26:02 -06:00
David Teigland	2c31939827	pvcreate: clean up opening and filtering of args The args for pvcreate/pvremove (and vgcreate/vgextend when applicable) were not efficiently opened, scanned, and filtered. This change reorganizes the opening and filtering in the following steps: - label scan and filter all devs . open ro . standard label scan at the start of command - label scan and filter dev args . open ro . uses full md component check . typically the first scan and filter of pvcreate devs - close and reopen dev args . open rw and excl - repeat label scan and filter dev args . using reopened rw excl fd - wipe and write new headers . using reopened rw excl fd	2020-10-26 11:13:27 -05:00
David Teigland	a7f195b7e8	add label_scan_devs_cached label_scan_devs without invalidating data first for cases where the caller wants to use any bcache data they have already read.	2020-10-21 16:24:16 -05:00
David Teigland	677f829e54	add label_read_pvid To read the lvm headers and set dev->pvid if the device is a PV. Difference from label_scan_ functions is this does not read any vg metadata or add any info to lvmcache.	2020-10-21 16:24:16 -05:00
David Teigland	c7311d4722	lvmcache: rename label_read label_scan_dev for consistent naming with other similar functions	2020-10-21 16:24:16 -05:00
David Teigland	2c9bb67604	scanning: improve filtering control Filtering in label_scan was controlled indirectly by the fact that bcache was not yet set up when label_scan first ran. The result is that filters that needed data would not run and would return -EAGAIN, which would result in the dev flag FILTER_AFTER_SCAN being set. After the dev header was read for checking the label, filters would be rechecked because of FILTER_AFTER_SCAN. All filters would be checked this time because bcache was now set up, and the filters needing data would largely use data already scanned for reading the label. This design worked but is hard to adjust for future cases where bcache is already set up. Replace this method (based on setting up bcache, or not) with a new cmd flag filter_nodata_only. When this flag is set filters that need data will not run. This allows the same label_scan behavior when bcache has been set up. There are no expected changes in behavior.	2020-10-21 16:24:16 -05:00
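This hypothetical sketch shows the filter_nodata_only idea from 2c9bb67604. When the flag is set, filters that need device data are skipped. The device is then marked for a recheck after its header is read, the role FILTER_AFTER_SCAN played before. Each filter's verdict is precomputed in a struct field to keep the example small; this is not lvm2's filter interface.

```c
#include <stdbool.h>
#include <stddef.h>

enum filter_result { FILTER_PASS, FILTER_REJECT, FILTER_DEFERRED };

/* Simplified filter: whether it needs device data, and the verdict it
 * would give for the device in question (precomputed for illustration). */
struct filter {
    bool needs_data;
    bool accepts;
};

static enum filter_result apply_filters(const struct filter *filters, size_t n,
                                        bool filter_nodata_only)
{
    bool deferred = false;

    for (size_t i = 0; i < n; i++) {
        if (filters[i].needs_data && filter_nodata_only) {
            deferred = true;    /* run again once the label has been read */
            continue;
        }
        if (!filters[i].accepts)
            return FILTER_REJECT;
    }
    return deferred ? FILTER_DEFERRED : FILTER_PASS;
}
```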
Zdenek Kabelac	117fc64e6e	debug: no backtrace As the path already printed verbose message drop backtrace.	2020-10-02 21:04:16 +02:00
David Teigland	da14cf68cb	scanning: keep open an lvm device with scanning problem The command may want to update it.	2020-09-28 13:25:57 -05:00
David Teigland	890c7ef451	devices: fix reopen for unopened device If there's a request to reopen rw a device that's not open, then just call the normal open function.	2020-09-28 13:25:57 -05:00
David Teigland	1404e5ee61	metadata: open rw fd before closing ro fd lvm opens devices read-only to scan them, but needs to open them read-write to update the metadata. Previously, the ro fd was closed before the rw fd was opened, leaving a small gap where the dev was not held open, and during which the dev could possibly change which storage it referred to. With the bcache_change_fd() interface, lvm opens a rw fd on a device to be written, tells bcache to change to the new rw fd, and closes the ro fd. . open dev ro . read dev with the ro fd (label_scan) . lock vg (ex for writing) . open dev rw . close ro fd . rescan dev to check if the metadata changed between the scan and the lock . if the metadata did change, reread in full . write the metadata	2020-09-18 15:10:11 -05:00
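The ordering in 1404e5ee61 can be shown with plain POSIX calls: open the rw fd first, and close the ro fd only after that succeeds. This is a simplified sketch. `reopen_rw` is a hypothetical name, and the function opens by path with O_RDWR only. It leaves out the bcache_change_fd() handoff and any O_EXCL handling that lvm2 may apply to devices.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: swap a read-only fd for a read-write one without a
 * window in which nothing holds the file open. On failure the ro fd is
 * left untouched and -1 is returned.
 */
static int reopen_rw(const char *path, int *ro_fd)
{
    int rw_fd = open(path, O_RDWR);     /* 1. open rw while ro is still held */

    if (rw_fd < 0)
        return -1;                      /* keep using the ro fd */
    close(*ro_fd);                      /* 2. only now drop the ro fd */
    *ro_fd = -1;
    return rw_fd;
}
```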
David Teigland	1570e76233	bcache: use indirection table for fd Add a "device index" (di) for each device, and use this in the bcache api to the rest of lvm. This replaces the file descriptor (fd) in the api. The rest of lvm uses new functions bcache_set_fd(), bcache_clear_fd(), and bcache_change_fd() to control which fd bcache uses for io to a particular device. . lvm opens a dev and gets an fd. fd = open(dev); . lvm passes fd to the bcache layer and gets a di to use in the bcache api for the dev. di = bcache_set_fd(fd); . lvm uses bcache functions, passing di for the dev. bcache_write_bytes(di, ...), etc. . bcache translates di to fd to do io. . lvm closes the device and clears the di/fd bcache state. close(fd); bcache_clear_fd(di); In the bcache layer, a di-to-fd translation table (int *_fd_table) is added. When bcache needs to perform io on a di, it uses _fd_table[di]. In the following commit, lvm will make use of the new bcache_change_fd() function to change the fd that bcache uses for the dev, without dropping cached blocks.	2020-09-18 15:10:11 -05:00
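This is a toy version of the di-to-fd indirection from 1570e76233. The functions are hypothetical stand-ins that only mirror the roles of bcache_set_fd(), bcache_change_fd() and bcache_clear_fd() described above. The fixed table size and the separate in-use array are assumptions of this sketch, not lvm2's implementation. Because callers hold only a di, the fd behind it can change without invalidating anything keyed by the di.

```c
#include <stdbool.h>

#define MAX_DI 64   /* illustrative table size, not lvm2's */

static int _fd_table[MAX_DI];       /* di -> fd */
static bool _di_in_use[MAX_DI];

/* Register an fd; returns the di to use from now on, or -1 if full. */
static int di_set_fd(int fd)
{
    for (int di = 0; di < MAX_DI; di++) {
        if (!_di_in_use[di]) {
            _di_in_use[di] = true;
            _fd_table[di] = fd;
            return di;
        }
    }
    return -1;
}

/* Point an existing di at a new fd; anything keyed by the di stays valid. */
static bool di_change_fd(int di, int fd)
{
    if (di < 0 || di >= MAX_DI || !_di_in_use[di])
        return false;
    _fd_table[di] = fd;
    return true;
}

/* Forget the di once the device has been closed. */
static void di_clear_fd(int di)
{
    if (di >= 0 && di < MAX_DI)
        _di_in_use[di] = false;
}

/* Translation done for each io: di -> fd, or -1 if the di is unregistered. */
static int di_to_fd(int di)
{
    if (di < 0 || di >= MAX_DI || !_di_in_use[di])
        return -1;
    return _fd_table[di];
}
```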
David Teigland	46f43589d0	hints: enhance debug messages	2020-09-16 15:01:10 -05:00
David Teigland	491eb25832	label: cleanup set_byte error exit	2020-09-16 13:54:16 -05:00
David Teigland	37bcd7ce84	Revert "label: use formatters FMTu64 and FMTsize_t" This reverts commit `d0ccb2521b`.	2020-09-16 13:47:06 -05:00
Zdenek Kabelac	d0ccb2521b	label: use formatters FMTu64 and FMTsize_t Produces code without casts to differently signed types and also shortens and enhances readability.	2020-09-15 23:07:06 +02:00
Zdenek Kabelac	7bcc994776	label: deduplicate dev_set_bytes As dev_write_zeros() is the same as dev_set_bytes(), reuse the code directly.	2020-09-15 22:52:25 +02:00
Zdenek Kabelac	7b08133844	label: code deduplication	2020-09-15 22:52:25 +02:00
Zdenek Kabelac	6d344b4ac0	hints: enhance debug with log_sys_debug	2020-09-15 22:52:25 +02:00
David Teigland	8b9028bbe7	hints: remove warning when clearing hint file When the hint file cannot be accessed, silently ignore hints, like other instances do.	2020-09-02 14:06:46 -05:00
Zdenek Kabelac	a481f42630	cov: always initialized values Make sure values are initialized for all possible paths.	2020-09-01 17:57:50 +02:00
Zdenek Kabelac	85e2c7e14d	cov: explicitly ignore function result	2020-09-01 17:57:50 +02:00
Zdenek Kabelac	1705b439b1	cov: always make sure we end with '\0' Use the simpler dm_strncpy().	2020-09-01 17:57:50 +02:00
Zdenek Kabelac	fd96f1014b	gcc: zero-sized array to flexible array C99 Switch remaining zero-sized struct members to flexible arrays to be C99 compliant. These simple rules should apply: - The incomplete array type must be the last element within the structure. - There cannot be an array of structures that contain a flexible array member. - Structures that contain a flexible array member cannot be used as a member of another structure. - The structure must contain at least one named member in addition to the flexible array member. Some pieces of the code could still be improved.	2020-09-01 17:57:50 +02:00
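Here is a small example of the C99 flexible array member rules listed in fd96f1014b. The struct and function are hypothetical. The array is the last member and the struct has another named member. The allocation adds space for the array elements to the size of the struct, which does not include the flexible array.

```c
#include <stddef.h>
#include <stdlib.h>

struct name_list {
    size_t count;           /* required: at least one other named member */
    const char *names[];    /* C99 flexible array member, must be last
                               (GNU-style code would write names[0]) */
};

static struct name_list *name_list_create(const char *const *src, size_t count)
{
    /* sizeof(*nl) excludes names[], so add room for the elements */
    struct name_list *nl = malloc(sizeof(*nl) + count * sizeof(nl->names[0]));

    if (!nl)
        return NULL;
    nl->count = count;
    for (size_t i = 0; i < count; i++)
        nl->names[i] = src[i];
    return nl;
}
```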
Zdenek Kabelac	19e9c88faf	gcc: do not use return with void function Follow C norm and do not use 'return' in void function to call other functions.	2020-08-28 21:43:03 +02:00
Zdenek Kabelac	ee0cb17608	gcc: use appropriate type for reading and printing values	2020-08-28 21:43:03 +02:00
Zdenek Kabelac	cca2a652d1	cov: avoid double call of free_hints() on error path Since we 'free_hints()' on return error path from call of _read_hint_file(), avoid calling it twice in the middle of error path process.	2020-06-24 15:01:03 +02:00
David Teigland	5c095400de	hints: free hint structs on exit and free on a couple error paths.	2020-05-13 17:20:16 -05:00
David Teigland	2f29765e7f	devs: add some checks for a dev with no path name It's possible for a dev-cache entry to remain after all paths for it have been removed, and other parts of the code expect that a dev always has a name. A better fix may be to remove a device from dev-cache after all paths to it have been removed.	2020-05-13 16:26:26 -05:00
David Teigland	cc4051eec0	pass cmd struct through more functions no functional change	2020-04-21 10:58:05 -05:00
David Teigland	f50e7ce76c	hints: free hint list in error exit path	2020-03-03 12:25:34 -06:00
Zdenek Kabelac	de43527f94	cov: unused header file removal cov: unused header removed Also ensure library header file with config settings goes first. Move inclusion of format-text.h into layout.h	2020-02-04 17:22:06 +01:00
David Teigland	94076245df	scan: add simple scan to find a pvid	2019-11-27 11:13:47 -06:00
David Teigland	56a295f78c	bcache: add invalidate_bytes function	2019-11-26 16:52:28 -06:00
David Teigland	7ea71a9eb9	Revert "hints: rewrite function" This reverts commit `70fb31b5d6`.	2019-11-14 12:15:05 -06:00
David Teigland	31a862a6be	Revert "debug: enhance debug messages" This reverts commit `e92d3bd1f7`.	2019-11-14 12:11:53 -06:00
Zdenek Kabelac	e92d3bd1f7	debug: enhance debug messages	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	14e01d6316	hints: drop unneeded memset strncpy will zero buffer itself.	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	1760b96368	hints: no need to check for NULL before free free() itself checks for NULL.	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	61a483a654	hints: check for _touch_hints Exit when !_touch_hints().	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	c38be06531	hints: fix mem leaking buffers	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	1349a52626	hints: validate allocation result	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	219fe72359	hints: validate sscanf results	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	d4d82dbb70	hints: allocate hint only when needed Avoid mem leaking hint on every loop continue and allocate hint only when it's going to be added into list. Switch to use 'dm_strncpy()' and validate sizes.	2019-11-14 18:06:42 +01:00
Zdenek Kabelac	70fb31b5d6	hints: rewrite function	2019-11-14 18:06:42 +01:00
Heming Zhao	13c254fc05	fix dev_unset_last_byte after write error dev_unset_last_byte() must be called while the fd is still valid. After a write error, dev_unset_last_byte() must be called before closing the dev and resetting the fd. In the write error path, dev_unset_last_byte() was being called after label_scan_invalidate() which meant that it would not unset the last_byte values. After a write error, dev_unset_last_byte() is now called in dev_write_bytes() before label_scan_invalidate(), instead of by the caller of dev_write_bytes(). In the common case of a successful write, the sequence is still: dev_set_last_byte(); dev_write_bytes(); dev_unset_last_byte(); Signed-off-by: Zhao Heming <heming.zhao@suse.com>	2019-11-13 09:36:58 -06:00
Joe Thornber	6b0d969b2a	[label] Use bcache_abort_fd() to ensure blocks are no longer in the cache. The return value from bcache_invalidate_fd() was not being checked. So I've introduced a little function, _invalidate_fd() that always calls bcache_abort_fd() if the write fails.	2019-10-28 15:01:47 +00:00
David Teigland	5b3fbccab9	hints: check for malloc failure	2019-08-28 12:41:57 -05:00
David Teigland	12707adac8	hints: fix copy of filter Only the first entry of the filter array was being included in the copy of the filter, rather than the entire thing. The result is that hints would not be refreshed if the filter was changed but the first entry was unchanged.	2019-08-28 12:33:04 -05:00
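This hypothetical sketch shows the bug fixed in 12707adac8. The saved copy must hold every filter entry. Otherwise a change in any entry after the first goes unnoticed and hints are not refreshed. The fixed-size storage and function names are assumptions for illustration. The filter strings use lvm.conf's accept/reject pattern syntax.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 8   /* illustrative limits, not lvm2's */
#define ENTRY_LEN   64

struct saved_filter {
    int count;
    char entries[MAX_ENTRIES][ENTRY_LEN];
};

/* Copy every entry of the filter, not just filter[0]. */
static bool save_filter(struct saved_filter *out, const char *const *filter, int count)
{
    if (count < 0 || count > MAX_ENTRIES)
        return false;
    out->count = count;
    for (int i = 0; i < count; i++) {
        if (strlen(filter[i]) >= ENTRY_LEN)
            return false;
        strcpy(out->entries[i], filter[i]);
    }
    return true;
}

/* True if any entry differs, so hints would need to be refreshed. */
static bool filter_changed(const struct saved_filter *saved,
                           const char *const *filter, int count)
{
    if (saved->count != count)
        return true;
    for (int i = 0; i < count; i++)
        if (strcmp(saved->entries[i], filter[i]))
            return true;
    return false;
}
```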
Zdenek Kabelac	55f1d8a269	configure: check for prlimit Update configure and make code compilable if prlimit() is not present. Since the code is suspicious, do not yet attempt to replace it with set/getrlimit().	2019-08-26 17:24:37 +02:00
David Teigland	61fce72a11	bcache: increase max allowed bcache size from 128MB to 512MB (the default remains 8MB)	2019-08-16 13:35:09 -05:00
David Teigland	e01fddc578	improve duplicate pv handling for md components Eliminate md components at the start so they don't interfere with actual duplicates, and don't need to be removed later. This also allows for choosing no copy of a PVID if they all happen to be md components.	2019-08-16 13:26:12 -05:00
David Teigland	677833ce6f	lvmcache: renaming functions and variables related to duplicates, no functional changes.	2019-08-16 13:26:11 -05:00
David Teigland	ecefcc9ca8	increase soft open file limit When there are more devices than the current soft open file limit (default 1024), raise the soft limit to the hard/max limit (default 4096). Do this prior to scanning in case enough of the devices are PVs that need to be kept open.	2019-08-08 15:45:03 -05:00
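Raising the soft open file limit to the hard limit, as ecefcc9ca8 describes, can be done with the POSIX getrlimit()/setrlimit() calls. This sketch uses setrlimit() rather than prlimit(), which the configure check in 55f1d8a269 suggests lvm2 uses. The function name is hypothetical.

```c
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: if `needed` exceeds the soft RLIMIT_NOFILE, raise
 * the soft limit to the hard limit. Returns true if no change was needed
 * or the raise succeeded (the result may still be below `needed` when
 * the hard limit is lower).
 */
static bool raise_nofile_limit(rlim_t needed)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl))
        return false;
    if (rl.rlim_cur >= needed)
        return true;                /* already enough */
    rl.rlim_cur = rl.rlim_max;      /* soft limit -> hard/max limit */
    return setrlimit(RLIMIT_NOFILE, &rl) == 0;
}
```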
David Teigland	7f347698e3	Fix rounding writes up to sector size Do this at two levels, although one would be enough to fix the problem seen recently: - Ignore any reported sector size other than 512 or 4096. If either sector size (physical or logical) is reported as 512, then use 512. If neither is reported as 512, and one or the other is reported as 4096, then use 4096. If neither is reported as either 512 or 4096, then use 512. - When rounding up a limited write in bcache to be a multiple of the sector size, check that the resulting write size is not larger than the bcache block itself. (This shouldn't happen if the sector size is 512 or 4096.)	2019-07-26 14:21:08 -05:00
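The sector size rules from 7f347698e3 translate directly into a small function. A second function adds the guard that a rounded-up write must not exceed the bcache block. The function names and the 64-bit length type are assumptions of this sketch.

```c
#include <stdbool.h>

/* Prefer 512 if either size reports it, else 4096, else fall back to 512. */
static unsigned int choose_sector_size(unsigned int physical, unsigned int logical)
{
    if (physical == 512 || logical == 512)
        return 512;
    if (physical == 4096 || logical == 4096)
        return 4096;
    return 512;     /* any other reported value is ignored */
}

/* Round len up to a multiple of sector_size; refuse if it would overrun the block. */
static bool round_write_len(unsigned long long len, unsigned int sector_size,
                            unsigned long long block_size, unsigned long long *rounded)
{
    unsigned long long r = (len + sector_size - 1) / sector_size * sector_size;

    if (r > block_size)
        return false;
    *rounded = r;
    return true;
}
```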
David Teigland	4567c6a2b2	enable full md component detection at the right time An active md device with an end superblock causes lvm to enable full md component detection. This was being done within the filter loop instead of before, so the full filtering of some devs could be missed. Also incorporate the recently added config setting that controls the md component detection.	2019-07-10 13:30:50 -05:00
David Teigland	d2b88f2715	scan: remove unused arg to setup_bcache	2019-07-09 13:16:26 -05:00
David Teigland	d16142f90f	scanning: open devs rw when rescanning for write When vg_read rescans devices with the intention of writing the VG, the label rescan can open the devs RW so they do not need to be closed and reopened RW in dev_write_bytes.	2019-06-21 10:57:49 -05:00
David Teigland	ba7ff96faf	improve reading and repairing vg metadata The fact that vg repair is implemented as a part of vg read has led to a messy and complicated implementation of vg_read, and limited and uncontrolled repair capability. This splits read and repair apart. Summary ------- - take all kinds of various repairs out of vg_read - vg_read no longer writes anything - vg_read now simply reads and returns vg metadata - vg_read ignores bad or old copies of metadata - vg_read proceeds with a single good copy of metadata - improve error checks and handling when reading - keep track of bad (corrupt) copies of metadata in lvmcache - keep track of old (seqno) copies of metadata in lvmcache - keep track of outdated PVs in lvmcache - vg_write will do basic repairs - new command vgck --updatemetadata will do all repairs Details ------- - In scan, do not delete dev from lvmcache if reading/processing fails; the dev is still present, and removing it makes it look like the dev is not there. Records are now kept about the problems with each PV so they can be fixed/repaired in the appropriate places. - In scan, record a bad mda on failure, and delete the mda from the in-use mda list so it will not be used by vg_read or vg_write, only by repair. - In scan, succeed if any good mda on a device is found, instead of failing if any is bad. The bad/old copies of metadata should not interfere with normal usage while good copies can be used. - In scan, add a record of old mdas in lvmcache for later, do not repair them while reading, and do not let them prevent us from finding and using a good copy of metadata from elsewhere. One result is that "inconsistent metadata" is no longer a read error, but instead a record in lvmcache that can be addressed separately from the read. - Treat a dev with no good mdas like a dev with no mdas, which is an existing case we already handle. 
- Don't use a fake vg "handle" for returning an error from vg_read, or the vg_read_error function for getting that error number; just return null if the vg cannot be read or used, and an error_flags arg with flags set for the specific kind of error (which can be used later for determining the kind of repair.) - Saving an original copy of the vg metadata, for purposes of reverting a write, is now done explicitly in vg_read instead of being hidden in the vg_make_handle function. - When a vg is not accessible due to "access restrictions" but is otherwise fine, return the vg through the new error_vg arg so that process_each_pv can skip the PVs in the VG while processing. (This is a temporary accommodation for the way process_each_pv tracks which devs have been looked at, and can be dropped later when the process_each_pv dev tracking implementation is changed.) - vg_read does not try to fix or recover a vg, but now just reads the metadata, checks access restrictions and returns it. (Checking access restrictions might be better done outside of vg_read, but this is a later improvement.) - _vg_read now simply makes one attempt to read metadata from each mda, and uses the most recent copy to return to the caller in the form of a 'vg' struct. (bad mdas were excluded during the scan and are not retried) (old mdas were not excluded during scan and are retried here) - vg_read uses _vg_read to get the latest copy of metadata from mdas, and then makes various checks against it to produce warnings, and to check if VG access is allowed (access restrictions include: writable, foreign, shared, clustered, missing pvs). - Things that were previously silently/automatically written by vg_read that are now done by vg_write, based on the records made in lvmcache during the scan and read: . clearing the missing flag . updating old copies of metadata . clearing outdated pvs . updating pv header flags - Bad/corrupt metadata are now repaired; they were not before. 
Test changes ------------ - A read command no longer writes the VG to repair it, so add a write command to do a repair. (inconsistent-metadata, unlost-pv) - When a missing PV is removed from a VG, and then the device is enabled again, vgck --updatemetadata is needed to clear the outdated PV before it can be used again, where it wasn't before. (lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair, mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv) Reading bad/old metadata ------------------------ - "bad metadata": the mda_header or metadata text has invalid fields or can't be parsed by lvm. This is a form of corruption that would not be caused by known failure scenarios. A checksum error is typically included among the errors reported. - "old metadata": a valid copy of the metadata that has a smaller seqno than other copies of the metadata. This can happen if the device failed, or io failed, or lvm failed while committing new metadata to all the metadata areas. Old metadata on a PV that has been removed from the VG is the "outdated" case below. When a VG has some PVs with bad/old metadata, lvm can simply ignore the bad/old copies, and use a good copy. This is why there are multiple copies of the metadata -- so it's available even when some of the copies cannot be used. The bad/old copies do not have to be repaired before the VG can be used (the repair can happen later.) A PV with no good copies of the metadata simply falls back to being treated like a PV with no mdas; a common and harmless configuration. When bad/old metadata exists, lvm warns the user about it, and suggests repairing it using a new metadata repair command. Bad metadata in particular is something that users will want to investigate and repair themselves, since it should not happen and may indicate some other problem that needs to be fixed. PVs with bad/old metadata are not the same as missing devices. 
Missing devices will block various kinds of VG modification or activation, but bad/old metadata will not. Previously, lvm would attempt to repair bad/old metadata whenever it was read. This was unnecessary since lvm does not require every copy of the metadata to be used. It would also hide potential problems that should be investigated by the user. It was also dangerous in cases where the VG was on shared storage. The user is now allowed to investigate potential problems and decide how and when to repair them. Repairing bad/old metadata -------------------------- When label scan sees bad metadata in an mda, that mda is removed from the lvmcache info->mdas list. This means that vg_read will skip it, and not attempt to read/process it again. If it was the only in-use mda on a PV, that PV is treated like a PV with no mdas. It also means that vg_write will skip the bad mda, and not attempt to write new metadata to it. The only way to repair bad metadata is with the metadata repair command. When label scan sees old metadata in an mda, that mda is kept in the lvmcache info->mdas list. This means that vg_read will read/process it again, and likely see the same mismatch with the other copies of the metadata. Like label_scan, vg_read will simply ignore the old copy of the metadata and use the latest copy. If the command is modifying the vg (e.g. lvcreate), then vg_write, which writes new metadata to every mda on info->mdas, will write the new metadata to the mda that had the old version. If successful, this will resolve the old metadata problem (without needing to run a metadata repair command.) Outdated PVs ------------ An outdated PV is a PV that has an old copy of VG metadata that shows it is a member of the VG, but the latest copy of the VG metadata does not include this PV. This happens if the PV is disconnected, vgreduce --removemissing is run to remove the PV from the VG, and then the PV is reconnected. 
In this case, the outdated PV needs to have its outdated metadata removed and the PV used flag needs to be cleared. This repair will be done by the subsequent repair command. It is also done if vgremove is run on the VG. MISSING PVs ----------- When a device is missing, most commands will refuse to modify the VG. This is the simple case. More complicated is when a command is allowed to modify the VG while it is missing a device. When a VG is written while a device is missing for one of its PVs, the VG metadata is written to disk with the MISSING flag on the PV with the missing device. When the VG is next used, it is treated as if the PV with the MISSING flag still has a missing device, even if that device has reappeared. If all LVs that were using a PV with the MISSING flag are removed or repaired so that the MISSING PV is no longer used, then the next time the VG metadata is written, the MISSING flag will be dropped. Alternative methods of clearing the MISSING flag are: vgreduce --removemissing will remove PVs with missing devices, or PVs with the MISSING flag where the device has reappeared. vgextend --restoremissing will clear the MISSING flag on PVs where the device has reappeared, allowing the VG to be used normally. This must be done with caution since the reappeared device may have old data that is inconsistent with data on other PVs. Bad mda repair -------------- The new command: vgck --updatemetadata VG first uses vg_write to repair the basic issues mentioned above (old metadata, outdated PVs, pv_header flags, MISSING_PV flags). It will also go further and repair bad metadata: . text metadata that has a bad checksum . text metadata that is not parsable . corrupt mda_header checksum and version fields (To keep a clean diff, #if 0 is added around functions that are replaced by new code. These disabled functions are removed by the following commit.)	2019-06-07 15:54:04 -05:00
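The read path described in this commit (bad mdas dropped from the list during the scan, old mdas kept and re-read, and the copy with the highest seqno returned) can be sketched in C as below. This is a minimal illustration, not lvm2 code: `struct mda_copy`, its `bad` and `seqno` fields, and `pick_latest_copy()` are hypothetical stand-ins for the lvmcache records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one metadata area (mda) after the scan. */
struct mda_copy {
	int bad;        /* set by the scan: invalid header/text, never re-read */
	uint32_t seqno; /* sequence number of the metadata text in this mda */
};

/*
 * Return the index of the usable (non-bad) copy with the highest seqno,
 * or -1 if no copy is usable. Usable copies with a lower seqno are "old":
 * they are ignored for reading and counted in *old_count so the caller
 * can warn about them; a later vg_write would overwrite them.
 */
int pick_latest_copy(const struct mda_copy *mdas, size_t n, size_t *old_count)
{
	int best = -1;
	size_t i;

	assert(old_count != NULL);
	*old_count = 0;

	for (i = 0; i < n; i++) {
		if (mdas[i].bad)
			continue;
		if (best < 0 || mdas[i].seqno > mdas[best].seqno)
			best = (int)i;
	}

	if (best < 0)
		return -1;

	for (i = 0; i < n; i++)
		if (!mdas[i].bad && mdas[i].seqno < mdas[best].seqno)
			(*old_count)++;

	return best;
}
```

A PV whose copies are all bad gets -1 here, which corresponds to the case above of a PV falling back to being treated like one with no mdas.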
David Teigland	86d831b916	change args for text label read function Have the caller pass the label_sector to the read function so the read function can set the sector field in the label struct, instead of having the read function return a pointer to the label for the caller to set the sector field. Also have the read function return a flag indicating to the caller that the scanned device was identified as a duplicate pv.	2019-06-07 15:54:04 -05:00
David Teigland	db98a6e362	Additional MD component checking If udev info is missing for a device (udev info would indicate whether it's an MD component), then do an end-of-device read to check if a PV is an MD component. (This is skipped when using hints since we already know devs in hints are good.) A new config setting md_component_checks can be used to disable the additional end-of-device MD checks, or to always enable end-of-device MD checks. When both hints and udev info are disabled/unavailable, the end of PVs will now be scanned by default. If md devices with end-of-device superblocks are not being used, the extra I/O overhead can be avoided by setting md_component_checks="start".	2019-06-07 13:27:16 -05:00
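The decision described in this commit can be sketched as a small C predicate. Only the value "start" is named in the commit; the enum, the `auto` and `full` labels, and `need_md_end_check()` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of md_component_checks. Only "start" is named in
 * the commit message; the other two labels are assumptions. */
enum md_checks {
	MD_CHECKS_AUTO,  /* default: end-of-device read only when needed */
	MD_CHECKS_START, /* never do the extra end-of-device read */
	MD_CHECKS_FULL   /* always do the end-of-device read */
};

/* Return true if a PV should get an extra read at the end of the device
 * to look for an MD superblock stored there. */
bool need_md_end_check(enum md_checks setting, bool dev_in_hints, bool have_udev_info)
{
	if (setting == MD_CHECKS_START)
		return false;
	if (setting == MD_CHECKS_FULL)
		return true;

	assert(setting == MD_CHECKS_AUTO);

	/* Devices in hints are already known to be good. */
	if (dev_in_hints)
		return false;

	/* udev would say whether the device is an MD component; without that
	 * information, fall back to reading the end of the device. */
	return !have_udev_info;
}
```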
David Teigland	2b241eb1f6	pvck: use new dump routines for old output Use the recently added dump routines to produce the old/traditional pvck output, and remove the code that had been used for that. The validation/checking done by the new routines means that new lines prefixed with CHECK are printed for incorrect values.	2019-06-05 16:28:52 -05:00
David Teigland	dc1e12dcd4	scan: expand and update label scan comments	2019-05-21 12:02:40 -05:00
David Teigland	60bf9c9f33	hints: exclude md components In some cases md components could be included in the hints, so add a check to hint creation to make sure they are excluded.	2019-05-21 11:58:01 -05:00
David Teigland	99de816a1b	scan: remove comments about lvmetad	2019-05-02 13:32:30 -05:00
David Teigland	0046c4e7a7	use memcpy for constant ondisk strings Use memcpy/memcmp for on disk strings which are not null terminated: FMTT_MAGIC, LVM2_LABEL and LABEL_ID. Quiets compile warnings.	2019-05-02 12:59:50 -05:00
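A minimal C illustration of why memcpy()/memcmp() fit these fields: the identifiers occupy fixed 8-byte fields on disk with no NUL terminator. `LABEL_ID` and `LVM2_LABEL` are given their 8-byte lvm2 values; `struct label_hdr` and the two helper functions are a hypothetical, simplified subset written for this example.

```c
#include <assert.h>
#include <string.h>

/* 8-byte on-disk identifiers, stored without a terminating NUL. */
#define LABEL_ID   "LABELONE"
#define LVM2_LABEL "LVM2 001"

/* Hypothetical, simplified subset of an on-disk label header. */
struct label_hdr {
	char id[8];   /* holds LABEL_ID, not NUL-terminated */
	char type[8]; /* holds LVM2_LABEL, not NUL-terminated */
};

void label_hdr_init(struct label_hdr *lh)
{
	assert(lh != NULL);
	/* Copy exactly the field size; the literal's trailing NUL is not copied. */
	memcpy(lh->id, LABEL_ID, sizeof(lh->id));
	memcpy(lh->type, LVM2_LABEL, sizeof(lh->type));
}

/* Compare the fixed-size fields byte for byte; strcmp() would read past
 * them looking for a NUL that is not there. */
int label_hdr_is_lvm2(const struct label_hdr *lh)
{
	return !memcmp(lh->id, LABEL_ID, sizeof(lh->id)) &&
	       !memcmp(lh->type, LVM2_LABEL, sizeof(lh->type));
}
```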
David Teigland	c3e385c108	hints: skip hint flock if nolocking option is set	2019-04-29 13:01:15 -05:00
David Teigland	8c87dda195	locking: unify global lock for flock and lockd There have been two file locks used to protect lvm "global state": "ORPHANS" and "GLOBAL". Commands that used the ORPHAN flock in exclusive mode: pvcreate, pvremove, vgcreate, vgextend, vgremove, vgcfgrestore. Commands that used the ORPHAN flock in shared mode: vgimportclone, pvs, pvscan, pvresize, pvmove, pvdisplay, pvchange, fullreport. Commands that used the GLOBAL flock in exclusive mode: pvchange, pvscan, vgimportclone, vgscan. Commands that used the GLOBAL flock in shared mode: pvscan --cache, pvs. The ORPHAN lock covers the important cases of serializing the use of orphan PVs. It also partially covers the reporting of orphan PVs (although not correctly, as explained below.) The GLOBAL lock doesn't seem to have a clear purpose (it may have eroded over time.) Neither lock correctly protects the VG namespace, or orphan PV properties. To simplify and correct these issues, the two separate flocks are combined into the one GLOBAL flock, and this flock is used from the locking sites that are in place for the lvmlockd global lock. The logic behind the lvmlockd (distributed) global lock is that any command that changes "global state" needs to take the global lock in ex mode. Global state in lvm is: the list of VG names, the set of orphan PVs, and any properties of orphan PVs. Reading this global state can use the global lock in sh mode to ensure it doesn't change while being reported. The locking of global state now looks like: lockd_global(), previously named lockd_gl(), acquires the distributed global lock through lvmlockd. This is unchanged. It serializes distributed lvm commands that are changing global state. This is a no-op when lvmlockd is not in use. lockf_global() acquires an flock on a local file. It serializes local lvm commands that are changing global state. 
lock_global() first calls lockf_global() to acquire the local flock for global state, and if this succeeds, it calls lockd_global() to acquire the distributed lock for global state. Replace instances of lockd_gl() with lock_global(), so that the existing sites for lvmlockd global state locking are now also used for local file locking of global state. Remove the previous file locking calls lock_vol(GLOBAL) and lock_vol(ORPHAN). The following commands which change global state are now serialized with the exclusive global flock: pvchange (of orphan), pvresize (of orphan), pvcreate, pvremove, vgcreate, vgextend, vgremove, vgreduce, vgrename, vgcfgrestore, vgimportclone, vgmerge, vgsplit Commands that use a shared flock to read global state (and will be serialized against the prior list) are those that use process_each functions that are based on processing a list of all VG names, or all PVs. The list of all VGs or all PVs is global state and the shared lock prevents those lists from changing while the command is processing them. The ORPHAN lock previously attempted to produce an accurate listing of orphan PVs, but it was only acquired at the end of the command during the fake vg_read of the fake orphan vg. This is not when orphan PVs were determined; they were determined by elimination beforehand by processing all real VGs, and subtracting the PVs in the real VGs from the list of all PVs that had been identified during the initial scan. This is fixed by holding the single global lock in shared mode while processing all VGs to determine the list of orphan PVs.	2019-04-29 13:01:05 -05:00
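The ordering that lock_global() follows (local flock first, then the lvmlockd lock only if the flock succeeded) can be sketched in C with stub back ends. The single `mode` argument, the "sh"/"ex" strings, and the call-logging stubs are assumptions for this illustration; the real lvm2 functions may take different arguments and do real locking.

```c
#include <assert.h>
#include <string.h>

/* Stub back ends that only record the order in which they are called. */
static char call_log[64];
static int flock_fails; /* set nonzero to simulate a failed local flock */

static void log_call(const char *name, const char *mode)
{
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
	strncat(call_log, mode, sizeof(call_log) - strlen(call_log) - 1);
	strncat(call_log, " ", sizeof(call_log) - strlen(call_log) - 1);
}

/* Local flock on a file: serializes lvm commands on this host. */
static int lockf_global(const char *mode)
{
	log_call("lockf:", mode);
	return !flock_fails;
}

/* Distributed lock through lvmlockd (a no-op when lvmlockd is not used). */
static int lockd_global(const char *mode)
{
	log_call("lockd:", mode);
	return 1;
}

/*
 * "ex" for commands that change global state (VG names, the set of orphan
 * PVs and their properties), "sh" for commands that only read it.
 * The distributed lock is requested only after the local flock succeeds.
 * Whether the real code drops the flock when lockd_global() fails is not
 * stated in the commit; this sketch just reports failure.
 */
int lock_global(const char *mode)
{
	assert(!strcmp(mode, "sh") || !strcmp(mode, "ex"));

	if (!lockf_global(mode))
		return 0;
	if (!lockd_global(mode))
		return 0;
	return 1;
}
```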
David Teigland	ccd1386070	wipe_lv: initially open LV in writable mode wipe_lv knows it's going to write the device, so it can open rw from the start. It was opening readonly, and then dev_write needed to reopen it readwrite.	2019-04-26 14:49:27 -05:00
David Teigland	d0b869e46a	hints: fix non-empty hints list when not using hints When hints are invalid and ignored, the list of hints could be non-empty (from additions before an invalid hint was found). This confused the calling code which was checking for an empty list to see if hints were used. Ensure the list is empty when hints are not used.	2019-04-11 11:58:51 -05:00
David Teigland	0cc80ccfd5	hints: fix case of error getting device size When checking hints, if there's an error getting the device size, that should be equivalent to seeing zero size.	2019-04-11 10:32:28 -05:00
David Teigland	6f18186bfd	pvscan: print more reasons for ignoring devices	2019-04-05 15:48:12 -05:00
David Teigland	7edbf8a441	io: increase the default io memory from 4 to 8 MiB This is the default bcache size that is created at the start of the command. It needs to be large enough to hold a single copy of metadata for a given VG, or the VG cannot be read or written (since the entire VG would not fit into available memory.) Increasing the default reduces the chance that anyone needs to raise the setting before their VG can be used. The size can be set in lvm.conf global/io_memory_size; the lower limit is 4 MiB and the upper limit is 128 MiB.	2019-03-04 12:14:06 -06:00
David Teigland	3584e0c0d5	io: warn when metadata size approaches io memory size When a single copy of metadata gets within 1MB of the current io_memory_size value, begin printing a warning that the io_memory_size should be increased.	2019-03-04 12:13:09 -06:00
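The limits from the two io commits just above can be sketched in C. Sizes are in bytes in this sketch (the commit messages do not say which unit lvm.conf uses for io_memory_size), "1MB" is taken as 1 MiB, and clamping an out-of-range value rather than rejecting it is an assumption; `io_memory_effective()` and `metadata_near_io_limit()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MIB ((uint64_t)1024 * 1024)

/* Limits and default from the commit messages; bytes in this sketch. */
#define IO_MEMORY_MIN     (4 * MIB)
#define IO_MEMORY_MAX     (128 * MIB)
#define IO_MEMORY_DEFAULT (8 * MIB)

/* Clamp a configured size into the allowed range (assumed behavior);
 * 0 means "not set" in this sketch and selects the default. */
uint64_t io_memory_effective(uint64_t configured)
{
	if (!configured)
		return IO_MEMORY_DEFAULT;
	if (configured < IO_MEMORY_MIN)
		return IO_MEMORY_MIN;
	if (configured > IO_MEMORY_MAX)
		return IO_MEMORY_MAX;
	return configured;
}

/* Nonzero when one copy of VG metadata is within 1 MiB of the io memory
 * size, the point at which the warning would start being printed. */
int metadata_near_io_limit(uint64_t metadata_size, uint64_t io_memory)
{
	assert(io_memory >= MIB);
	return metadata_size + MIB >= io_memory;
}
```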
David Teigland	dd8d083795	config: add new setting io_memory_size which defines the amount of memory that lvm will allocate for bcache. Increasing this setting is required if it is smaller than a single copy of VG metadata.	2019-03-04 11:36:21 -06:00
David Teigland	0aa51a2f61	hints: fix recreating hints from pvscan When aay was included in the pvscan --cache command, the activation part was complaining about the unusual state of the hint file since it had been recreated just prior.	2019-02-13 15:23:43 -06:00

311 Commits