shaba/lvm2

mirror of git://sourceware.org/git/lvm2.git synced 2024-12-22 17:35:59 +03:00

Author	SHA1	Message	Date
Zdenek Kabelac	0e5f39a5ac	snapshot: use single merging sequence The resume of 'released' 'COW' should precede the resume of origin. The fact we need to do the sequence differently for merge was caused by bugs fixed in 2 previous commits - so we no longer need to recognize 'merging' and we should always go with single sequence. The importance of this order is - to properly remove '-real' device from origin LV. When COW is activated as 2nd. '-real' device is kept in table as it cannot be removed during 1st. resume of origin, and later activation of COW LV no longer builds tree associated with origin LV.	2019-10-26 00:49:16 +02:00
Zdenek Kabelac	855b16ce14	snapshot: fix checking of merged thin volume When merging of thin snapshot is taking place, the origin target will be of thin type.	2019-10-26 00:49:16 +02:00
Zdenek Kabelac	9968be55ed	snapshot: correctly check device id of merged thin When checking device id of a thin device that is just being merged - the snapshot actually could have been already finished which means '-real' suffix for the LV is already gone and just LV is there - so check explicitly for this condition and use correct UUID for this case.	2019-10-26 00:49:16 +02:00
David Teigland	6a8bd0c509	lvmlockd: fix cachevol locking When a cachevol LV is attached, have the LV keep its lock allocated. The lock on the cachevol won't be used while it's attached. When the cachevol is split a new lock does not need to be allocated. (Applies to cachevol usage by both dm-cache and dm-writecache.)	2019-10-25 14:08:59 -05:00
Zdenek Kabelac	80ae7206a8	cache: _cpool is protected suffix now	2019-10-22 16:07:21 +02:00
Zdenek Kabelac	a7563dc6a1	gcc: older version can't see udev is always set	2019-10-22 13:39:22 +02:00
David Teigland	c08704cee7	cachevol: use cachepool code for metadata size Based on a more detailed calculation, but because of extent size rounding, the final result is about the same.	2019-10-21 12:13:33 -05:00
Zdenek Kabelac	0c01a4c2a6	gcc: avoid warning: declaration of xxx shadows a global declaration Fix some gcc complaints about shadowing global declarations	2019-10-21 15:32:35 +02:00
Zdenek Kabelac	f61d828c86	gcc: older compiler is happier with this initializer	2019-10-21 15:32:35 +02:00
Zdenek Kabelac	dd7629ea09	cache: use _cpool for used cache-pools When LV gets cached and uses cache-pool - such cache-pool will now get _cpool suffix automatically. Thus 'Pool' column for cached LV will now show either _cvol or _cpool LV.	2019-10-21 15:31:33 +02:00
Zdenek Kabelac	766dedb628	lvm-string: add drop_lvname_suffix Internal function to drop suffix out of lvname.	2019-10-21 12:14:15 +02:00
Zdenek Kabelac	2266a1863f	lv_manip: add lv_uniq_rename_update Add function to rename LV to either passed name or if the name is already in use, generate new lvol% name.	2019-10-21 12:14:15 +02:00
Zdenek Kabelac	ec85dfe0f8	cachevol: support removal of cachevol Removal of cachevol is equivalent of lvconvert --uncache and works the same way as with cachepool.	2019-10-17 13:03:50 +02:00
Zdenek Kabelac	5938cde11b	cache: single code for removal of cached volume Use same routine for dropping cached LV for cachevol and cachepool.	2019-10-17 13:03:50 +02:00
Zdenek Kabelac	9969361b51	debug: missing trace	2019-10-17 13:03:50 +02:00
Zdenek Kabelac	dab4a2c893	cachevol: move flag setting after taking archive Before 'archive()' is called, lvm2 must not touch/modify metadata. So move setting CACHE_VOL related flags past this point. Also make sure reading of cache segtype always restores this flag properly (even if compatible flag would be lost).	2019-10-17 13:03:50 +02:00
Zdenek Kabelac	f63e20ebcc	cache: drop validation check Since now we can cache either with cache-pool LV or any other LV (being used as cachevol LV) drop the validation condition.	2019-10-17 13:03:49 +02:00
Zdenek Kabelac	af8cfa90d9	cache: add more comments for min meta size Enhance source code with better explanation how the minimal metadata size is evaluated from data size and chunk size.	2019-10-17 13:03:49 +02:00
Zdenek Kabelac	2a08d6d1d4	cachevol: use CVOL UUID for cdata and cmeta layered devices Since code is using -cdata and -cmeta UUID suffixes, it does not need any new 'extra' ID to be generated and stored in metadata. Since the introduction of the new 'segtype' cache+CACHE_USES_CACHEVOL we can safely assume 'new' cache with cachevol will now be created without extra metadata_id and data_id in metadata. For backward compatibility, code still reads them in case older version of metadata have them - so it still should be able to activate such volumes. Bonus is lowered size of lv structure used to store info about LV (noticeable with big volume groups).	2019-10-17 13:03:49 +02:00
David Teigland	81fe045714	cache: change default cachevol metadata sizes The first part of a cachevol LV is used for metadata, and the rest of the space is used for data. The division of space between metadata and data depends on the total size of the cachevol. The previous division gave more space than needed to metadata, it was: cachevol size 8M to 128M -> metadata size 16M * cachevol size 128M to 1G -> metadata size 32M cachevol size 1G and up -> metadata size 64M (* if this resulted in over half the LV used as metadata, then half the cachevol would be used for metadata, and the other half for data.) The division of space now gives less space to metadata, it is: cachevol size 8M to 16M -> metadata size 4M cachevol size 16M to 4G -> metadata size 8M cachevol size 4G to 16G -> metadata size 16M cachevol size 16G to 32G -> metadata size 32M cachevol size 32G and up -> metadata size 64M	2019-10-15 14:36:03 -05:00
David Teigland	0443d00ff1	allow activating known LVs when other LVs have unknown segtypes When a VG contains some LVs with unknown segtypes, the user should still be allowed to activate other LVs in the VG that are understood. $ lvs foo WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL. WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL LV VG Attr LSize lvol0 foo -wi------- 4.00m other foo vwi---u--- 48.00m $ lvcreate -l1 foo WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL. WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL Cannot change VG foo with unknown segments in it! Cannot process volume group foo $ lvchange -ay foo/lvol0 WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL. WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL $ lvchange -ay foo/other WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL. WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL Refusing activation of LV foo/other containing an unrecognised segment. $ lvs foo WARNING: Unrecognised flag CACHE_USES_CACHEVOL in segment type cache+CACHE_USES_CACHEVOL. WARNING: Unrecognised segment type cache+CACHE_USES_CACHEVOL LV VG Attr LSize lvol0 foo -wi-a----- 4.00m other foo vwi---u--- 48.00m	2019-10-15 14:34:53 -05:00
David Teigland	91ee025d5b	cache: change cachevol flags for backward compat A cachevol LV had the CACHE_VOL status flag in metadata, and the cache LV using it had no new flag. This caused problems if the new metadata was used by an old version of lvm. An old version of lvm would have two problems processing the new metadata: . The old lvm would return an error when reading the VG metadata when it saw the unknown CACHE_VOL status flag. . The old lvm would return an error when reading the VG metadata because it would not find an expected cache pool attached to the cache LV (since the cache LV had a cachevol attached instead.) Change the use of flags: . Change the CACHE_VOL flag to be a COMPATIBLE flag (instead of a STATUS flag) so that old versions will not fail when they see it. . When a cache LV is using a cachevol, the cache LV gets a new SEGTYPE flag CACHE_USES_CACHEVOL. This flag is appended to the segtype name, so that old lvm versions will fail to use the LV because of an unknown segtype, as opposed to failing to read the VG.	2019-10-15 09:05:52 -05:00
Zdenek Kabelac	1cd308d640	cachevol: drop no longer needed functions Code is no longer used/needed.	2019-10-14 15:20:25 +02:00
Zdenek Kabelac	201ffbd04a	cachevol: use lv_cache_remove Use same routine for dropping cache.	2019-10-14 15:20:25 +02:00
Zdenek Kabelac	2825ad9dd2	cachevol: improve manipulation with dm tree Enhance activation of cached devices using cachevol. Correctly instantiate cachevol -cdata & -cmeta devices with '-' in name (as they are only layered devices). Code is also a bit more compacted (although still not ideal, as the usage of extra UUIDs stored in metadata is troublesome and will be repaired later). NOTE: this patch may bring a potentially minor incompatibility for 'runtime' upgrade	2019-10-14 15:17:50 +02:00
Zdenek Kabelac	a454a1b4ea	cachevol: put _cvol as protected suffix. This reverts "drop cvol dm uuid suffix for cachevol LVs" commit `5191057d9d`. Start using -cvol for DM UUID.	2019-10-14 15:16:05 +02:00
Zdenek Kabelac	77deadd3af	cachevol: drop LV_CACHE_VOL on detach automatically Move dropping of cachevol flag into detach function. TODO: this flag should be internal to lvm2.	2019-10-14 15:15:14 +02:00
Zdenek Kabelac	615e18f5b2	cache: enhance removal function to work with cvol To keep things simple, use same code for all cache removal functions, not just for cachepools but also cachevols.	2019-10-14 15:14:25 +02:00
Zdenek Kabelac	6ee83f699b	cache: correct condition	2019-10-14 15:14:25 +02:00
Zdenek Kabelac	bc35ccd174	cache: recognize cachevol with lv_cache_remove	2019-10-14 15:14:25 +02:00
Zdenek Kabelac	36944e1009	cache: reload only when switched to cleaner policy Reload cache target only when lvm2 reloads table with cache with cleaner policy.	2019-10-14 15:14:22 +02:00
David Teigland	bd21736e8b	vgck: let updatemetadata repair mismatched metadata Let vgck --updatemetadata repair cases where different mdas hold independently valid but unmatching copies of the metadata, i.e. different text metadata checksums or text metadata sizes.	2019-10-11 12:57:39 -05:00
David Teigland	d6ffc99052	vgck: fix updatemetadata writing different descriptions vgck --updatemetadata would write the same correct metadata to good mdas, and then to bad mdas, but the sequence of vg_write/vg_commit calls between good and bad mdas could cause a different description field to be generated for good/bad mdas. (The description field describing the command was recently included in the ondisk copy of the metadata text.)	2019-10-11 12:57:32 -05:00
David Teigland	fe16d296b0	pvmove: remove some cmirror related code which is no longer used	2019-10-11 11:31:42 -05:00
David Teigland	b6240c9188	vgremove: remove internal lvmlock LV If a VG is forcibly changed from lock_type sanlock to lock_type none, the internal lvmlock LV is left behind. If that LV is not removed before vgremove is run on the VG, then an internal check will be triggered by the hidden lvmlock LV. So, check for and remove a left over lvmlock LV during vgremove.	2019-10-04 12:01:30 -05:00
Zdenek Kabelac	ca70dc4540	vdo: add lvs fields to query vdo volume properties Add lots of vdo fields: vdo_operating_mode - For vdo pools, its current operating mode. vdo_compression_state - For vdo pools, whether compression is running. vdo_index_state - For vdo pools, state of index for deduplication. vdo_used_size - For vdo pools, currently used space. vdo_saving_percent - For vdo pools, percentage of saved space. vdo_compression - Set for compressed LV (vdopool). vdo_deduplication - Set for deduplicated LV (vdopool). vdo_use_metadata_hints - Use REQ_SYNC for writes (vdopool). vdo_minimum_io_size - Minimum acceptable IO size (vdopool). vdo_block_map_cache_size - Allocated caching size (vdopool). vdo_block_map_era_length - Speed of cache writes (vdopool). vdo_use_sparse_index - Sparse indexing (vdopool). vdo_index_memory_size - Allocated indexing memory (vdopool). vdo_slab_size - Increment size for growing (vdopool). vdo_ack_threads - Acknowledging threads (vdopool). vdo_bio_threads - IO submitting threads (vdopool). vdo_bio_rotation - IO enqueue (vdopool). vdo_cpu_threads - CPU threads for compression and hashing (vdopool). vdo_hash_zone_threads - Threads for subdivide parts (vdopool). vdo_logical_threads - Logical threads for subdivide parts (vdopool). vdo_physical_threads - Physical threads for subdivide parts (vdopool). vdo_max_discard - Maximum discard size volume can receive (vdopool). vdo_write_policy - Specified write policy (vdopool). vdo_header_size - Header size at front of vdopool. Previously only 'lvdisplay -m' was exposing them.	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	862aa06e5e	vdo: remember configure VDO write policy in metadata Store write_policy in vdopool metadata. In case it's not present 'auto' is selected.	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	7ca9be034f	vdo: field update	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	cf8aee096f	vdo: introduce get_vdo_write_policy_name	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	c756f76802	vdo: correct internal API for set_vdo_write_policy This is 'setting' function.	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	9d8a028e8c	vdo: keep minimum_io_size in sectors	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	aad91330fe	vdo: raise VDO default bio threads to 4 Since 'vdo create' tends to use this setting, update lvm2 to provide same default.	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	98419e0667	display: try to show status for inactive vdopool Since we now support activation of 'vdo' volume without explicit activation of 'vdopool' it's now possible to have active layer vdopool (-vpool) volume and having vdopool itself inactive - yet still in this case we can show available stats for this volume. But we need to show correct activation status and other standard info.	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	6a9a4b4534	resize: continue change for getting vdo status before resize Continue commit `a98b77c164`. There needs to be error reported when status can't be obtained.	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	cb5f0bdba9	cache: report for successful status	2019-10-04 17:31:55 +02:00
David Teigland	a68258339d	lvmlockd: set failure flag for test mode Set a failure flag when vg_read returns an error for test mode. The caller can segfault if there's an error with no flag set.	2019-10-04 10:09:49 -05:00
David Teigland	f836fe3836	scan: use PV device name hint for choosing duplicate PV Prefer a device if its name matches the PV device name hint.	2019-09-30 11:38:10 -05:00
David Teigland	4910a31f6d	scan: use PV size for choosing duplicate PV Prefer a device if it matches the size of the PV.	2019-09-30 11:38:10 -05:00
David Teigland	f3084ee2e5	scan: add PV summary info to lvmcache Expand the lvmcache info that is saved by the scan to include PV info from the metadata.	2019-09-30 11:38:10 -05:00
David Teigland	3a8e41a67b	metadata: import device name hint from metadata Start by using it in a comment for a missing PV.	2019-09-30 11:38:10 -05:00
David Teigland	fcfabb26a5	metadata: add args to metadata import functions instead of getting them through fid arg no functional change	2019-09-30 11:38:10 -05:00
Zdenek Kabelac	5c0264d689	vdo: restore monitoring of vdo pool Switch to -vpool layered name needs to monitor proper device.	2019-09-30 13:34:34 +02:00
Zdenek Kabelac	a98b77c164	vdo: properly check percentage for resize Avoid checking 'lv_is_active()' since special LV types does this validation anyway what calling _percent() function and call it ONLY when none of special types is queried. This restores support for VDO resize (as with support for separate VDO pool activation, plain query for lv_is_active() is not working in this case).	2019-09-30 13:34:34 +02:00
Zdenek Kabelac	c813db8fc2	vdo: deactivate forgotten vdo pool If the linear mapping is lost (for whatever reason, e.g. test suite forcibly running 'dmsetup remove' on the linear LV), lvm2 had a hard time figuring out how to deactivate such DM table. So add a function which, in the case of an inactive VDO pool LV, checks if the pool is actually still active (-vpool device present) and has open count == 0. In this case deactivation is allowed to continue and clean up the DM table.	2019-09-30 13:34:34 +02:00
David Teigland	26596ce7fa	writecache: allow removing LV with attached writecache	2019-09-24 15:51:05 -05:00
David Teigland	76dd9b2b51	writecache: move code into new file put writecache specific code in writecache_manip.c should be no functional change	2019-09-24 15:51:05 -05:00
David Teigland	56aadd7fe2	lvremove: remove attached cachevol with removed LV When an LV is removed that has an attached cachevol, also remove the cachevol LV.	2019-09-24 15:51:05 -05:00
David Teigland	5191057d9d	drop cvol dm uuid suffix for cachevol LVs The "-cvol" suffix on the uuid is interfering with activation code, so drop the suffix for now.	2019-09-23 14:13:31 -05:00
David Teigland	27c3c1d7c8	writecache: display layout and role fields	2019-09-20 14:55:11 -05:00
David Teigland	6f7d7089b4	writecache: use dm suffixes and lv attributes - use internal CACHE_VOL flag on cachevol LV - add suffixes to dm uuids for internal LVs - display appropriate letters in the LV attr field - display writecache's cachevol in lvs output	2019-09-20 14:08:51 -05:00
David Teigland	5d3bced5ea	lvconvert: detaching cachevol with missing PVs . For dm-cache in writethrough, always allow splitcache, whether the cache is missing PVs or not. . For dm-cache in writeback, if the cache is missing PVs, allow splitcache with force and yes. . For dm-writecache, if the cache is missing PVs, allow splitcache with force and yes.	2019-09-20 09:59:37 -05:00
David Teigland	515e37b6dd	cachevol: add dm uuid suffixes to hidden lvs to indicate they are private lvm devs	2019-09-20 09:59:37 -05:00
David Teigland	d2c065789c	lvconvert: cachevol LV can have multiple segments	2019-09-20 09:59:37 -05:00
Zdenek Kabelac	6612d8dd5e	vdo: enhance activation with layer -vpool Enhance 'activation' experience for VDO pool to more closely match what happens for thin-pools where we do use a 'fake' LV to keep pool running even when no thinLVs are active. This gives the user a choice whether to keep the thin-pool running (without a possibly lengthy activation/deactivation process). As we do plan to support multiple VDO LVs to be mapped into a single VDO, we want to give user same experience and 'use-pattern' as with thin-pools. This patch gives option to activate VDO pool only without activating VDO LV. Also due to 'fake' layering LV we can protect usage of VDO pool from command like 'mkfs' which do require exclusive access to the volume, which is no longer possible. Note: VDO pool contains 1024 initial sectors as 'empty' header - such header is also exposed in layered LV (as read-only LV). For blkid we are identified as LV with UUID suffix - thus private DM device of lvm2 - so we do not need to store any extra info in this header space (aka zero is good enough).	2019-09-17 13:17:19 +02:00
Zdenek Kabelac	66f69e766e	thin: activate layer pool as read-only LV When lvm2 is activating layered pool LV (to basically keep pool opened, the other function used to be 'locking' be in sync with DM table) use this LV in read-only mode - this prevents 'write' access into data volume content of thin-pool. Note: since EMPTY/unused thin-pool is created as 'public LV' for generic use by any user who i.e. wish to maintain thin-pool and thins himself. At this moment, thin-pool appears as writable LV. As soon as the 1st. thinLV is created, layer volume will appear as 'read-only' LV from this moment.	2019-09-17 13:16:50 +02:00
Zdenek Kabelac	693215716b	devices: crypto skip Devices with UUID signature CRYPT-SUBDEV are internal crypto devices.	2019-09-17 13:15:22 +02:00
David Teigland	fcbffbdbc0	bcache: change log level for prefetch message The "new new blocks" message was printed as an error but it's not an error condition.	2019-09-03 12:02:09 -05:00
David Teigland	25b58310e3	pvscan: avoid full scan for activation When an online PV completed a VG, the standard activation functions were used to activate the VG. These functions use a full scan of all devs. When many pvscans are run during startup and need to activate many VGs, scanning all devs from all the pvscans can take a long time. Optimize VG activation in pvscan to scan only the devs in the VG being activated. This makes use of the online file info that was used to determine the VG was complete. The downside of this approach is that pvscan activation will not detect duplicate PVs and block activation, where a normal activation command (which scans all devices) would.	2019-09-03 10:11:16 -05:00
David Teigland	98d420200e	vgextend: check missing device during block size check Checking the block size when a device is missing could trigger a segfault.	2019-09-03 10:07:56 -05:00
David Teigland	7cfbf3a394	fix segfault for invalid characters in vg name Fixes a regression from commit `ba7ff96faf` "improve reading and repairing vg metadata" where the error path for a vg name with invalid characters was missing an error flag, which led to the caller not recognizing an error occurred. Previously, an error flag was hidden in the old _vg_make_handle function.	2019-08-29 11:35:46 -05:00
David Teigland	5b3fbccab9	hints: check for malloc failure	2019-08-28 12:41:57 -05:00
David Teigland	12707adac8	hints: fix copy of filter Only the first entry of the filter array was being included in the copy of the filter, rather than the entire thing. The result is that hints would not be refreshed if the filter was changed but the first entry was unchanged.	2019-08-28 12:33:04 -05:00
David Teigland	dcbed38b33	fix duplicate pv size check Fixes a segfault in the recent commit `e01fddc57`: "improve duplicate pv handling for md components" While choosing between duplicates, the info struct is not always valid; it may have been dropped already. Remove the code that was still using the info struct for size comparisons. The size comparisons were a bogus check anyway because it was just preferring the dev that had already been chosen, it wasn't actually comparing the dev size to the PV size. It would be good to use a dev/PV size comparison in the duplicate handling code, but the PV size is not available until after vg_read, not from the scan.	2019-08-27 15:40:24 -05:00
Zdenek Kabelac	b2885b7103	activation: use cmd pending mem for pending_delete Since we need to preserve allocated strings across 2 separate activation calls of '_tree_action()' we need to use other mem pool than dm->mem - but since cmd->mem is released between individual lvm2 locking calls, we rather introduce a new separate mem pool just for pending deletes with easy to see life-span. (not using 'libmem' as it would basically keep allocations over the whole lifetime of clvmd) This patch is fixing previous commit where the memory was improperly used after pool release.	2019-08-27 15:54:42 +02:00
Zdenek Kabelac	55f1d8a269	configure: check for prlimit Update configure and make code compilable if prlimit() is not present. Since the code is suspicious do not cope yet with its replacement with set/getrlimit().	2019-08-26 17:24:37 +02:00
Zdenek Kabelac	4b1dcc2eeb	lv_manip: add synchronizations New udev in rawhide seems to be 'dropping' udev rule operations for devices that are no longer existing - while this is 'probably' a bug - it's revealing moments in lvm2 that likely should not run in a single transaction and we should wait for a cookie before submitting more work. TODO: it seem more 'error' paths should always include synchronization before starting deactivating 'just activated' devices. We should probably figure out some 'automatic' solution for this instead of placing sync_local_dev_name() all over the place...	2019-08-26 15:32:19 +02:00
Zdenek Kabelac	c98e34e4d0	cache: improve vgremove loop Support internal removal of 'cache origin' volume - which we do not normally expose to a user - however internal processing loops may hit this condition (depending on order of list LVs). So when this operation is internally requested - we automatically try to remove its 'holding' LV (cache LV) - which will also remove the origin.	2019-08-26 15:32:12 +02:00
Zdenek Kabelac	af0b84ccc8	snapshot: always activate Drop the 'cluster-only' optimization so we do resume ALL device before we try to wait on cookie before 'removal' operation. It's more correct order of operation - although possibly slightly less efficient - but until we have correct list of operations 'in-progress' we can't do anything better.	2019-08-26 15:23:44 +02:00
Zdenek Kabelac	7833c45fbe	activation: extend handling of pending_delete With previous patch `30a98e4d67` we started to put devices on pending_delete list instead of directly scheduling their removal. However we have operations like 'snapshot merge' where we are resuming device tree in 2 subsequent activation calls - so 1st such call will still have suspended devices and no chance to push 'remove' ioctl. Since we currently cannot easily solve this by doing just single activation call (which would be preferred solution) - we introduce a preservation of pending_delete via command structure and then restore it on next activation call. This way we keep to remove devices later - although it might be not the best moment - this may need further tuning. Also we don't keep the list of operation in 1 transaction (unless we do verify udev symlinks) - this could probably also make it more correct in terms of which 'remove' can be combined with already running 'resume'.	2019-08-26 15:16:38 +02:00
Zdenek Kabelac	30a98e4d67	activation: add synchronization point Resuming of 'error' table entry followed with its direct removal is now troublesome with latest udev as it may skip processing of udev rules for already 'dropped' device nodes. As we cannot 'synchronize' with udev while we know we have devices in suspended state - rework 'cleanup' so it collects nodes for removal into pending_delete list and process the list with synchronization once we are without any suspended nodes.	2019-08-20 12:46:11 +02:00
Zdenek Kabelac	0451225c19	pvmove: correcting read_ahead setting When pvmove is finished, we do a tricky operation since we try to resume multiple different device that were all joined into 1 big tree. Currently we use the information from existing live DM table, where we can get list of all holders of pvmove device. We look for these nodes (by uuid) in new metadata, and we do now a full regular device add into dm tree structure. All devices should be already PRELOAD with correct table before entering suspend state, however for correctly working readahead we need to put correct info also into RESUME tree. Since table are preloaded, the same table is skip and resume, but correct read ahead is now set.	2019-08-20 12:37:32 +02:00
David Teigland	0534cd9cd4	pvscan: disable sleeping and retrying for udev When systemd is running pvscans, udev may not be entirely initialized, so the pvscan should not sleep and retry waiting for udev info.	2019-08-16 14:41:26 -05:00
David Teigland	61fce72a11	bcache: increase max allowed bcache size from 128MB to 512MB (the default remains 8MB)	2019-08-16 13:35:09 -05:00
David Teigland	e01fddc578	improve duplicate pv handling for md components Eliminate md components at the start so they don't interfere with actual duplicates, and don't need to be removed later. This also allows for choosing no copy of a PVID if they all happen to be md components.	2019-08-16 13:26:12 -05:00
David Teigland	ee4a32e992	lvmcache: use devl list helper	2019-08-16 13:26:12 -05:00
David Teigland	96dfad5022	lvmcache: replace found_duplicates variable With just checking if the duplicates lists are empty.	2019-08-16 13:26:11 -05:00
David Teigland	677833ce6f	lvmcache: renaming functions and variables related to duplicates, no functional changes.	2019-08-16 13:26:11 -05:00
David Teigland	65bcd16be2	md component detection addition in vg_read Usually md components are eliminated in label scan and/or duplicate resolution, but they could sometimes get into the vg_read stage, where set_pv_devices compares the device to the PV. If set_pv_devices runs an md component check and finds one, vg_read should eliminate the components. In set_pv_devices, run an md component check always if the PV is smaller than the device (this is not very common.) If the PV is larger than the device, (more common), do the component check when the config setting is "auto" (the default).	2019-08-16 13:24:34 -05:00
David Teigland	ecefcc9ca8	increase soft open file limit When there are more devices than the current soft open file limit (default 1024), raise the soft limit to the hard/max limit (default 4096). Do this prior to scanning in case enough of the devices are PVs that need to be kept open.	2019-08-08 15:45:03 -05:00
David Teigland	eb6aa5fefe	devices: put ifdef around BLKPBSZGET BLKPBSZGET is not defined before kernel version 2.6.32 (e.g. rhel5)	2019-08-08 15:45:03 -05:00
David Teigland	09bc2d0fd1	devices: clean up block size functions Replace calls to the old dev_get_block_size function with calls to the new dev_get_direct_block_size function, and remove the old function.	2019-08-07 11:48:10 -05:00
David Teigland	bec3088f85	Revert "config: cache_policy should be cfg_runtime" This reverts commit `29eee32ac2`. Some other changes are needed to make this runtime.	2019-08-07 11:35:45 -05:00
David Teigland	29eee32ac2	config: cache_policy should be cfg_runtime	2019-08-07 11:08:15 -05:00
David Teigland	682b6216df	config: set deprecated version for segment_libraries Stopped being used some time ago.	2019-08-07 11:08:11 -05:00
David Teigland	0404539edb	vgcreate/vgextend: restrict PVs with mixed block sizes Avoid having PVs with different logical block sizes in the same VG. This prevents LVs from having mixed block sizes, which can produce file system errors. The new config setting devices/allow_mixed_block_sizes (default 0) can be changed to 1 to return to the unrestricted mode.	2019-08-01 10:06:47 -05:00
David Teigland	7f347698e3	Fix rounding writes up to sector size Do this at two levels, although one would be enough to fix the problem seen recently: - Ignore any reported sector size other than 512 or 4096. If either sector size (physical or logical) is reported as 512, then use 512. If neither are reported as 512, and one or the other is reported as 4096, then use 4096. If neither is reported as either 512 or 4096, then use 512. - When rounding up a limited write in bcache to be a multiple of the sector size, check that the resulting write size is not larger than the bcache block itself. (This shouldn't happen if the sector size is 512 or 4096.)	2019-07-26 14:21:08 -05:00
David Teigland	c22ad12bab	metadata: extend writes to zero space Previously, consecutive copies of metadata would have garbage data in the space between them. After metadata wrapping, the garbage would be portions of old metadata. This made analysis of the metadata area more difficult. This would happen because the start of new copy of metadata is advanced from the end of the last copy to start at the next 512 byte boundary. Zero the space between consecutive copies of metadata by extending each metadata write to end at the next 512 byte boundary. The size of the metadata itself is not extended, only the write. The buffer being written contains the metadata text followed by the necessary number of zeros.	2019-07-12 15:00:12 -05:00
David Teigland	4567c6a2b2	enable full md component detection at the right time An active md device with an end superblock causes lvm to enable full md component detection. This was being done within the filter loop instead of before, so the full filtering of some devs could be missed. Also incorporate the recently added config setting that controls the md component detection.	2019-07-10 13:30:50 -05:00
David Teigland	f17353e3e6	md component detection for differing PV and device sizes This check was mistakenly removed when shifting code in commit "separate code for setting devices from metadata parsing". Put it back with some new conditions.	2019-07-09 13:40:41 -05:00
David Teigland	d2b88f2715	scan: remove unused arg to setup_bcache	2019-07-09 13:16:26 -05:00
David Teigland	b4402bd821	exported vg handling The exported VG checking/enforcement was scattered and inconsistent. This centralizes it and makes it consistent, following the existing approach for foreign and shared VGs/PVs, which are very similar to exported VGs/PVs. The access policy that now applies to foreign/shared/exported VGs/PVs, is that if a foreign/shared/exported VG/PV is named on the command line (i.e. explicitly requested by the user), and the command is not permitted to operate on it because it is foreign/shared/exported, then an access error is reported and the command exits with an error. But, if the command is processing all VGs/PVs, and happens to come across a foreign/shared/exported VG/PV (that is not explicitly named on the command line), then the command silently skips it and does not produce an error. A command using tags or --select handles inaccessible VGs/PVs the same way as a command processing all VGs/PVs, and will not report/return errors if these inaccessible VGs/PVs exist. The new policy fixes the exit codes on a somewhat random set of commands that previously exited with an error if they were looking at all VGs/PVs and an exported VG existed on the system. There should be no change to which commands are allowed/disallowed on exported VGs/PVs. Certain LV commands (lvs/lvdisplay/lvscan) would previously not display LVs from an exported VG (for unknown reasons). This has not changed. The lvm fullreport command would previously report info about an exported VG but not about the LVs in it. This has changed to include all info from the exported VG.	2019-06-25 15:39:08 -05:00
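The access policy described in the commit above reduces to a small decision table. The sketch below is illustrative only; the enum and the function `check_vg_access` are hypothetical names, not lvm2 identifiers.

```c
/* Hypothetical result codes for the sketch. */
enum access_result {
	ACCESS_OK,          /* command may operate on the VG/PV */
	ACCESS_SKIP_SILENT, /* skip it without reporting an error */
	ACCESS_ERROR        /* report an access error, command fails */
};

/* restricted: the VG/PV is foreign, shared or exported AND the command
 *             is not permitted to operate on it for that reason.
 * named_on_cmdline: the user explicitly named this VG/PV. Selection by
 *             tags or --select counts as NOT named, like "all VGs/PVs". */
enum access_result check_vg_access(int restricted, int named_on_cmdline)
{
	if (!restricted)
		return ACCESS_OK;

	return named_on_cmdline ? ACCESS_ERROR : ACCESS_SKIP_SILENT;
}
```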
David Teigland	d16142f90f	scanning: open devs rw when rescanning for write When vg_read rescans devices with the intention of writing the VG, the label rescan can open the devs RW so they do not need to be closed and reopened RW in dev_write_bytes.	2019-06-21 10:57:49 -05:00
David Teigland	8fecd9c14e	metadata: include description with command in metadata areas Previously the VG metadata description field (which contains the command line) was only included in backup/archive copies of the metadata. Now also include it in the metadata written to the metadata areas.	2019-06-20 16:09:05 -05:00
Marian Csontos	556dcd2c6b	config: Fix default option which makes no sense Default value is either undefined or commented, never both.	2019-06-17 19:08:28 +02:00
David Teigland	7c697c1058	config: remove filter typo Remove unnecessary but harmless / in the filter string "a\|.*/\|".	2019-06-17 09:38:24 -05:00
David Teigland	4bb7d3da0e	lvmcache: remove wrapper around lvmcache_get_vgnameids This was left over from when there was an lvmetad version of the function.	2019-06-11 14:10:14 -05:00
David Teigland	0f350ba890	remove unused trustcache option	2019-06-11 11:42:49 -05:00
David Teigland	b7850faba7	locking: fix repeated convert to ex Some uncommon commands like pvchange -a -u may call convert to ex multiple times.	2019-06-10 13:37:03 -05:00
David Teigland	49b8846567	lvmcache: remove unused function Drop lvmcache_fmt_from_vgname(), the way it was called made it identical to the existing lvmcache_vginfo_from_vgname().	2019-06-10 10:38:32 -05:00
David Teigland	550536474f	vgsplit: simplify vg creation The way that this command now uses the global lock followed by a label scan, it can simply check if the new VG name exists, and if not lock it and create it.	2019-06-10 10:38:32 -05:00
David Teigland	5036244ce8	lvmcache: remove unused code	2019-06-10 10:38:32 -05:00
David Teigland	a07cc8dbef	reset cmd wipe_outdated_pvs at the start of a command, which is needed in case the cmd struct is reused.	2019-06-10 10:34:58 -05:00
David Teigland	36cbc6db24	locking: reset global_ex flag at end of cmd These two flags may not be reset at the end of the command when the unlock is implicit, which is a problem if the cmd struct is reused. Clear the flags in the general fin_locking.	2019-06-10 10:34:58 -05:00
David Teigland	a3a676e0e7	metadata.c: removed unused code if 0 was placed around old vg_read code by the previous commit.	2019-06-07 15:54:04 -05:00
David Teigland	ba7ff96faf	improve reading and repairing vg metadata The fact that vg repair is implemented as a part of vg read has led to a messy and complicated implementation of vg_read, and limited and uncontrolled repair capability. This splits read and repair apart. Summary ------- - take all kinds of various repairs out of vg_read - vg_read no longer writes anything - vg_read now simply reads and returns vg metadata - vg_read ignores bad or old copies of metadata - vg_read proceeds with a single good copy of metadata - improve error checks and handling when reading - keep track of bad (corrupt) copies of metadata in lvmcache - keep track of old (seqno) copies of metadata in lvmcache - keep track of outdated PVs in lvmcache - vg_write will do basic repairs - new command vgck --updatemetadata will do all repairs Details ------- - In scan, do not delete dev from lvmcache if reading/processing fails; the dev is still present, and removing it makes it look like the dev is not there. Records are now kept about the problems with each PV so they can be fixed/repaired in the appropriate places. - In scan, record a bad mda on failure, and delete the mda from mda in use list so it will not be used by vg_read or vg_write, only by repair. - In scan, succeed if any good mda on a device is found, instead of failing if any is bad. The bad/old copies of metadata should not interfere with normal usage while good copies can be used. - In scan, add a record of old mdas in lvmcache for later, do not repair them while reading, and do not let them prevent us from finding and using a good copy of metadata from elsewhere. One result is that "inconsistent metadata" is no longer a read error, but instead a record in lvmcache that can be addressed separate from the read. - Treat a dev with no good mdas like a dev with no mdas, which is an existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read, or the vg_read_error function for getting that error number; just return null if the vg cannot be read or used, and an error_flags arg with flags set for the specific kind of error (which can be used later for determining the kind of repair.) - Saving an original copy of the vg metadata, for purposes of reverting a write, is now done explicitly in vg_read instead of being hidden in the vg_make_handle function. - When a vg is not accessible due to "access restrictions" but is otherwise fine, return the vg through the new error_vg arg so that process_each_pv can skip the PVs in the VG while processing. (This is a temporary accommodation for the way process_each_pv tracks which devs have been looked at, and can be dropped later when process_each_pv implementation dev tracking is changed.) - vg_read does not try to fix or recover a vg, but now just reads the metadata, checks access restrictions and returns it. (Checking access restrictions might be better done outside of vg_read, but this is a later improvement.) - _vg_read now simply makes one attempt to read metadata from each mda, and uses the most recent copy to return to the caller in the form of a 'vg' struct. (bad mdas were excluded during the scan and are not retried) (old mdas were not excluded during scan and are retried here) - vg_read uses _vg_read to get the latest copy of metadata from mdas, and then makes various checks against it to produce warnings, and to check if VG access is allowed (access restrictions include: writable, foreign, shared, clustered, missing pvs). - Things that were previously silently/automatically written by vg_read that are now done by vg_write, based on the records made in lvmcache during the scan and read: . clearing the missing flag . updating old copies of metadata . clearing outdated pvs . updating pv header flags - Bad/corrupt metadata are now repaired; they were not before.
Test changes ------------ - A read command no longer writes the VG to repair it, so add a write command to do a repair. (inconsistent-metadata, unlost-pv) - When a missing PV is removed from a VG, and then the device is enabled again, vgck --updatemetadata is needed to clear the outdated PV before it can be used again, where it wasn't before. (lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair, mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv) Reading bad/old metadata ------------------------ - "bad metadata": the mda_header or metadata text has invalid fields or can't be parsed by lvm. This is a form of corruption that would not be caused by known failure scenarios. A checksum error is typically included among the errors reported. - "old metadata": a valid copy of the metadata that has a smaller seqno than other copies of the metadata. This can happen if the device failed, or io failed, or lvm failed while committing new metadata to all the metadata areas. Old metadata on a PV that has been removed from the VG is the "outdated" case below. When a VG has some PVs with bad/old metadata, lvm can simply ignore the bad/old copies, and use a good copy. This is why there are multiple copies of the metadata -- so it's available even when some of the copies cannot be used. The bad/old copies do not have to be repaired before the VG can be used (the repair can happen later.) A PV with no good copies of the metadata simply falls back to being treated like a PV with no mdas; a common and harmless configuration. When bad/old metadata exists, lvm warns the user about it, and suggests repairing it using a new metadata repair command. Bad metadata in particular is something that users will want to investigate and repair themselves, since it should not happen and may indicate some other problem that needs to be fixed. PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or activation, but bad/old metadata will not. Previously, lvm would attempt to repair bad/old metadata whenever it was read. This was unnecessary since lvm does not require every copy of the metadata to be used. It would also hide potential problems that should be investigated by the user. It was also dangerous in cases where the VG was on shared storage. The user is now allowed to investigate potential problems and decide how and when to repair them. Repairing bad/old metadata -------------------------- When label scan sees bad metadata in an mda, that mda is removed from the lvmcache info->mdas list. This means that vg_read will skip it, and not attempt to read/process it again. If it was the only in-use mda on a PV, that PV is treated like a PV with no mdas. It also means that vg_write will also skip the bad mda, and not attempt to write new metadata to it. The only way to repair bad metadata is with the metadata repair command. When label scan sees old metadata in an mda, that mda is kept in the lvmcache info->mdas list. This means that vg_read will read/process it again, and likely see the same mismatch with the other copies of the metadata. Like the label_scan, the vg_read will simply ignore the old copy of the metadata and use the latest copy. If the command is modifying the vg (e.g. lvcreate), then vg_write, which writes new metadata to every mda on info->mdas, will write the new metadata to the mda that had the old version. If successful, this will resolve the old metadata problem (without needing to run a metadata repair command.) Outdated PVs ------------ An outdated PV is a PV that has an old copy of VG metadata that shows it is a member of the VG, but the latest copy of the VG metadata does not include this PV. This happens if the PV is disconnected, vgreduce --removemissing is run to remove the PV from the VG, then the PV is reconnected. 
In this case, the outdated PV needs to have its outdated metadata removed and the PV used flag needs to be cleared. This repair will be done by the subsequent repair command. It is also done if vgremove is run on the VG. MISSING PVs ----------- When a device is missing, most commands will refuse to modify the VG. This is the simple case. More complicated is when a command is allowed to modify the VG while it is missing a device. When a VG is written while a device is missing for one of its PVs, the VG metadata is written to disk with the MISSING flag on the PV with the missing device. When the VG is next used, it is treated as if the PV with the MISSING flag still has a missing device, even if that device has reappeared. If all LVs that were using a PV with the MISSING flag are removed or repaired so that the MISSING PV is no longer used, then the next time the VG metadata is written, the MISSING flag will be dropped. Alternative methods of clearing the MISSING flag are: vgreduce --removemissing will remove PVs with missing devices, or PVs with the MISSING flag where the device has reappeared. vgextend --restoremissing will clear the MISSING flag on PVs where the device has reappeared, allowing the VG to be used normally. This must be done with caution since the reappeared device may have old data that is inconsistent with data on other PVs. Bad mda repair -------------- The new command: vgck --updatemetadata VG first uses vg_write to repair old metadata, and other basic issues mentioned above (old metadata, outdated PVs, pv_header flags, MISSING_PV flags). It will also go further and repair bad metadata: . text metadata that has a bad checksum . text metadata that is not parsable . corrupt mda_header checksum and version fields (To keep a clean diff, #if 0 is added around functions that are replaced by new code. These commented functions are removed by the following commit.)	2019-06-07 15:54:04 -05:00
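The read-side behaviour described in this commit message (ignore bad copies, use the copy with the highest seqno, and only record which copies are old) can be sketched roughly as follows. The struct `mda_copy` and the function `pick_latest_copy` are hypothetical and exist only for this illustration; lvm2 keeps this state in lvmcache and its own mda structures.

```c
#include <stddef.h>

/* Hypothetical record of one metadata copy found during the scan. */
struct mda_copy {
	int bad;            /* 1 if the header/text could not be parsed */
	unsigned int seqno; /* sequence number of a parsable copy */
	int is_old;         /* set by pick_latest_copy() */
};

/* Pick the good copy with the highest seqno and flag good copies with a
 * smaller seqno as old. Nothing is repaired here: old copies are only
 * recorded so that a later write can update them. Returns the index of
 * the copy to use, or -1 if there is no good copy at all. */
int pick_latest_copy(struct mda_copy *copies, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (copies[i].bad)
			continue; /* bad copies are skipped, never used */
		if (best < 0 || copies[i].seqno > copies[best].seqno)
			best = (int)i;
	}

	for (i = 0; i < n; i++)
		copies[i].is_old = best >= 0 && !copies[i].bad &&
				   copies[i].seqno < copies[best].seqno;

	return best;
}
```

A return value of -1 corresponds to the case where a PV has no good copies and is treated like a PV with no mdas.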
David Teigland	015b906069	add a warning message when updating old metadata in an mda that had previously not been updated	2019-06-07 15:54:04 -05:00
David Teigland	47effdc025	vgck --updatemetadata is a new command that uses vg_write to correct more common or less severe issues, and also adds the ability to repair some metadata corruption that couldn't be handled previously.	2019-06-07 15:54:04 -05:00
David Teigland	de3d3b11f4	move pv header repairs to vg_write Correct PV header in-use or version fields from vg_write instead of vg_read.	2019-06-07 15:54:04 -05:00
David Teigland	ab61a6d85d	move wipe_outdated_pvs to vg_write and implement it based on a device, not based on a pv struct (which is not available when the device is not a part of the vg.) currently only the vgremove command wipes outdated pvs until more advanced recovery is added in a subsequent commit	2019-06-07 15:54:04 -05:00
David Teigland	45b164f62c	create separate lvmcache update functions for read and write The vg read and vg write cases need to update lvmcache differently, so create separate functions for them. The read case now handles checking for outdated mdas and moves them aside into a new list to be repaired in a subsequent commit.	2019-06-07 15:54:04 -05:00
David Teigland	027e0e92e6	fix vg_commit return value The existing comment was describing the correct behavior, but the code didn't match. The commit is successful if one mda was committed. Making it depend on the result of the internal lvmcache update was wrong.	2019-06-07 15:54:04 -05:00
David Teigland	86d831b916	change args for text label read function Have the caller pass the label_sector to the read function so the read function can set the sector field in the label struct, instead of having the read function return a pointer to the label for the caller to set the sector field. Also have the read function return a flag indicating to the caller that the scanned device was identified as a duplicate pv.	2019-06-07 15:54:04 -05:00
David Teigland	889b5d3183	add mda arg to add_mda Allow the caller of lvmcache_add_mda() to have the new mda returned.	2019-06-07 15:54:04 -05:00
David Teigland	b2447e3538	keep track of which mdas have old metadata in lvmcache This will be used for more advanced repair in a subsequent commit.	2019-06-07 15:54:04 -05:00
David Teigland	0b18c25d93	ability to keep track of outdated pvs in lvmcache Outdated PVs hold metadata for VG from which they have been removed. Add the ability to keep track of these in lvmcache. This will be used for more advanced repair in a subsequent commit.	2019-06-07 15:54:04 -05:00
David Teigland	650524b955	ability to keep track of bad mdas in lvmcache mda's that cannot be processed by lvm because of some corruption can be kept on a separate list. These will be used for more advanced repair in a subsequent commit.	2019-06-07 15:54:04 -05:00
David Teigland	aeafdc1f45	add flags to keep track of bad metadata When reading metadata headers and text, use a new set of flags to identify specific errors that are seen. These will be used for more advanced repair in a subsequent commit.	2019-06-07 15:54:04 -05:00
David Teigland	db98a6e362	Additional MD component checking If udev info is missing for a device, (which would indicate if it's an MD component), then do an end-of-device read to check if a PV is an MD component. (This is skipped when using hints since we already know devs in hints are good.) A new config setting md_component_checks can be used to disable the additional end-of-device MD checks, or to always enable end-of-device MD checks. When both hints and udev info are disabled/unavailable, the end of PVs will now be scanned by default. If md devices with end-of-device superblocks are not being used, the extra I/O overhead can be avoided by setting md_component_checks="start".	2019-06-07 13:27:16 -05:00
David Teigland	2bcd43c683	lvmcache: remove unused_duplicate_devs list from cmd Save the previous duplicate PVs in a global list instead of a list on the cmd struct. dmeventd reuses the cmd struct for multiple commands, and the list entries between commands were being freed (apparently), causing a segfault in dmeventd when it tried to use items in cmd->unused_duplicate_devs that had been saved there by the previous command.	2019-06-07 10:14:33 -05:00
David Teigland	2b241eb1f6	pvck: use new dump routines for old output Use the recently added dump routines to produce the old/traditional pvck output, and remove the code that had been used for that. The validation/checking done by the new routines means that new lines prefixed with CHECK are printed for incorrect values.	2019-06-05 16:28:52 -05:00
Zdenek Kabelac	e3c4ab0cc7	cache: support no_discard_passdown Recent kernel versions, starting from kernel commit de7180ff908b2bc0342e832dbdaa9a5f1ecaa33a, report a new flag in the cache status line: no_discard_passdown Whenever lvm spots an unknown status it reports: Unknown feature in status: So add recognition of this feature flag and also report it with 'lvs -o+kernel_discards' When no_discard_passdown is found in status 'nopassdown' gets reported for this field (roughly matching what we report for thin-pools).	2019-06-05 15:48:41 +02:00
David Teigland	d18e491f68	pvck: dump headers and metadata Add 'pvck --dump headers' to print all the lvm ondisk structs. Also checks the values and prints any problems. The previous dump metadata is also converted to use these same routines, which do not depend on lvm fully scanning/reading/processing the headers and metadata on disk. This makes it useful to get data in cases where there is corruption that would otherwise prevent the normal functions from working.	2019-06-03 15:13:32 -05:00
David Teigland	645dd27604	separate code for setting devices from metadata parsing Pull the code that sets devs for PVs out of the metadata parsing code and call it separately.	2019-05-23 11:57:38 -05:00
David Teigland	52586b1039	pvck: new dump option to extract metadata The new command 'pvck --dump metadata PV' will extract the current version of VG metadata from a PV for testing and debugging. --dump metadata_area extracts the entire text metadata area.	2019-05-23 11:49:06 -05:00
David Teigland	dc1e12dcd4	scan: expand and update label scan comments	2019-05-21 12:02:40 -05:00
David Teigland	60bf9c9f33	hints: exclude md components In some cases md components could be included in the hints, so add a check to hint creation to make sure they are excluded.	2019-05-21 11:58:01 -05:00
David Teigland	19ef399ea7	devs: rename dev_is_md dev_is_md_component The naming was confusing and misleading since it's testing if a device is an md component, not an md device.	2019-05-21 11:44:39 -05:00
David Teigland	6078585381	add md component check in vg_read based on size If an md component is not excluded by other means and vg_read is used to read metadata from it, then this new check compares the device size with the PV size, and runs a full md check on the device if the sizes don't match.	2019-05-03 14:39:42 -05:00
Zdenek Kabelac	d60d59a5f3	cleanup: use unsigned type	2019-05-03 13:17:22 +02:00
Zdenek Kabelac	7a5ea681fb	build: fix compilation without lvmlockd	2019-05-03 13:17:22 +02:00
Zdenek Kabelac	a520b3002c	locking: validate locking mode Ensure 'ret' is always defined and validate 'mode'.	2019-05-03 13:17:22 +02:00
David Teigland	99de816a1b	scan: remove comments about lvmetad	2019-05-02 13:32:30 -05:00
David Teigland	0046c4e7a7	use memcpy for constant ondisk strings Use memcpy/memcmp for on disk strings which are not null terminated: FMTT_MAGIC, LVM2_LABEL and LABEL_ID. Quiets compile warnings.	2019-05-02 12:59:50 -05:00
David Teigland	adfb9bf20c	remove unused string writecache	2019-05-01 16:50:14 -05:00
David Teigland	90b94ead12	lvmcache: remove unused flag The new label scan design is never called recursively, so we don't need a flag to check for that.	2019-04-30 14:59:27 -05:00
David Teigland	c3e385c108	hints: skip hint flock if nolocking option is set	2019-04-29 13:01:15 -05:00
David Teigland	8c87dda195	locking: unify global lock for flock and lockd There have been two file locks used to protect lvm "global state": "ORPHANS" and "GLOBAL". Commands that used the ORPHAN flock in exclusive mode: pvcreate, pvremove, vgcreate, vgextend, vgremove, vgcfgrestore Commands that used the ORPHAN flock in shared mode: vgimportclone, pvs, pvscan, pvresize, pvmove, pvdisplay, pvchange, fullreport Commands that used the GLOBAL flock in exclusive mode: pvchange, pvscan, vgimportclone, vgscan Commands that used the GLOBAL flock in shared mode: pvscan --cache, pvs The ORPHAN lock covers the important cases of serializing the use of orphan PVs. It also partially covers the reporting of orphan PVs (although not correctly as explained below.) The GLOBAL lock doesn't seem to have a clear purpose (it may have eroded over time.) Neither lock correctly protects the VG namespace, or orphan PV properties. To simplify and correct these issues, the two separate flocks are combined into the one GLOBAL flock, and this flock is used from the locking sites that are in place for the lvmlockd global lock. The logic behind the lvmlockd (distributed) global lock is that any command that changes "global state" needs to take the global lock in ex mode. Global state in lvm is: the list of VG names, the set of orphan PVs, and any properties of orphan PVs. Reading this global state can use the global lock in sh mode to ensure it doesn't change while being reported. The locking of global state now looks like: lockd_global() previously named lockd_gl(), acquires the distributed global lock through lvmlockd. This is unchanged. It serializes distributed lvm commands that are changing global state. This is a no-op when lvmlockd is not in use. lockf_global() acquires an flock on a local file. It serializes local lvm commands that are changing global state. 
lock_global() first calls lockf_global() to acquire the local flock for global state, and if this succeeds, it calls lockd_global() to acquire the distributed lock for global state. Replace instances of lockd_gl() with lock_global(), so that the existing sites for lvmlockd global state locking are now also used for local file locking of global state. Remove the previous file locking calls lock_vol(GLOBAL) and lock_vol(ORPHAN). The following commands which change global state are now serialized with the exclusive global flock: pvchange (of orphan), pvresize (of orphan), pvcreate, pvremove, vgcreate, vgextend, vgremove, vgreduce, vgrename, vgcfgrestore, vgimportclone, vgmerge, vgsplit Commands that use a shared flock to read global state (and will be serialized against the prior list) are those that use process_each functions that are based on processing a list of all VG names, or all PVs. The list of all VGs or all PVs is global state and the shared lock prevents those lists from changing while the command is processing them. The ORPHAN lock previously attempted to produce an accurate listing of orphan PVs, but it was only acquired at the end of the command during the fake vg_read of the fake orphan vg. This is not when orphan PVs were determined; they were determined by elimination beforehand by processing all real VGs, and subtracting the PVs in the real VGs from the list of all PVs that had been identified during the initial scan. This is fixed by holding the single global lock in shared mode while processing all VGs to determine the list of orphan PVs.	2019-04-29 13:01:05 -05:00
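The ordering described for lock_global() (local flock first, distributed lock only if the flock succeeded) can be sketched as below. This is an illustration of the call order only: `lock_global_sketch`, the `lock_fn` type and the stub functions are hypothetical stand-ins, and the real lockf_global() and lockd_global() take lvm2-specific arguments that are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical signature for a lock step: returns true on success. */
typedef bool (*lock_fn)(bool exclusive);

/* Counter used by the stand-in below to show when it was called. */
int lockd_calls = 0;

/* Stand-ins for the local flock step. */
bool stub_flock_ok(bool exclusive)   { (void)exclusive; return true; }
bool stub_flock_fail(bool exclusive) { (void)exclusive; return false; }

/* Stand-in for the distributed step; behaves like the no-op case
 * when lvmlockd is not in use, but counts its invocations. */
bool stub_lockd_noop(bool exclusive)
{
	(void)exclusive;
	lockd_calls++;
	return true;
}

/* Take the local lock first; ask for the distributed lock only if the
 * local one was acquired. */
bool lock_global_sketch(lock_fn local, lock_fn distributed, bool exclusive)
{
	if (!local(exclusive))
		return false;

	return distributed(exclusive);
}
```

Passing the two steps as function pointers is purely a device to keep the sketch self-contained; it is not how lvm2 structures the calls.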
David Teigland	ccd1386070	wipe_lv: initially open LV in writable mode wipe_lv knows it's going to write the device, so it can open rw from the start. It was opening readonly, and then dev_write needed to reopen it readwrite.	2019-04-26 14:49:27 -05:00
David Teigland	d0b869e46a	hints: fix non-empty hints list when not using hints When hints are invalid and ignored, the list of hints could be non-empty (from additions before an invalid hint was found). This confused the calling code which was checking for an empty list to see if hints were used. Ensure the list is empty when hints are not used.	2019-04-11 11:58:51 -05:00
David Teigland	0cc80ccfd5	hints: fix case of error getting device size When checking hints, if there's an error getting the device size, that should be equivalent to seeing zero size.	2019-04-11 10:32:28 -05:00
David Teigland	6f18186bfd	pvscan: print more reasons for ignoring devices	2019-04-05 15:48:12 -05:00
David Teigland	c33770c02d	lvmlockd: do not allow mirror LV to be activated shared This reverts `518a8e8cfb` "lvmlockd: activate mirror LVs in shared mode with cmirrord" because while activating a mirror LV with cmirrord worked, changes to the active cmirror did not work.	2019-04-04 13:21:38 -05:00
Zdenek Kabelac	fcec6691f0	thin: fix maintenance of _pmspare When metadata grows lvm2 may need to extend also _pmspare volume.	2019-04-03 13:28:54 +02:00
Zdenek Kabelac	e27d027155	thin: resize metadata with data When data is growing, also adapt the size of metadata. We get far too many reports from users who grow the data portion hugely while keeping metadata small and not using monitoring. So to enhance the user experience, when the user requests growth of a thin-pool (without passing a PV list for growth), lvm2 will automatically also grow the metadata part of the thin-pool (if possible).	2019-04-03 13:28:22 +02:00
Zdenek Kabelac	7c3de2fd93	thin: introduce estimate_thin_pool_metadata_size Add function for estimation of thin-pool metadata size for given size of data. Function is using already existing internal API so it can be reused for resize of thin-pool data.	2019-04-03 13:27:17 +02:00
Zdenek Kabelac	bca0a4df9a	filter: fix mpath test Fix bug which leaked into commit `dc6dea4033`, where the testing code got mistakenly committed.	2019-04-03 13:27:17 +02:00
David Teigland	2f471f0184	lvresize: fix when compiled without lvmlockd The no-op result of lockd_lv_resize should be success.	2019-04-02 10:51:38 -05:00
David Teigland	85e68a8333	lvextend: refresh shared LV remotely using dlm/corosync When lvextend extends an LV that is active with a shared lock, use this as a signal that other hosts may also have the LV active, with gfs2 mounted, and should have the LV refreshed to reflect the new size. Use the libdlmcontrol run api, which uses dlm_controld/corosync to run an lvchange --refresh command on other cluster nodes.	2019-03-21 12:38:20 -05:00
David Teigland	d369de8399	lvextend: allow on LV active with a shared lock Detect when a shared lock exists, don't require the normal exclusive lock, and allow the lvextend.	2019-03-21 12:38:20 -05:00
David Teigland	9b4926aaff	warn about changes to an active lv with shared lock When an LV is active with a shared lock, a command can be run to change the LV with --lockopt skiplv (to override the exclusive lock the command ordinarily requires which is not compatible with the outstanding shared lock.) In this case, other commands may have the LV active and may need to refresh the LV, so print a warning stating this.	2019-03-21 12:38:20 -05:00
Zdenek Kabelac	4411fe2ba8	activation: synchronize before removing devices Udev runs a udev-rule action upon 'resume'. However, in a special case lvm2 replaces a 'soon-to-be-removed' device with an 'error' target for resuming, and the actual removal then follows - the sequence is usually quick, so when udev starts its action it can result in a 'strange' error message in the kernel log like: Process '/usr/sbin/dmsetup info -j 253 -m 17 -c --nameprefixes --noheadings --rows -o name,uuid,suspended' failed with exit code 1. To avoid this, we need to ensure there is a synchronization wait for udev between the 'resume' and 'remove' parts of this process. However, the existing code put a strict requirement to avoid synchronizing with udev inside a critical section - but this originally came from the requirement to not do anything special while there could be devices in suspend-state. Now we are able to see the difference between a critical section with or without suspended devices. For udev synchronization only suspended devices are prohibited - so slightly relax the condition and allow calling and using 'fs_sync()' even inside a critical section - but there must not be any suspended device.	2019-03-20 14:39:09 +01:00
Zdenek Kabelac	677aa84be3	vdo: enable caching for vdopool LV and vdo LV Allow using caching with VDO. The user can cache either a single vdopool or a vdo LV - where the caching is put in depends on the use case, and it's up to the user to decide which kind of speed is expected.	2019-03-20 14:38:31 +01:00
Zdenek Kabelac	0db22c5f81	lv_manip: insert remove layer skips pools Fixing renaming of subLVs when removing and inserting layers - this got visible when using stacked VDO pools.	2019-03-20 14:38:05 +01:00
Zdenek Kabelac	1cc690e911	thin: max thin	2019-03-20 14:37:44 +01:00
Zdenek Kabelac	74b5f22838	debug: use log_warn These reports are not causing command failure, so report them as warnings.	2019-03-20 14:37:44 +01:00
Zdenek Kabelac	dc6dea4033	filter: enhance mpath detection Internal detection of a SCSI device being in use by DM mpath has been performed several times for each component device - this could eventually be racy - so instead we now remember the first checked result for a device being mpath and use it consistently over the filter runtime.	2019-03-20 14:37:42 +01:00
Zdenek Kabelac	1eeb2fa3f6	dev_manager: add dev_manager_remove_dm_major_minor Move DM usage into dev_manager.c source file. Also convert STATUS to INFO ioctl - as that's enough to obtain UUID - this also avoid issuing unwanted flush on checked DM device for being mpath.	2019-03-20 14:37:10 +01:00
David Teigland	9b2b0fef9c	config: improve scan_lvs description	2019-03-06 13:33:07 -06:00
David Teigland	4e20ebd6a1	pvscan: ignore online for shared and foreign PVs Activation would not be allowed anyway, but we can check for these cases early and avoid wasted time in pvscan managing online files and attempting activation.	2019-03-05 15:19:05 -06:00
David Teigland	7edbf8a441	io: increase the default io memory from 4 to 8 MiB This is the default bcache size that is created at the start of the command. It needs to be large enough to hold a single copy of metadata for a given VG, or the VG cannot be read or written (since the entire VG would not fit into available memory.) Increasing the default reduces the chances of anyone needing to increase the default to use their VG. The size can be set in lvm.conf global/io_memory_size; the lower limit is 4 MiB and the upper limit is 128 MiB.	2019-03-04 12:14:06 -06:00
David Teigland	3584e0c0d5	io: warn when metadata size approaches io memory size When a single copy of metadata gets within 1MB of the current io_memory_size value, begin printing a warning that the io_memory_size should be increased.	2019-03-04 12:13:09 -06:00
David Teigland	dd8d083795	config: add new setting io_memory_size which defines the amount of memory that lvm will allocate for bcache. Increasing this setting is required if it is smaller than a single copy of VG metadata.	2019-03-04 11:36:21 -06:00
David Teigland	3ed9256985	remove unused io functions	2019-02-28 10:58:00 -06:00
David Teigland	fb83719d7f	logging: remove unused code Incomplete bits of original code that's unused.	2019-02-28 10:30:54 -06:00
David Teigland	a9eaab6beb	Use "cachevol" to refer to cache on a single LV and "cachepool" to refer to a cache on a cache pool object. The problem was that the --cachepool option was being used to refer to both a cache pool object, and to a standard LV used for caching. This could be somewhat confusing, and it made it less clear when each kind would be used. By separating them, it's clear when a cachepool or a cachevol should be used. Previously: - lvm would use the cache pool approach when the user passed a cache-pool LV to the --cachepool option. - lvm would use the cache vol approach when the user passed a standard LV in the --cachepool option. Now: - lvm will always use the cache pool approach when the user uses the --cachepool option. - lvm will always use the cache vol approach when the user uses the --cachevol option.	2019-02-27 08:52:34 -06:00
David Teigland	c8fc18e8bf	config: make hints setting commented	2019-02-26 15:54:30 -06:00
David Teigland	90149c303e	logging: new config settings to specify debug fields For users who do not want all of the fields included in debug lines, let them specify in lvm.conf which fields to include. timestamp, command[pid], and file:line fields can all be disabled.	2019-02-26 14:42:16 -06:00
David Teigland	9aea6ae956	logging: add command[pid] and timestamp to file and verbose output Without this, the output from different commands in a single log file could not be separated. Change the default "indent" setting to 0 so that the default debug output does not include variable spaces in the middle of debug lines.	2019-02-26 10:03:44 -06:00
David Teigland	7be6791e70	config: change scan_lvs default to 0 so that lvm does not scan LVs for PVs by default.	2019-02-20 13:30:46 -06:00
David Teigland	0aa51a2f61	hints: fix recreating hints from pvscan When aay was included in the pvscan --cache command, the activation part was complaining about the unusual state of the hint file since it had been recreated just prior.	2019-02-13 15:23:43 -06:00
David Teigland	3ebce8dbd2	apply obtain_device_list_from_udev to all libudev usage udev_dev_is_md_component and udev_dev_is_mpath_component are not used for obtaining the device list, but they still use libudev for device info. When there are problems with udev, these functions can get stuck. So, use the existing obtain_device_list_from_udev config setting to also control whether these "is component" functions are used, which gives us a way to avoid using libudev entirely when it's causing problems.	2019-02-05 10:15:40 -06:00
Zdenek Kabelac	d19e372795	cleanup: indent	2019-01-28 22:39:10 +01:00
Zdenek Kabelac	78dd9d820d	thin: select chunk size as power of 2 Whenever thin-pool chunk size is unspecified and left for lvm calculation try to select the size as nearest highest power-of-2 instead of just being a multiple of 64KiB.	2019-01-28 22:17:25 +01:00
Zdenek Kabelac	58ad831c72	cache: select chunk size as power of 2 When cache chunk size is not configured, and left for lvm deduction, select the value which is power-of-2.	2019-01-28 22:17:14 +01:00
Zdenek Kabelac	105a8edea1	lv_manip: better work with PERCENT_VG modifier with lvresize Fixing recent commit `022ebb0cfe`. Resize already has a size that needs to be accounted for, otherwise an upsizing operation could turn into a size reduction.	2019-01-21 15:39:24 +01:00
Zdenek Kabelac	e689bfb5d5	vdo: minor API cleanup Since parse_vdo_pool_status() became part of the vdo_manip API, and there will be no 'dm' matching status parser, the API can be simplified to closely match the thin API here.	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	f3c52a515b	vdo: enable dmeventd resize	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	3d367f3348	vdo: add simple wrapper for getting pool percentage Just like with e.g. thins, provide a simple function for getting the percentage of VDO Pool usage (uses existing status function).	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	a16d914d34	cleanup: better naming	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	08cabe9b83	vdo: allow resize of VDO and VDO pool volumes Now with the newer VDO kvdo target we can start to use the standard mechanism to enable resize of VDO volumes. VDO pool can be grown. Virtual volume grows on top of VDO pool when it is not big enough. Reduced VDOLV is calling discard for reduced areas - this can take a long time! TODO: implement some pollable mechanism for out-of-lock TRIM.	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	bd6709cec6	vdo: size reduction requires VDO to be active To be able to send discard to reduced areas - the VDO LV needs to be active.	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	f1ad4b0679	vdo: discard reduced area Implement sending discard to reduced LV area.	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	ca72d19691	vdo: estimate virtual size after resize	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	ab031d673d	vdo: introduce function for estimation of virtual size	2019-01-21 12:53:16 +01:00
Zdenek Kabelac	022ebb0cfe	lv_manip: better work with PERCENT_VG modifier When using 'lvcreate -l100%VG' and there is a big disproportion between the real available space and the requested setting - automatically fall back to 100%FREE. The difference can be seen when the VG is big and most space was already allocated, so the requested 100%VG can end (and by spec for % modifier it's correct) as an LV with a size of 1%VG. Usually this is not a big problem - but in some cases - like cache-pool allocation, this can result in a big difference for chunksize selection. With this patch it more closely matches common-sense logic without the need for reiteration of too big changes in lvm2 core ATM. TODO: in the future there should be an allocator solving all allocations in a single call.	2019-01-21 12:53:15 +01:00
Zdenek Kabelac	f87dd7b127	vdo: fix archived metadata comment lvm uses the 'minimum_io_size' name to exactly match VDO naming here, however in all common cases _size is using the 'sector/512b' unit. But in this case the value is in bytes and can have only 2 values: either 512 or 4096. It's probably not worth renaming it internally, so we can just drop the comment - instead of using 1 or 8. Though let's think about it....	2019-01-21 12:37:52 +01:00
David Teigland	5f102b3421	hints: invalidate when pvscan --cache sees a new PV An idea from Zdenek for better ensuring valid hints by invalidating them when pvscan --cache <device> sees a new PV, which is a case where we know that hints should be invalidated. This is triggered from systemd/udev logic, and there may be some cases where it would invalidate hints that the existing methods wouldn't detect.	2019-01-16 15:34:20 -06:00
David Teigland	facd520931	lvmlockd: fix make lockstart wait when building without lvmlockd	2019-01-16 13:24:29 -06:00
David Teigland	ebaaff3590	move init_use_aio it doesn't make sense to call from init_logging	2019-01-16 11:45:53 -06:00
David Teigland	e158835a05	lvmlockd: make lockstart wait for existing start If there are two independent scripts doing: vgchange --lockstart vg lvchange -ay vg/lv The first vgchange to do the lockstart will wait for the lockstart to complete before returning. The second vgchange to do the lockstart will see that the start is already in progress (from the first) and will do nothing. This means the second does not wait for any lockstart to complete, and moves on to the lvchange which may find the lockspace still starting and fail. To fix this, make the vgchange lockstart command wait for any lockstarts in progress to complete.	2019-01-16 10:49:04 -06:00
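
Commits 78dd9d820d and 58ad831c72 above change the automatically chosen thin-pool and cache chunk size from a multiple of 64KiB to a power of 2. The following is only a rough sketch of that difference, not lvm2's actual C code: the function names are hypothetical, and "nearest highest power-of-2" is assumed here to mean rounding up.

```python
KIB = 1024
MIN_CHUNK = 64 * KIB  # 64KiB granularity named in commit 78dd9d820d


def round_up_multiple(size, step=MIN_CHUNK):
    """Older behaviour as described: round up to a multiple of 64KiB."""
    return -(-size // step) * step  # ceiling division, then scale back


def round_up_pow2(size, minimum=MIN_CHUNK):
    """Newer behaviour as described: round up to a power of 2.

    Assumption: the result never drops below the 64KiB minimum.
    """
    size = max(size, minimum)
    return 1 << (size - 1).bit_length()


if __name__ == "__main__":
    for estimate in (64 * KIB, 150 * KIB, 192 * KIB):
        print(estimate // KIB,
              round_up_multiple(estimate) // KIB,
              round_up_pow2(estimate) // KIB)
```

Under these assumptions an estimate of 150KiB becomes 192KiB with the multiple-of-64KiB rule and 256KiB with the power-of-2 rule.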

6638 Commits