shaba/lvm2 - lvm2 - Gitea: Git with a cup of tea

shaba/lvm2

mirror of git://sourceware.org/git/lvm2.git synced 2024-12-22 17:35:59 +03:00

Author	SHA1	Message	Date
Zdenek Kabelac	e6f735d411	vdo: read new sysfs path New versions of kvdo module exposes statistics at new location: /sys/block/dm-XXX/vdo/statistics/... Enhance lvm2 to access this location first. Also if the statistic info is missing - make it 'debug' level info, so it is not failing 'lvs' command.	2021-09-09 15:24:15 +02:00
Zdenek Kabelac	2c6a2b6e86	vdo: support vdo_pool_header_size Add profilable configurable setting for vdo pool header size, that is used as 'extra' empty space at the front and end of vdo-pool device to avoid having a disk in the system the may have same data is real vdo LV. For some conversion cases however we may need to allow using '0' header size. TODO: in this case we may eventually avoid adding 'linear' mapping layer in future - but this requires further modification over lvm code base.	2021-06-28 20:41:07 +02:00
David Teigland	4a746f7ffc	lvremove: fix removing thin pool with writecache on data	2021-05-24 16:09:35 -05:00
Leo Yan	ef1c57e68f	lib: locking: Add new type "idm" We can consider the drive firmware a server to handle the locking request from nodes, this essentially is a client-server model. DLM uses the kernel as a central place to manage locks, so it also complies with client-server model for locking operations. This is why IDM and DLM are similar with each other for their wrappers. This patch largely works by generalizing the DLM code paths and then providing degeneralized functions as wrappers for both IDM and DLM. Signed-off-by: Leo Yan <leo.yan@linaro.org>	2021-05-20 16:01:05 -05:00
David Teigland	0a28e3c44b	Add metadata-based autoactivation property for VG and LV The autoactivation property can be specified in lvcreate or vgcreate for new LVs/VGs, and the property can be changed by lvchange or vgchange for existing LVs/VGs. --setautoactivation y\|n enables\|disables autoactivation of a VG or LV. Autoactivation is enabled by default, which is consistent with past behavior. The disabled state is stored as a new flag in the VG metadata, and the absence of the flag allows autoactivation. If autoactivation is disabled for the VG, then no LVs in the VG will be autoactivated (the LV autoactivation property will have no effect.) When autoactivation is enabled for the VG, then autoactivation can be controlled on individual LVs. The state of this property can be reported for LVs/VGs using the "-o autoactivation" option in lvs/vgs commands, which will report "enabled", or "" for the disabled state. Previous versions of lvm do not recognize this property. Since autoactivation is enabled by default, the disabled setting will have no effect in older lvm versions. If the VG is modified by older lvm versions, the disabled state will also be dropped from the metadata. The autoactivation property is an alternative to using the lvm.conf auto_activation_volume_list, which is still applied to to VGs/LVs in addition to the new property. If VG or LV autoactivation is disabled either in metadata or in auto_activation_volume_list, it will not be autoactivated. An autoactivation command will silently skip activating an LV when the autoactivation property is disabled. To determine the effective autoactivation behavior for a specific LV, multiple settings would need to be checked: the VG autoactivation property, the LV autoactivation property, the auto_activation_volume_list. The "activation skip" property would also be relevant, since it applies to both normal and auto activation.	2021-04-07 15:32:49 -05:00
Zdenek Kabelac	a9b4acd511	dev_manager: add lv_raid_status Just like with other segtype use this function to get whole raid status info available per a single ioctl call. Also it nicely simplifies read of percentage info about in_sync portion of raid volume. TODO: drop use of other calls then lv_raid_status call, since all such calls could already use status - so it just adds unnecessary duplication.	2021-03-18 18:34:57 +01:00
Zdenek Kabelac	a915cd5a46	lvconvert: vdo may convert already formated vdo User use 'lvconvert -Zn --type vdo-pool' to convert an existing vdo formated volume and skip lvm2 internal formating. This however requires user is passing proper matching parameters. For them user can use --profile\|--metadataprofile option whos support has been also enhanced. TODO: add support to read values directly from formated volume.	2021-02-17 11:21:35 +01:00
Zdenek Kabelac	abc9265a06	cache: reuse code for metadata min_max Use update_pool_metadata_min_max() which is shared with thin-pool metadata min-max updating. Gives improved messages when converting volumes to metadata.	2021-02-01 12:06:13 +01:00
Zdenek Kabelac	b4212be2e7	thin: improve 16g support for thin pool metadata Initial support for thin-pool used slightly smaller max size 15.81GiB for thin-pool metadata. However the real limit later settled at 15.88GiB (difference is ~64MiB - 16448 4K blocks). lvm2 could not simply increase the size as it has been using hard cropping of the loaded metadata device to avoid warnings printing warning of kernel when the size was bigger (i.e. due to bigger extent_size). This patch adds the new lvm.conf configurable setting: allocation/thin_pool_crop_metadata which defaults to 0 -> no crop of metadata beyond 15.81GiB. Only user with these sizes of metadata will be affected. Without cropping lvm2 now limits metadata allocation size to 15.88GiB. Any space beyond is currently not used by thin-pool target. Even if i.e. bigger LV is used for metadata via lvconvert, or allocated bigger because of to large extent size. With cropping enabled (=1) lvm2 preserves the old limitation 15.81GiB and should allow to work in the evironement with older lvm2 tools (i.e. older distribution). Thin-pool metadata with size bigger then 15.81G is now using CROP_METADATA flag within lvm2 metadata, so older lvm2 recognizes an incompatible thin-pool and cannot activate such pool! Users should use uncropped version as it is not suffering from various issues between thin_repair results and allocated metadata LV as thin_repair limit is 15.88GiB Users should use cropping only when really needed! Patch also better handles resize of thin-pool metadata and prevents resize beoyond usable size 15.88GiB. Resize beyond 15.81GiB automatically switches pool to no-crop version. Even with existing bigger thin-pool metadata command 'lvextend -l+1 vg/pool_tmeta' does the change. Patch gives better controls 'coverted' metadata LV and reports less confusing message during conversion. Patch set also moves the code for updating min/max into pool_manip.c for better sharing with cache_pool code.	2021-02-01 12:06:13 +01:00
Heinz Mauelshagen	f08ef23856	lvdisplay: enhance LV status output for raid(0) In case legs of a raid0 LV are removed, the lvdisplay command still reports 'available' though raid0 is not providing any resilience compared to the other raid levels. Also lvdisplay does not display '(partial)' in case of missing raid0 legs as oposed to the lvs command. Enhance lvdisplay to report "NOT available" for any RaidLV type in case too many legs are inaccessible hence causing data loss. I.e. any leg for raid0, all for raid1, more than 1 for raid4/5, more than 2 for raid6 and in case of completely lost mirror groups for raid10. Add test/shell/lvdisplay-raid.sh. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1872678	2021-01-27 16:56:22 +01:00
David Teigland	5fef89361d	integrity: display total mismatches at raid LV level Each integrity image in a raid LV reports its own number of integrity mismatches, e.g. lvs -o integritymismatches vg/lv_rimage_0 lvs -o integritymismatches vg/lv_rimage_1 In addition to this, allow the total number of integrity mismatches from all images to be displayed for the raid LV. lvs -o integritymismatches vg/lv shows the number of mismatches from both lv_rimage_0 and lv_rimage_1.	2020-11-11 15:10:15 -06:00
David Teigland	c32d7fed4f	writecache: use two step detach When detaching a writecache, use the cleaner setting by default to writeback data prior to suspending the lv to detach the writecache. This avoids potentially blocking for a long period with the device suspended. Detaching a writecache first sets the cleaner option, waits for a short period of time (less than a second), and checks if the writecache has quickly become clean. If so, the writecache is detached immediately. This optimizes the case where little writeback is needed. If the writecache does not quickly become clean, then the detach command leaves the writecache attached with the cleaner option set. This leaves the LV in the same state as if the user had set the cleaner option directly with lvchange --cachesettings cleaner=1 LV. After leaving the LV with the cleaner option set, the detach command will wait and watch the writeback progress, and will finally detach the writecache when the writeback is finished. The detach command does not need to wait during the writeback phase, and can be canceled, in which case the LV will remain with the writecache attached and the cleaner option set. When the user runs the detach command again it will complete the detach. To detach a writecache directly, without using the cleaner step (which has been the approach previously), add the option --cachesettings cleaner=0 to the detach command.	2020-10-01 11:33:02 -05:00
Zdenek Kabelac	4de6f58085	thin: use lv_status_thin and lv_status_thin_pool Introduce structures lv_status_thin_pool and lv_status_thin (pair to lv_status_cache, lv_status_vdo) Convert lv_thin_percent() -> lv_thin_status() and lv_thin_pool_percent() + lv_thin_pool_transaction_id() -> lv_thin_pool_status(). This way a function user can see not only percentages, but also other important status info about thin-pool. TODO: This patch tries to not change too many other things, but pool_below_threshold() now uses new thin-pool info to return failure if thin-pool cannot be actually modified. This should be handle separately in a better way.	2020-09-29 10:43:56 +02:00
Zdenek Kabelac	d588de77aa	wipe: convert zero_value to uint8_t We always write this value as byte.	2020-09-15 22:52:25 +02:00
David Teigland	ed249a2c53	integrity: report mismatches with lvs -o integritymismatches reported for integrity images, which may report different values	2020-09-01 17:13:21 -05:00
David Teigland	f2c1de783c	integrity: always default to journal mode lvconvert was defaulting to bitmap mode, and lvcreate was defaulting to journal mode.	2020-09-01 17:12:28 -05:00
Zdenek Kabelac	ff4827ffb1	lv_manip: get_default_region_size return uint32_t	2020-08-28 21:43:02 +02:00
Zdenek Kabelac	bc39d5bec6	pool: zero metadata To avoid polution of metadata with some 'garbage' content or eventualy some leak of stale data in case user want to upload metadata somewhere, ensure upon allocation the metadata device is fully zeroed. Behaviour may slow down allocation of thin-pool or cache-pool a bit so the old behaviour can be restored with lvm.conf setting: allocation/zero_metadata=0 TODO: add zeroing for extension of metadata volume.	2020-06-24 15:01:03 +02:00
David Teigland	2aed2a41f7	lvcreate: new cache or writecache lv with single command To create a new cache or writecache LV with a single command: lvcreate --type cache\|writecache -n Name -L Size --cachedevice PVfast VG [PVslow ...] - A new main linear\|striped LV is created as usual, using the specified -n Name and -L Size, and using the optionally specified PVslow devices. - Then, a new cachevol LV is created internally, using PVfast specified by the cachedevice option. - Then, the cachevol is attached to the main LV, converting the main LV to type cache\|writecache. Include --cachesize Size to specify the size of cache\|writecache to create from the specified --cachedevice PVs, otherwise the entire cachedevice PV is used. The --cachedevice option can be repeated to create the cache from multiple devices, or the cachedevice option can contain a tag name specifying a set of PVs to allocate the cache from. To create a new cache or writecache LV with a single command using an existing cachevol LV: lvcreate --type cache\|writecache -n Name -L Size --cachevol LVfast VG [PVslow ...] - A new main linear\|striped LV is created as usual, using the specified -n Name and -L Size, and using the optionally specified PVslow devices. - Then, the cachevol LVfast is attached to the main LV, converting the main LV to type cache\|writecache. In cases where more advanced types (for the main LV or cachevol LV) are needed, they should be created independently and then combined with lvconvert. Example ------- user creates a new VG with one slow device and one fast device: $ vgcreate vg /dev/slow1 /dev/fast1 user creates a new 8G main LV on /dev/slow1 that uses all of /dev/fast1 as a writecache: $ lvcreate --type writecache --cachedevice /dev/fast1 -n main -L 8G vg /dev/slow1 Example ------- user creates a new VG with two slow devs and two fast devs: $ vgcreate vg /dev/slow1 /dev/slow2 /dev/fast1 /dev/fast2 user creates a new 8G main LV on /dev/slow1 and /dev/slow2 that uses all of /dev/fast1 and /dev/fast2 as a writecache: $ lvcreate --type writecache --cachedevice /dev/fast1 --cachedevice /dev/fast2 -n main -L 8G vg /dev/slow1 /dev/slow2 Example ------- A user has several slow devices and several fast devices in their VG, the slow devs have tag @slow, the fast devs have tag @fast. user creates a new 8G main LV on the slow devs with a 2G writecache on the fast devs: $ lvcreate --type writecache -n main -L 8G --cachedevice @fast --cachesize 2G vg @slow	2020-06-16 13:46:51 -05:00
David Teigland	1ee42f1391	writecache: cachesettings in lvchange and lvs lvchange --cachesettings lvs -o+cache_settings	2020-06-10 12:14:00 -05:00
David Teigland	240062a183	writecache: remove from an active lv	2020-06-10 12:13:31 -05:00
David Teigland	d945b53ff7	remove vg_read_error Once converted results to error numbers but is now just a null check.	2020-04-24 11:14:29 -05:00
David Teigland	d9e8895a96	Allow dm-integrity to be used for raid images dm-integrity stores checksums of the data written to an LV, and returns an error if data read from the LV does not match the previously saved checksum. When used on raid images, dm-raid will correct the error by reading the block from another image, and the device user sees no error. The integrity metadata (checksums) are stored on an internal LV allocated by lvm for each linear image. The internal LV is allocated on the same PV as the image. Create a raid LV with an integrity layer over each raid image (for raid levels 1,4,5,6,10): lvcreate --type raidN --raidintegrity y [options] Add an integrity layer to images of an existing raid LV: lvconvert --raidintegrity y LV Remove the integrity layer from images of a raid LV: lvconvert --raidintegrity n LV Settings Use --raidintegritymode journal\|bitmap (journal is default) to configure the method used by dm-integrity to ensure crash consistency. Initialization When integrity is added to an LV, the kernel needs to initialize the integrity metadata/checksums for all blocks in the LV. The data corruption checking performed by dm-integrity will only operate on areas of the LV that are already initialized. The progress of integrity initialization is reported by the "syncpercent" LV reporting field (and under the Cpy%Sync lvs column.) Example: create a raid1 LV with integrity: $ lvcreate --type raid1 -m1 --raidintegrity y -n rr -L1G foo Creating integrity metadata LV rr_rimage_0_imeta with size 12.00 MiB. Logical volume "rr_rimage_0_imeta" created. Creating integrity metadata LV rr_rimage_1_imeta with size 12.00 MiB. Logical volume "rr_rimage_1_imeta" created. Logical volume "rr" created. $ lvs -a foo LV VG Attr LSize Origin Cpy%Sync rr foo rwi-a-r--- 1.00g 4.93 [rr_rimage_0] foo gwi-aor--- 1.00g [rr_rimage_0_iorig] 41.02 [rr_rimage_0_imeta] foo ewi-ao---- 12.00m [rr_rimage_0_iorig] foo -wi-ao---- 1.00g [rr_rimage_1] foo gwi-aor--- 1.00g [rr_rimage_1_iorig] 39.45 [rr_rimage_1_imeta] foo ewi-ao---- 12.00m [rr_rimage_1_iorig] foo -wi-ao---- 1.00g [rr_rmeta_0] foo ewi-aor--- 4.00m [rr_rmeta_1] foo ewi-aor--- 4.00m	2020-04-15 12:10:32 -05:00
David Teigland	b6b4ad8e28	move pv_list code into lib	2020-04-13 10:04:14 -05:00
Zdenek Kabelac	2266a1863f	lv_manip: add lv_uniq_rename_update Add function to rename LV to either passed name or if the name is already in use, generate new lvol% name.	2019-10-21 12:14:15 +02:00
Zdenek Kabelac	2a08d6d1d4	cachevol: use CVOL UUID for cdata and cmeta layered devices Since code is using -cdata and -cmeta UUID suffixes, it does not need any new 'extra' ID to be generated and stored in metadata. Since introduce of new 'segtype' cache+CACHE_USES_CACHEVOL we can safely assume 'new' cache with cachevol will now be created without extra metadata_id and data_id in metadata. For backward compatibility, code still reads them in case older version of metadata have them - so it still should be able to activate such volumes. Bonus is lowered size of lv structure used to store info about LV (noticable with big volume groups).	2019-10-17 13:03:49 +02:00
David Teigland	91ee025d5b	cache: change cachevol flags for backward compat A cachevol LV had the CACHE_VOL status flag in metadata, and the cache LV using it had no new flag. This caused problems if the new metadata was used by an old version of lvm. An old version of lvm would have two problems processing the new metadata: . The old lvm would return an error when reading the VG metadata when it saw the unknown CACHE_VOL status flag. . The old lvm would return an error when reading the VG metadata because it would not find an expected cache pool attached to the cache LV (since the cache LV had a cachevol attached instead.) Change the use of flags: . Change the CACHE_VOL flag to be a COMPATIBLE flag (instead of a STATUS flag) so that old versions will not fail when they see it. . When a cache LV is using a cachevol, the cache LV gets a new SEGTYPE flag CACHE_USES_CACHEVOL. This flag is appended to the segtype name, so that old lvm versions will fail to use the LV because of an unknown segtype, as opposed to failing to read the VG.	2019-10-15 09:05:52 -05:00
Zdenek Kabelac	1cd308d640	cachevol: drop no longer needed functions Code is no longer used/needed.	2019-10-14 15:20:25 +02:00
David Teigland	fe16d296b0	pvmove: remove some cmirror related code which is no longer used	2019-10-11 11:31:42 -05:00
Zdenek Kabelac	cf8aee096f	vdo: introduce get_vdo_write_policy_name	2019-10-04 17:31:55 +02:00
Zdenek Kabelac	c756f76802	vdo: correct internal API for set_vdo_write_policy This is 'setting' function.	2019-10-04 17:31:55 +02:00
David Teigland	76dd9b2b51	writecache: move code into new file put writecache specific code in writecache_manip.c should be no functional change	2019-09-24 15:51:05 -05:00
David Teigland	27c3c1d7c8	writecache: display layout and role fields	2019-09-20 14:55:11 -05:00
David Teigland	6f7d7089b4	writecache: use dm suffixes and lv attributes - use internal CACHE_VOL flag on cachevol LV - add suffixes to dm uuids for internal LVs - display appropriate letters in the LV attr field - display writecache's cachevol in lvs output	2019-09-20 14:08:51 -05:00
David Teigland	5d3bced5ea	lvconvert: detaching cachevol with missing PVs . For dm-cache in writethrough, always allow splitcache, whether the cache is missing PVs or not. . For dm-cache in writeback, if the cache is missing PVs, allow splitcache with force and yes. . For dm-writecache, if the cache is missing PVs, allow splitcache with force and yes.	2019-09-20 09:59:37 -05:00
David Teigland	25b58310e3	pvscan: avoid full scan for activation When an online PV completed a VG, the standard activation functions were used to activate the VG. These functions use a full scan of all devs. When many pvscans are run during startup and need to activate many VGs, scanning all devs from all the pvscans can take a long time. Optimize VG activation in pvscan to scan only the devs in the VG being activated. This makes use of the online file info that was used to determine the VG was complete. The downside of this approach is that pvscan activation will not detect duplicate PVs and block activation, where a normal activation command (which scans all devices) would.	2019-09-03 10:11:16 -05:00
David Teigland	0404539edb	vgcreate/vgextend: restrict PVs with mixed block sizes Avoid having PVs with different logical block sizes in the same VG. This prevents LVs from having mixed block sizes, which can produce file system errors. The new config setting devices/allow_mixed_block_sizes (default 0) can be changed to 1 to return to the unrestricted mode.	2019-08-01 10:06:47 -05:00
David Teigland	b4402bd821	exported vg handling The exported VG checking/enforcement was scattered and inconsistent. This centralizes it and makes it consistent, following the existing approach for foreign and shared VGs/PVs, which are very similar to exported VGs/PVs. The access policy that now applies to foreign/shared/exported VGs/PVs, is that if a foreign/shared/exported VG/PV is named on the command line (i.e. explicitly requested by the user), and the command is not permitted to operate on it because it is foreign/shared/exported, then an access error is reported and the command exits with an error. But, if the command is processing all VGs/PVs, and happens to come across a foreign/shared/exported VG/PV (that is not explicitly named on the command line), then the command silently skips it and does not produce an error. A command using tags or --select handles inaccessible VGs/PVs the same way as a command processing all VGs/PVs, and will not report/return errors if these inaccessible VGs/PVs exist. The new policy fixes the exit codes on a somewhat random set of commands that previously exited with an error if they were looking at all VGs/PVs and an exported VG existed on the system. There should be no change to which commands are allowed/disallowed on exported VGs/PVs. Certain LV commands (lvs/lvdisplay/lvscan) would previously not display LVs from an exported VG (for unknown reasons). This has not changed. The lvm fullreport command would previously report info about an exported VG but not about the LVs in it. This has changed to include all info from the exported VG.	2019-06-25 15:39:08 -05:00
David Teigland	d16142f90f	scanning: open devs rw when rescanning for write When vg_read rescans devices with the intention of writing the VG, the label rescan can open the devs RW so they do not need to be closed and reopened RW in dev_write_bytes.	2019-06-21 10:57:49 -05:00
David Teigland	4bb7d3da0e	lvmcache: remove wrapper around lvmcache_get_vgnameids This was left over from when there was an lvmetad version of the function.	2019-06-11 14:10:14 -05:00
David Teigland	ba7ff96faf	improve reading and repairing vg metadata The fact that vg repair is implemented as a part of vg read has led to a messy and complicated implementation of vg_read, and limited and uncontrolled repair capability. This splits read and repair apart. Summary ------- - take all kinds of various repairs out of vg_read - vg_read no longer writes anything - vg_read now simply reads and returns vg metadata - vg_read ignores bad or old copies of metadata - vg_read proceeds with a single good copy of metadata - improve error checks and handling when reading - keep track of bad (corrupt) copies of metadata in lvmcache - keep track of old (seqno) copies of metadata in lvmcache - keep track of outdated PVs in lvmcache - vg_write will do basic repairs - new command vgck --updatemetdata will do all repairs Details ------- - In scan, do not delete dev from lvmcache if reading/processing fails; the dev is still present, and removing it makes it look like the dev is not there. Records are now kept about the problems with each PV so they be fixed/repaired in the appropriate places. - In scan, record a bad mda on failure, and delete the mda from mda in use list so it will not be used by vg_read or vg_write, only by repair. - In scan, succeed if any good mda on a device is found, instead of failing if any is bad. The bad/old copies of metadata should not interfere with normal usage while good copies can be used. - In scan, add a record of old mdas in lvmcache for later, do not repair them while reading, and do not let them prevent us from finding and using a good copy of metadata from elsewhere. One result is that "inconsistent metadata" is no longer a read error, but instead a record in lvmcache that can be addressed separate from the read. - Treat a dev with no good mdas like a dev with no mdas, which is an existing case we already handle. - Don't use a fake vg "handle" for returning an error from vg_read, or the vg_read_error function for getting that error number; just return null if the vg cannot be read or used, and an error_flags arg with flags set for the specific kind of error (which can be used later for determining the kind of repair.) - Saving an original copy of the vg metadata, for purposes of reverting a write, is now done explicitly in vg_read instead of being hidden in the vg_make_handle function. - When a vg is not accessible due to "access restrictions" but is otherwise fine, return the vg through the new error_vg arg so that process_each_pv can skip the PVs in the VG while processing. (This is a temporary accomodation for the way process_each_pv tracks which devs have been looked at, and can be dropped later when process_each_pv implementation dev tracking is changed.) - vg_read does not try to fix or recover a vg, but now just reads the metadata, checks access restrictions and returns it. (Checking access restrictions might be better done outside of vg_read, but this is a later improvement.) - _vg_read now simply makes one attempt to read metadata from each mda, and uses the most recent copy to return to the caller in the form of a 'vg' struct. (bad mdas were excluded during the scan and are not retried) (old mdas were not excluded during scan and are retried here) - vg_read uses _vg_read to get the latest copy of metadata from mdas, and then makes various checks against it to produce warnings, and to check if VG access is allowed (access restrictions include: writable, foreign, shared, clustered, missing pvs). - Things that were previously silently/automatically written by vg_read that are now done by vg_write, based on the records made in lvmcache during the scan and read: . clearing the missing flag . updating old copies of metadata . clearing outdated pvs . updating pv header flags - Bad/corrupt metadata are now repaired; they were not before. Test changes ------------ - A read command no longer writes the VG to repair it, so add a write command to do a repair. (inconsistent-metadata, unlost-pv) - When a missing PV is removed from a VG, and then the device is enabled again, vgck --updatemetadata is needed to clear the outdated PV before it can be used again, where it wasn't before. (lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair, mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv) Reading bad/old metadata ------------------------ - "bad metadata": the mda_header or metadata text has invalid fields or can't be parsed by lvm. This is a form of corruption that would not be caused by known failure scenarios. A checksum error is typically included among the errors reported. - "old metadata": a valid copy of the metadata that has a smaller seqno than other copies of the metadata. This can happen if the device failed, or io failed, or lvm failed while commiting new metadata to all the metadata areas. Old metadata on a PV that has been removed from the VG is the "outdated" case below. When a VG has some PVs with bad/old metadata, lvm can simply ignore the bad/old copies, and use a good copy. This is why there are multiple copies of the metadata -- so it's available even when some of the copies cannot be used. The bad/old copies do not have to be repaired before the VG can be used (the repair can happen later.) A PV with no good copies of the metadata simply falls back to being treated like a PV with no mdas; a common and harmless configuration. When bad/old metadata exists, lvm warns the user about it, and suggests repairing it using a new metadata repair command. Bad metadata in particular is something that users will want to investigate and repair themselves, since it should not happen and may indicate some other problem that needs to be fixed. PVs with bad/old metadata are not the same as missing devices. Missing devices will block various kinds of VG modification or activation, but bad/old metadata will not. Previously, lvm would attempt to repair bad/old metadata whenever it was read. This was unnecessary since lvm does not require every copy of the metadata to be used. It would also hide potential problems that should be investigated by the user. It was also dangerous in cases where the VG was on shared storage. The user is now allowed to investigate potential problems and decide how and when to repair them. Repairing bad/old metadata -------------------------- When label scan sees bad metadata in an mda, that mda is removed from the lvmcache info->mdas list. This means that vg_read will skip it, and not attempt to read/process it again. If it was the only in-use mda on a PV, that PV is treated like a PV with no mdas. It also means that vg_write will also skip the bad mda, and not attempt to write new metadata to it. The only way to repair bad metadata is with the metadata repair command. When label scan sees old metadata in an mda, that mda is kept in the lvmcache info->mdas list. This means that vg_read will read/process it again, and likely see the same mismatch with the other copies of the metadata. Like the label_scan, the vg_read will simply ignore the old copy of the metadata and use the latest copy. If the command is modifying the vg (e.g. lvcreate), then vg_write, which writes new metadata to every mda on info->mdas, will write the new metadata to the mda that had the old version. If successful, this will resolve the old metadata problem (without needing to run a metadata repair command.) Outdated PVs ------------ An outdated PV is a PV that has an old copy of VG metadata that shows it is a member of the VG, but the latest copy of the VG metadata does not include this PV. This happens if the PV is disconnected, vgreduce --removemissing is run to remove the PV from the VG, then the PV is reconnected. In this case, the outdated PV needs have its outdated metadata removed and the PV used flag needs to be cleared. This repair will be done by the subsequent repair command. It is also done if vgremove is run on the VG. MISSING PVs ----------- When a device is missing, most commands will refuse to modify the VG. This is the simple case. More complicated is when a command is allowed to modify the VG while it is missing a device. When a VG is written while a device is missing for one of it's PVs, the VG metadata is written to disk with the MISSING flag on the PV with the missing device. When the VG is next used, it is treated as if the PV with the MISSING flag still has a missing device, even if that device has reappeared. If all LVs that were using a PV with the MISSING flag are removed or repaired so that the MISSING PV is no longer used, then the next time the VG metadata is written, the MISSING flag will be dropped. Alternative methods of clearing the MISSING flag are: vgreduce --removemissing will remove PVs with missing devices, or PVs with the MISSING flag where the device has reappeared. vgextend --restoremissing will clear the MISSING flag on PVs where the device has reappeared, allowing the VG to be used normally. This must be done with caution since the reappeared device may have old data that is inconsistent with data on other PVs. Bad mda repair -------------- The new command: vgck --updatemetadata VG first uses vg_write to repair old metadata, and other basic issues mentioned above (old metadata, outdated PVs, pv_header flags, MISSING_PV flags). It will also go further and repair bad metadata: . text metadata that has a bad checksum . text metadata that is not parsable . corrupt mda_header checksum and version fields (To keep a clean diff, #if 0 is added around functions that are replaced by new code. These commented functions are removed by the following commit.)	2019-06-07 15:54:04 -05:00
David Teigland	47effdc025	vgck --updatemetadata is a new command uses vg_write to correct more common or less severe issues, and also adds the ability to repair some metadata corruption that couldn't be handled previously.	2019-06-07 15:54:04 -05:00
David Teigland	8c87dda195	locking: unify global lock for flock and lockd There have been two file locks used to protect lvm "global state": "ORPHANS" and "GLOBAL". Commands that used the ORPHAN flock in exclusive mode: pvcreate, pvremove, vgcreate, vgextend, vgremove, vgcfgrestore Commands that used the ORPHAN flock in shared mode: vgimportclone, pvs, pvscan, pvresize, pvmove, pvdisplay, pvchange, fullreport Commands that used the GLOBAL flock in exclusive mode: pvchange, pvscan, vgimportclone, vgscan Commands that used the GLOBAL flock in shared mode: pvscan --cache, pvs The ORPHAN lock covers the important cases of serializing the use of orphan PVs. It also partially covers the reporting of orphan PVs (although not correctly as explained below.) The GLOBAL lock doesn't seem to have a clear purpose (it may have eroded over time.) Neither lock correctly protects the VG namespace, or orphan PV properties. To simplify and correct these issues, the two separate flocks are combined into the one GLOBAL flock, and this flock is used from the locking sites that are in place for the lvmlockd global lock. The logic behind the lvmlockd (distributed) global lock is that any command that changes "global state" needs to take the global lock in ex mode. Global state in lvm is: the list of VG names, the set of orphan PVs, and any properties of orphan PVs. Reading this global state can use the global lock in sh mode to ensure it doesn't change while being reported. The locking of global state now looks like: lockd_global() previously named lockd_gl(), acquires the distributed global lock through lvmlockd. This is unchanged. It serializes distributed lvm commands that are changing global state. This is a no-op when lvmlockd is not in use. lockf_global() acquires an flock on a local file. It serializes local lvm commands that are changing global state. lock_global() first calls lockf_global() to acquire the local flock for global state, and if this succeeds, it calls lockd_global() to acquire the distributed lock for global state. Replace instances of lockd_gl() with lock_global(), so that the existing sites for lvmlockd global state locking are now also used for local file locking of global state. Remove the previous file locking calls lock_vol(GLOBAL) and lock_vol(ORPHAN). The following commands which change global state are now serialized with the exclusive global flock: pvchange (of orphan), pvresize (of orphan), pvcreate, pvremove, vgcreate, vgextend, vgremove, vgreduce, vgrename, vgcfgrestore, vgimportclone, vgmerge, vgsplit Commands that use a shared flock to read global state (and will be serialized against the prior list) are those that use process_each functions that are based on processing a list of all VG names, or all PVs. The list of all VGs or all PVs is global state and the shared lock prevents those lists from changing while the command is processing them. The ORPHAN lock previously attempted to produce an accurate listing of orphan PVs, but it was only acquired at the end of the command during the fake vg_read of the fake orphan vg. This is not when orphan PVs were determined; they were determined by elimination beforehand by processing all real VGs, and subtracting the PVs in the real VGs from the list of all PVs that had been identified during the initial scan. This is fixed by holding the single global lock in shared mode while processing all VGs to determine the list of orphan PVs.	2019-04-29 13:01:05 -05:00
David Teigland	85e68a8333	lvextend: refresh shared LV remotely using dlm/corosync When lvextend extends an LV that is active with a shared lock, use this as a signal that other hosts may also have the LV active, with gfs2 mounted, and should have the LV refreshed to reflect the new size. Use the libdlmcontrol run api, which uses dlm_controld/corosync to run an lvchange --refresh command on other cluster nodes.	2019-03-21 12:38:20 -05:00
David Teigland	4e20ebd6a1	pvscan: ignore online for shared and foreign PVs Activation would not be allowed anyway, but we can check for these cases early and avoid wasted time in pvscan managing online files an attempting activation.	2019-03-05 15:19:05 -06:00
David Teigland	a9eaab6beb	Use "cachevol" to refer to cache on a single LV and "cachepool" to refer to a cache on a cache pool object. The problem was that the --cachepool option was being used to refer to both a cache pool object, and to a standard LV used for caching. This could be somewhat confusing, and it made it less clear when each kind would be used. By separating them, it's clear when a cachepool or a cachevol should be used. Previously: - lvm would use the cache pool approach when the user passed a cache-pool LV to the --cachepool option. - lvm would use the cache vol approach when the user passed a standard LV in the --cachepool option. Now: - lvm will always use the cache pool approach when the user uses the --cachepool option. - lvm will always use the cache vol approach when the user uses the --cachevol option.	2019-02-27 08:52:34 -06:00
Zdenek Kabelac	ab031d673d	vdo: introduce function for estimation of virtual size	2019-01-21 12:53:16 +01:00
Heinz Mauelshagen	dd5716ddf2	raid: fix (de)activation of RaidLVs with visible SubLVs There's a small window during creation of a new RaidLV when rmeta SubLVs are made visible to wipe them in order to prevent erroneous discovery of stale RAID metadata. In case a crash prevents the SubLVs from being committed hidden after such wiping, the RaidLV can still be activated with the SubLVs visible. During deactivation though, a deadlock occurs because the visible SubLVs are deactivated before the RaidLV. The patch adds _check_raid_sublvs to the raid validation in merge.c, an activation check to activate.c (paranoid, because the merge.c check will prevent activation in case of visible SubLVs) and shares the existing wiping function _clear_lvs in raid_manip.c moved to lv_manip.c and renamed to activate_and_wipe_lvlist to remove code duplication. Whilst on it, introduce activate_and_wipe_lv to share with (lvconvert\|lvchange).c. Resolves: rhbz1633167	2018-12-11 16:35:34 +01:00
David Teigland	904e1e3d26	Place the first PE at 1 MiB for all defaults . When using default settings, this commit should change nothing. The first PE continues to be placed at 1 MiB resulting in a metadata area size of 1020 KiB (for 4K page sizes; slightly smaller for larger page sizes.) . When default_data_alignment is disabled in lvm.conf, align pe_start at 1 MiB, based on a default metadata area size that adapts to the page size. Previously, disabling this option would result in mda_size that was too small for common use, and produced a 64 KiB aligned pe_start. . Customized pe_start and mda_size values continue to be set as before in lvm.conf and command line. . Remove the configure option for setting default_data_alignment at build time. . Improve alignment related option descriptions. . Add section about alignment to pvcreate man page. Previously, DEFAULT_PVMETADATASIZE was 255 sectors. However, the fact that the config setting named "default_data_alignment" has a default value of 1 (MiB) meant that DEFAULT_PVMETADATASIZE was having no effect. The metadata area size is the space between the start of the metadata area (page size offset from the start of the device) and the first PE (1 MiB by default due to default_data_alignment 1.) The result is a 1020 KiB metadata area on machines with 4KiB page size (1024 KiB - 4 KiB), and smaller on machines with larger page size. If default_data_alignment was set to 0 (disabled), then DEFAULT_PVMETADATASIZE 255 would take effect, and produce a metadata area that was 188 KiB and pe_start of 192 KiB. This was too small for common use. This is fixed by making the default metadata area size a computed value that matches the value produced by default_data_alignment.	2018-11-26 16:36:50 -06:00
David Teigland	3ae5569570	Add dm-writecache support dm-writecache is used like dm-cache with a standard LV as the cache. $ lvcreate -n main -L 128M -an foo /dev/loop0 $ lvcreate -n fast -L 32M -an foo /dev/pmem0 $ lvconvert --type writecache --cachepool fast foo/main $ lvs -a foo -o+devices LV VG Attr LSize Origin Devices [fast] foo -wi------- 32.00m /dev/pmem0(0) main foo Cwi------- 128.00m [main_wcorig] main_wcorig(0) [main_wcorig] foo -wi------- 128.00m /dev/loop0(0) $ lvchange -ay foo/main $ dmsetup table foo-main_wcorig: 0 262144 linear 7:0 2048 foo-main: 0 262144 writecache p 253:4 253:3 4096 0 foo-fast: 0 65536 linear 259:0 2048 $ lvchange -an foo/main $ lvconvert --splitcache foo/main $ lvs -a foo -o+devices LV VG Attr LSize Devices fast foo -wi------- 32.00m /dev/pmem0(0) main foo -wi------- 128.00m /dev/loop0(0)	2018-11-06 14:18:41 -06:00

1 2 3 4 5 ...

566 Commits