shaba/lvm2

mirror of git://sourceware.org/git/lvm2.git synced 2025-01-07 21:18:59 +03:00

Author	SHA1	Message	Date
David Teigland	a3a676e0e7	metadata.c: removed unused code if 0 was placed around old vg_read code by the previous commit.	2019-06-07 15:54:04 -05:00
David Teigland	ba7ff96faf	improve reading and repairing vg metadata The fact that vg repair is implemented as a part of vg read has led to a messy and complicated implementation of vg_read, and limited and uncontrolled repair capability. This splits read and repair apart. Summary ------- - take all kinds of various repairs out of vg_read - vg_read no longer writes anything - vg_read now simply reads and returns vg metadata - vg_read ignores bad or old copies of metadata - vg_read proceeds with a single good copy of metadata - improve error checks and handling when reading - keep track of bad (corrupt) copies of metadata in lvmcache - keep track of old (seqno) copies of metadata in lvmcache - keep track of outdated PVs in lvmcache - vg_write will do basic repairs - new command vgck --updatemetadata will do all repairs Details ------- - In scan, do not delete dev from lvmcache if reading/processing fails; the dev is still present, and removing it makes it look like the dev is not there. Records are now kept about the problems with each PV so they can be fixed/repaired in the appropriate places. - In scan, record a bad mda on failure, and delete the mda from mda in use list so it will not be used by vg_read or vg_write, only by repair. - In scan, succeed if any good mda on a device is found, instead of failing if any is bad. The bad/old copies of metadata should not interfere with normal usage while good copies can be used. - In scan, add a record of old mdas in lvmcache for later, do not repair them while reading, and do not let them prevent us from finding and using a good copy of metadata from elsewhere. One result is that "inconsistent metadata" is no longer a read error, but instead a record in lvmcache that can be addressed separate from the read. - Treat a dev with no good mdas like a dev with no mdas, which is an existing case we already handle. 
- Don't use a fake vg "handle" for returning an error from vg_read, or the vg_read_error function for getting that error number; just return null if the vg cannot be read or used, and an error_flags arg with flags set for the specific kind of error (which can be used later for determining the kind of repair.) - Saving an original copy of the vg metadata, for purposes of reverting a write, is now done explicitly in vg_read instead of being hidden in the vg_make_handle function. - When a vg is not accessible due to "access restrictions" but is otherwise fine, return the vg through the new error_vg arg so that process_each_pv can skip the PVs in the VG while processing. (This is a temporary accommodation for the way process_each_pv tracks which devs have been looked at, and can be dropped later when process_each_pv implementation dev tracking is changed.) - vg_read does not try to fix or recover a vg, but now just reads the metadata, checks access restrictions and returns it. (Checking access restrictions might be better done outside of vg_read, but this is a later improvement.) - _vg_read now simply makes one attempt to read metadata from each mda, and uses the most recent copy to return to the caller in the form of a 'vg' struct. (bad mdas were excluded during the scan and are not retried) (old mdas were not excluded during scan and are retried here) - vg_read uses _vg_read to get the latest copy of metadata from mdas, and then makes various checks against it to produce warnings, and to check if VG access is allowed (access restrictions include: writable, foreign, shared, clustered, missing pvs). - Things that were previously silently/automatically written by vg_read that are now done by vg_write, based on the records made in lvmcache during the scan and read: . clearing the missing flag . updating old copies of metadata . clearing outdated pvs . updating pv header flags - Bad/corrupt metadata are now repaired; they were not before. 
Test changes ------------ - A read command no longer writes the VG to repair it, so add a write command to do a repair. (inconsistent-metadata, unlost-pv) - When a missing PV is removed from a VG, and then the device is enabled again, vgck --updatemetadata is needed to clear the outdated PV before it can be used again, where it wasn't before. (lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair, mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv) Reading bad/old metadata ------------------------ - "bad metadata": the mda_header or metadata text has invalid fields or can't be parsed by lvm. This is a form of corruption that would not be caused by known failure scenarios. A checksum error is typically included among the errors reported. - "old metadata": a valid copy of the metadata that has a smaller seqno than other copies of the metadata. This can happen if the device failed, or io failed, or lvm failed while committing new metadata to all the metadata areas. Old metadata on a PV that has been removed from the VG is the "outdated" case below. When a VG has some PVs with bad/old metadata, lvm can simply ignore the bad/old copies, and use a good copy. This is why there are multiple copies of the metadata -- so it's available even when some of the copies cannot be used. The bad/old copies do not have to be repaired before the VG can be used (the repair can happen later.) A PV with no good copies of the metadata simply falls back to being treated like a PV with no mdas; a common and harmless configuration. When bad/old metadata exists, lvm warns the user about it, and suggests repairing it using a new metadata repair command. Bad metadata in particular is something that users will want to investigate and repair themselves, since it should not happen and may indicate some other problem that needs to be fixed. PVs with bad/old metadata are not the same as missing devices. 
Missing devices will block various kinds of VG modification or activation, but bad/old metadata will not. Previously, lvm would attempt to repair bad/old metadata whenever it was read. This was unnecessary since lvm does not require every copy of the metadata to be used. It would also hide potential problems that should be investigated by the user. It was also dangerous in cases where the VG was on shared storage. The user is now allowed to investigate potential problems and decide how and when to repair them. Repairing bad/old metadata -------------------------- When label scan sees bad metadata in an mda, that mda is removed from the lvmcache info->mdas list. This means that vg_read will skip it, and not attempt to read/process it again. If it was the only in-use mda on a PV, that PV is treated like a PV with no mdas. It also means that vg_write will also skip the bad mda, and not attempt to write new metadata to it. The only way to repair bad metadata is with the metadata repair command. When label scan sees old metadata in an mda, that mda is kept in the lvmcache info->mdas list. This means that vg_read will read/process it again, and likely see the same mismatch with the other copies of the metadata. Like the label_scan, the vg_read will simply ignore the old copy of the metadata and use the latest copy. If the command is modifying the vg (e.g. lvcreate), then vg_write, which writes new metadata to every mda on info->mdas, will write the new metadata to the mda that had the old version. If successful, this will resolve the old metadata problem (without needing to run a metadata repair command.) Outdated PVs ------------ An outdated PV is a PV that has an old copy of VG metadata that shows it is a member of the VG, but the latest copy of the VG metadata does not include this PV. This happens if the PV is disconnected, vgreduce --removemissing is run to remove the PV from the VG, then the PV is reconnected. 
In this case, the outdated PV needs to have its outdated metadata removed and the PV used flag needs to be cleared. This repair will be done by the subsequent repair command. It is also done if vgremove is run on the VG. MISSING PVs ----------- When a device is missing, most commands will refuse to modify the VG. This is the simple case. More complicated is when a command is allowed to modify the VG while it is missing a device. When a VG is written while a device is missing for one of its PVs, the VG metadata is written to disk with the MISSING flag on the PV with the missing device. When the VG is next used, it is treated as if the PV with the MISSING flag still has a missing device, even if that device has reappeared. If all LVs that were using a PV with the MISSING flag are removed or repaired so that the MISSING PV is no longer used, then the next time the VG metadata is written, the MISSING flag will be dropped. Alternative methods of clearing the MISSING flag are: vgreduce --removemissing will remove PVs with missing devices, or PVs with the MISSING flag where the device has reappeared. vgextend --restoremissing will clear the MISSING flag on PVs where the device has reappeared, allowing the VG to be used normally. This must be done with caution since the reappeared device may have old data that is inconsistent with data on other PVs. Bad mda repair -------------- The new command: vgck --updatemetadata VG first uses vg_write to repair old metadata, and other basic issues mentioned above (old metadata, outdated PVs, pv_header flags, MISSING_PV flags). It will also go further and repair bad metadata: . text metadata that has a bad checksum . text metadata that is not parsable . corrupt mda_header checksum and version fields (To keep a clean diff, #if 0 is added around functions that are replaced by new code. These commented functions are removed by the following commit.)	2019-06-07 15:54:04 -05:00
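The bad/old distinction described in this commit message can be sketched in a few lines. This is only an illustration, not lvm2 code: the `MdaCopy` type and `pick_latest` function are hypothetical names, a `seqno` of `None` stands in for a copy that could not be parsed, and the sketch folds the scan and read phases into one step for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MdaCopy:
    """One copy of VG metadata as found in a metadata area (hypothetical model)."""
    dev: str
    seqno: Optional[int]  # None models "bad metadata" (corrupt or unparsable)


def pick_latest(copies: List[MdaCopy]) -> Tuple[Optional[MdaCopy], List[str], List[str]]:
    """Return (latest good copy, devs holding old copies, devs holding bad copies).

    Bad and old copies are only recorded; nothing is repaired here, mirroring
    the idea that reading never writes.
    """
    good = [c for c in copies if c.seqno is not None]
    bad = [c.dev for c in copies if c.seqno is None]
    if not good:
        return None, [], bad
    latest = max(good, key=lambda c: c.seqno)
    old = [c.dev for c in good if c.seqno < latest.seqno]
    return latest, old, bad
```

A single good copy is enough for the read to proceed; the `old` and `bad` lists play the role of the records kept for a later write or repair command.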
David Teigland	015b906069	add a warning message when updating old metadata in an mda that had previously not been updated	2019-06-07 15:54:04 -05:00
David Teigland	47effdc025	vgck --updatemetadata is a new command that uses vg_write to correct more common or less severe issues, and also adds the ability to repair some metadata corruption that couldn't be handled previously.	2019-06-07 15:54:04 -05:00
David Teigland	de3d3b11f4	move pv header repairs to vg_write Correct PV header in-use or version fields from vg_write instead of vg_read.	2019-06-07 15:54:04 -05:00
David Teigland	ab61a6d85d	move wipe_outdated_pvs to vg_write and implement it based on a device, not based on a pv struct (which is not available when the device is not a part of the vg.) currently only the vgremove command wipes outdated pvs until more advanced recovery is added in a subsequent commit	2019-06-07 15:54:04 -05:00
David Teigland	45b164f62c	create separate lvmcache update functions for read and write The vg read and vg write cases need to update lvmcache differently, so create separate functions for them. The read case now handles checking for outdated mdas and moves them aside into a new list to be repaired in a subsequent commit.	2019-06-07 15:54:04 -05:00
David Teigland	027e0e92e6	fix vg_commit return value The existing comment was describing the correct behavior, but the code didn't match. The commit is successful if one mda was committed. Making it depend on the result of the internal lvmcache update was wrong.	2019-06-07 15:54:04 -05:00
David Teigland	2b241eb1f6	pvck: use new dump routines for old output Use the recently added dump routines to produce the old/traditional pvck output, and remove the code that had been used for that. The validation/checking done by the new routines means that new lines prefixed with CHECK are printed for incorrect values.	2019-06-05 16:28:52 -05:00
David Teigland	645dd27604	separate code for setting devices from metadata parsing Pull the code that sets devs for PVs out of the metadata parsing code and call it separately.	2019-05-23 11:57:38 -05:00
David Teigland	8c87dda195	locking: unify global lock for flock and lockd There have been two file locks used to protect lvm "global state": "ORPHANS" and "GLOBAL". Commands that used the ORPHAN flock in exclusive mode: pvcreate, pvremove, vgcreate, vgextend, vgremove, vgcfgrestore Commands that used the ORPHAN flock in shared mode: vgimportclone, pvs, pvscan, pvresize, pvmove, pvdisplay, pvchange, fullreport Commands that used the GLOBAL flock in exclusive mode: pvchange, pvscan, vgimportclone, vgscan Commands that used the GLOBAL flock in shared mode: pvscan --cache, pvs The ORPHAN lock covers the important cases of serializing the use of orphan PVs. It also partially covers the reporting of orphan PVs (although not correctly as explained below.) The GLOBAL lock doesn't seem to have a clear purpose (it may have eroded over time.) Neither lock correctly protects the VG namespace, or orphan PV properties. To simplify and correct these issues, the two separate flocks are combined into the one GLOBAL flock, and this flock is used from the locking sites that are in place for the lvmlockd global lock. The logic behind the lvmlockd (distributed) global lock is that any command that changes "global state" needs to take the global lock in ex mode. Global state in lvm is: the list of VG names, the set of orphan PVs, and any properties of orphan PVs. Reading this global state can use the global lock in sh mode to ensure it doesn't change while being reported. The locking of global state now looks like: lockd_global() previously named lockd_gl(), acquires the distributed global lock through lvmlockd. This is unchanged. It serializes distributed lvm commands that are changing global state. This is a no-op when lvmlockd is not in use. lockf_global() acquires an flock on a local file. It serializes local lvm commands that are changing global state. 
lock_global() first calls lockf_global() to acquire the local flock for global state, and if this succeeds, it calls lockd_global() to acquire the distributed lock for global state. Replace instances of lockd_gl() with lock_global(), so that the existing sites for lvmlockd global state locking are now also used for local file locking of global state. Remove the previous file locking calls lock_vol(GLOBAL) and lock_vol(ORPHAN). The following commands which change global state are now serialized with the exclusive global flock: pvchange (of orphan), pvresize (of orphan), pvcreate, pvremove, vgcreate, vgextend, vgremove, vgreduce, vgrename, vgcfgrestore, vgimportclone, vgmerge, vgsplit Commands that use a shared flock to read global state (and will be serialized against the prior list) are those that use process_each functions that are based on processing a list of all VG names, or all PVs. The list of all VGs or all PVs is global state and the shared lock prevents those lists from changing while the command is processing them. The ORPHAN lock previously attempted to produce an accurate listing of orphan PVs, but it was only acquired at the end of the command during the fake vg_read of the fake orphan vg. This is not when orphan PVs were determined; they were determined by elimination beforehand by processing all real VGs, and subtracting the PVs in the real VGs from the list of all PVs that had been identified during the initial scan. This is fixed by holding the single global lock in shared mode while processing all VGs to determine the list of orphan PVs.	2019-04-29 13:01:05 -05:00
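The ordering described for lock_global() (local flock first, then the distributed lock) can be sketched as follows. This is a minimal illustration under stated assumptions, not the lvm2 implementation: the lock file path is supplied by the caller, and the `lockd` callable is a hypothetical stand-in for the lvmlockd request, with `None` meaning lvmlockd is not in use.

```python
import fcntl
import os


def lockf_global(path, exclusive):
    """Take a local flock on `path`; returns the open fd, or None on failure."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
    except OSError:
        os.close(fd)
        return None
    return fd


def lock_global(path, exclusive, lockd=None):
    """Local flock first, then the distributed lock if one is configured.

    `lockd` is a hypothetical callable taking `exclusive` and returning
    True on success. When it is None the distributed step is a no-op.
    """
    fd = lockf_global(path, exclusive)
    if fd is None:
        return None
    if lockd is not None and not lockd(exclusive):
        os.close(fd)  # closing the fd drops the local flock
        return None
    return fd
```

Commands that change global state would pass `exclusive=True`; reporting commands that only walk the list of VG names or PVs would pass `exclusive=False`.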
David Teigland	4e20ebd6a1	pvscan: ignore online for shared and foreign PVs Activation would not be allowed anyway, but we can check for these cases early and avoid wasted time in pvscan managing online files and attempting activation.	2019-03-05 15:19:05 -06:00
Zdenek Kabelac	9dfb1a11b7	cov: drop unneeded header file MAX macro no longer needed in pe_align.	2018-12-21 21:45:08 +01:00
David Teigland	904e1e3d26	Place the first PE at 1 MiB for all defaults . When using default settings, this commit should change nothing. The first PE continues to be placed at 1 MiB resulting in a metadata area size of 1020 KiB (for 4K page sizes; slightly smaller for larger page sizes.) . When default_data_alignment is disabled in lvm.conf, align pe_start at 1 MiB, based on a default metadata area size that adapts to the page size. Previously, disabling this option would result in mda_size that was too small for common use, and produced a 64 KiB aligned pe_start. . Customized pe_start and mda_size values continue to be set as before in lvm.conf and command line. . Remove the configure option for setting default_data_alignment at build time. . Improve alignment related option descriptions. . Add section about alignment to pvcreate man page. Previously, DEFAULT_PVMETADATASIZE was 255 sectors. However, the fact that the config setting named "default_data_alignment" has a default value of 1 (MiB) meant that DEFAULT_PVMETADATASIZE was having no effect. The metadata area size is the space between the start of the metadata area (page size offset from the start of the device) and the first PE (1 MiB by default due to default_data_alignment 1.) The result is a 1020 KiB metadata area on machines with 4KiB page size (1024 KiB - 4 KiB), and smaller on machines with larger page size. If default_data_alignment was set to 0 (disabled), then DEFAULT_PVMETADATASIZE 255 would take effect, and produce a metadata area that was 188 KiB and pe_start of 192 KiB. This was too small for common use. This is fixed by making the default metadata area size a computed value that matches the value produced by default_data_alignment.	2018-11-26 16:36:50 -06:00
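The sizes quoted in this commit message can be checked with a little arithmetic. The sketch below is not lvm2 code; the function names are hypothetical, and the old disabled-alignment case assumes that pe_start was obtained by rounding the end of the metadata area up to a 64 KiB boundary, which is consistent with the figures in the message.

```python
KIB = 1024
SECTOR = 512


def round_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def default_layout(page_size, data_alignment=1024 * KIB):
    """pe_start sits at the data alignment; the mda fills the gap after one page."""
    pe_start = data_alignment
    return pe_start, pe_start - page_size


def old_disabled_layout(page_size, mda_sectors=255, align=64 * KIB):
    """Old behavior with default_data_alignment=0 (assumed 64 KiB rounding)."""
    pe_start = round_up(page_size + mda_sectors * SECTOR, align)
    return pe_start, pe_start - page_size


pe_start, mda_size = default_layout(4 * KIB)
print(pe_start // KIB, mda_size // KIB)   # 1024 1020

pe_start, mda_size = old_disabled_layout(4 * KIB)
print(pe_start // KIB, mda_size // KIB)   # 192 188
```

With a 64 KiB page size the default layout gives a 960 KiB metadata area, which matches the remark that the area is smaller on machines with larger pages.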
David Teigland	3ae5569570	Add dm-writecache support dm-writecache is used like dm-cache with a standard LV as the cache. $ lvcreate -n main -L 128M -an foo /dev/loop0 $ lvcreate -n fast -L 32M -an foo /dev/pmem0 $ lvconvert --type writecache --cachepool fast foo/main $ lvs -a foo -o+devices LV VG Attr LSize Origin Devices [fast] foo -wi------- 32.00m /dev/pmem0(0) main foo Cwi------- 128.00m [main_wcorig] main_wcorig(0) [main_wcorig] foo -wi------- 128.00m /dev/loop0(0) $ lvchange -ay foo/main $ dmsetup table foo-main_wcorig: 0 262144 linear 7:0 2048 foo-main: 0 262144 writecache p 253:4 253:3 4096 0 foo-fast: 0 65536 linear 259:0 2048 $ lvchange -an foo/main $ lvconvert --splitcache foo/main $ lvs -a foo -o+devices LV VG Attr LSize Devices fast foo -wi------- 32.00m /dev/pmem0(0) main foo -wi------- 128.00m /dev/loop0(0)	2018-11-06 14:18:41 -06:00
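The lengths in the dmsetup table output above are counted in 512-byte sectors, so they can be cross-checked against the LV sizes given to lvcreate. The helper names below are hypothetical, and the parser only splits the generic leading fields (start, length, target) without interpreting target-specific arguments.

```python
SECTOR_BYTES = 512


def mib_to_sectors(mib):
    """Convert a size in MiB to a count of 512-byte sectors."""
    return mib * 1024 * 1024 // SECTOR_BYTES


def parse_table_line(line):
    """Split one 'name: start length target args...' line from dmsetup table."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    return {
        "name": name,
        "start": int(fields[0]),
        "length": int(fields[1]),
        "target": fields[2],
        "args": fields[3:],
    }


main = parse_table_line("foo-main: 0 262144 writecache p 253:4 253:3 4096 0")
fast = parse_table_line("foo-fast: 0 65536 linear 259:0 2048")
print(main["length"] == mib_to_sectors(128))  # True
print(fast["length"] == mib_to_sectors(32))   # True
```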
Zdenek Kabelac	70e3d0a613	cov: remove unused assigns	2018-11-05 17:25:11 +01:00
David Teigland	117160b27e	Remove lvmetad Native disk scanning is now both reduced and async/parallel, which makes it comparable in performance (and often faster) when compared to lvm using lvmetad. Autoactivation now uses local temp files to record online PVs, and no longer requires lvmetad. There should be no apparent command-level change in behavior.	2018-07-11 11:26:42 -05:00
David Teigland	428514a07f	Drop --ignoreskippedcluster option It's no longer needed. Clustered VGs are now handled in the same way as foreign VGs, and as shared VGs that can't be accessed: - A command processing all VGs sees a clustered VG, prints a message ("Skipping clustered VG foo."), skips it, and does not fail. - A command where the clustered VG is explicitly named on the command line, prints a message and fails. "Cannot access clustered VG foo, see lvmlockd(8)." The option is listed in the set of ignored options for the commands that previously accepted it. (Removing it entirely would cause commands/scripts to fail if they set it.)	2018-06-15 15:59:34 -05:00
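The two cases in this commit message (skip when processing all VGs, fail when the VG is named explicitly) can be sketched as below. The function and its data model are hypothetical; only the two message strings are taken from the commit text.

```python
def process_vgs(all_vgs, named=None):
    """Process VGs; `all_vgs` maps VG name -> True if the VG is clustered.

    With `named` empty, every VG is visited and clustered ones are skipped
    without failing. With `named` given, a clustered VG is an error.
    Returns (success, messages).
    """
    messages = []
    success = True
    targets = list(named) if named else sorted(all_vgs)
    for name in targets:
        if all_vgs[name]:
            if named:
                messages.append(
                    "Cannot access clustered VG %s, see lvmlockd(8)." % name)
                success = False
            else:
                messages.append("Skipping clustered VG %s." % name)
            continue
        messages.append("processed %s" % name)
    return success, messages
```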
David Teigland	8eab37593e	Add cmd arg to more functions so that it can be used in the filter code	2018-06-15 11:03:55 -05:00
David Teigland	e53cfc6a88	lvmlockd: update method for changing clustered VG The previous method for forcibly changing a clustered VG to a local VG involved using -cn and locking_type 0. Since those options are deprecated, replace it with the same command used for other forced lock type changes: vgchange --locktype none --lockopt force.	2018-06-13 15:30:28 -05:00
David Teigland	17f5572bc9	Remove independent metadata areas in which metadata is stored in files on the local fs instead of on PVs.	2018-06-13 12:25:19 -05:00
David Teigland	981a3ba98e	Clean up repair and result values in vg_read Fix the confusing mix of input and output values in the single variable.	2018-06-12 11:08:26 -05:00
David Teigland	9a8c36b891	Fix use of orphan lock in commands vgreduce, vgremove and vgcfgrestore were acquiring the orphan lock in the midst of command processing instead of at the start of the command. (The orphan lock moved to being acquired at the start of the command back when pvcreate/vgcreate/vgextend were reworked based on pvcreate_each_device.) vgsplit also needed a small update to avoid reacquiring a VG lock that it already held (for the new VG name).	2018-06-12 09:46:11 -05:00
David Teigland	c4153a8dfc	Remove checking for locked VGs A few places were calling a function to check if a VG lock was held. The only place it was actually needed is for pvcreate which wants to do its own locking (and scanning) around process_each_pv. The locking/scanning exceptions for pvcreate in process_each_pv/vg_read can be enabled by just passing a couple of flags instead of checking if the VG is already locked. This also means that these special cases won't be enabled unknowingly in other places where they shouldn't be used.	2018-06-12 09:46:04 -05:00
David Teigland	3b6b7f8f9b	lvmlockd: skip repair lock upgrade for non shared vgs Only attempt lvmlockd lock upgrade for shared VGs.	2018-06-12 09:44:05 -05:00
David Teigland	a8759dc7a6	Remove unused cache management from locking This code was for managing lvmcache for clvm and it no longer does anything.	2018-06-08 12:30:43 -05:00
David Teigland	73b7e6fde7	Remove more code that was only used by liblvm2app	2018-06-08 09:29:11 -05:00
David Teigland	e4d9099e19	Remove more clvm code	2018-06-07 16:17:04 +01:00
David Teigland	3e781ea446	Remove clvmd and associated code More code reduction and simplification can follow.	2018-06-05 11:09:13 -05:00
David Teigland	09177b53dd	lvmlockd: clarify lock_type use for coverity Make it clearer when vg->lock_type will be used so coverity doesn't worry about it.	2018-06-01 13:15:22 -05:00
David Teigland	b6f0f20da2	lvmlockd: primarily use vg_is_shared to check if a vg uses an lvmlockd lock_type, instead of the equivalent but longer is_lockd_type.	2018-06-01 13:15:22 -05:00
Joe Thornber	dbba1e9b93	Merge branch 'master' into 2018-05-11-fork-libdm	2018-06-01 13:04:12 +01:00
David Teigland	fdaa7e2e87	vgs: add report field for shared equivalent to a non-empty -o locktype.	2018-05-31 10:23:03 -05:00
David Teigland	6cd0523337	lvmlockd: enable repairing shared VG while reading it When the lvmlockd lock is shared, upgrade it to ex when repair (writing) is needed during vg_read. Pass the lockd state through additional read-related functions so the instances of repair scattered through vg_read can be handled. (Temporary solution until the ad hoc repairs can be pulled out of vg_read into a top level, centralized repair function.)	2018-05-30 12:56:46 -05:00
David Teigland	0253f5a21d	fix id_write_format on non-uuid string orphan vgs use the vgname "#orphans" as the vgid, and valgrind complains about calling id_write_format on that invalid uuid.	2018-05-18 13:41:20 -05:00
David Teigland	286c9c78b4	liblvm2app: fix valgrind memory warning	2018-05-17 15:18:11 -05:00
Rick Elrod	8c453e2e5e	cleanup: fix grammar in output - less then -> less than This minor patch fixes grammar in a few messages which get printed to users. It also fixes the same grammar mistake in several comments. Signed-off-by: Rick Elrod <relrod@redhat.com> --	2018-05-17 10:37:45 +02:00
David Teigland	28d35e5c59	scan: fix missing close in lib lib was using dev_test_excl which wasn't closing the device. Switch code to new io layer with excl open. Also use exclusive open in some other places.	2018-05-16 14:48:30 -05:00
Joe Thornber	89fdc0b588	Merge branch 'master' into 2018-05-11-fork-libdm	2018-05-16 13:43:02 +01:00
Joe Thornber	7f97c7ea9a	build: Don't generate symlinks in include/ dir As we start refactoring the code to break dependencies (see doc/refactoring.txt), I want us to use full paths in the includes (eg, #include "base/data-struct/list.h"). This makes it more obvious when we're breaking abstraction boundaries, eg, including a file in metadata/ from base/	2018-05-14 10:30:20 +01:00
David Teigland	5c9dcd99fd	scan: remove unused args from label_read	2018-05-11 14:16:49 -05:00
David Teigland	bbb8040456	dev_cache: drop open_list devices are now held open only in bcache, so drop the dev_cache list of open devices which is unused.	2018-05-11 12:47:56 -05:00
David Teigland	57bb46c5e7	filter: use bcache for filter reads Filters are still applied before any device reading or the label scan, but any filter checks that want to read the device are skipped and the device is flagged. After bcache is populated, but before lvm looks for devices (i.e. before label scan), the filters are reapplied to the devices that were flagged above. The filters will then find the data they need in bcache.	2018-05-10 16:03:19 -05:00
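The deferred filtering described here can be modeled as two passes over the device list. This is a sketch under assumptions, not the lvm2 filter code: filters are plain functions returning True (keep), False (reject) or None (needs device data that is not cached yet), a dict stands in for bcache, and the `MDRAID` signature is an invented placeholder.

```python
def apply_filters(devices, filters, bcache=None):
    """Return (passed, deferred). Deferred devices need data not yet in bcache."""
    passed, deferred = [], []
    for dev in devices:
        results = [f(dev, bcache) for f in filters]
        if any(r is False for r in results):
            continue  # rejected outright
        if any(r is None for r in results):
            deferred.append(dev)  # flagged for a later recheck
        else:
            passed.append(dev)
    return passed, deferred


def name_filter(dev, bcache):
    """Needs no device data: reject by name only."""
    return not dev.startswith("/dev/ram")


def signature_filter(dev, bcache):
    """Needs device data: defer until the data is cached."""
    if bcache is None or dev not in bcache:
        return None
    return not bcache[dev].startswith(b"MDRAID")


def scan(devices, read_device):
    filters = [name_filter, signature_filter]
    passed, deferred = apply_filters(devices, filters)
    # Populate the cache, then reapply the filters to the flagged devices.
    bcache = {dev: read_device(dev) for dev in passed + deferred}
    rechecked, _ = apply_filters(deferred, filters, bcache)
    return passed + rechecked
```

The point of the second pass is that the filters find what they need in the cache instead of issuing their own reads.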
David Teigland	c016b573ee	clvmd: separate saved_vg from vginfo The clvmd saved_vg data is independent from the normal lvm lvmcache vginfo data, so separate saved_vg from vginfo. Normal lvm doesn't need to use save_vg at all, and in clvmd, lvmcache changes on vginfo can be made without worrying about unwanted effects on saved_vg.	2018-05-03 14:54:48 -05:00
David Teigland	c1cd18f21e	Remove lvm1 and pool disk formats There are likely more bits of code that can be removed, e.g. lvm1/pool-specific bits of code that were identified using FMT flags. The vgconvert command can likely be reduced further. The lvm1-specific config settings should probably have some other fields set for proper deprecation.	2018-04-30 16:55:02 -05:00
David Teigland	029a76b4f8	clvmd: don't repair vg from vg_read in clvmd The mixed up vg repair code in vg_read was trying to repair a vg when vg_read was called by clvmd. The clvmd daemon isn't supposed to be repairing or writing a vg. (This is a temporary workaround; vg repair will soon be pulled out of vg_read so it can be called in a controlled way and consolidated instead of spread around.)	2018-04-30 15:56:51 -05:00
David Teigland	5b6e62dc1f	clvmd: drop old saved_vg when returning new saved_vg In some pvmove tests, clvmd uses the new (precommitted) saved_vg, but then requests the old saved_vg, and expects that the new saved_vg be returned instead of the old. So, when returning the new saved_vg, forget the old one so we don't return it again.	2018-04-26 14:57:45 -05:00
David Teigland	47bfac21ca	clvmd: skip dev rescan after full scan When clvmd does a full label scan just prior to calling _vg_read(), pass a new flag into _vg_read to indicate that the normal rescan of VG devs is not needed.	2018-04-25 16:39:43 -05:00
David Teigland	1fec86571f	clvmd: reuse a vg struct for sequential LV operations After reading a VG, stash it in lvmcache as "saved_vg". Before reading the VG again, try to use the saved_vg. The saved_vg is dropped on VG lock operations.	2018-04-25 16:39:43 -05:00
David Teigland	1409c4a1c2	clvm: rescan when VG or PV not found Rescan devices to update lvmcache content when clvmd vg_read doesn't find a VG or PV.	2018-04-20 16:09:49 -05:00
David Teigland	aee27dc7ba	scan: skip device rescan in vg_read For reporting commands (pvs,vgs,lvs,pvdisplay,vgdisplay,lvdisplay) we do not need to repeat the label scan of devices in vg_read if they all had matching metadata in the initial label scan. The data read by label scan can just be reused for the vg_read. This cuts the amount of device i/o in half, from two reads of each device to one. We have to be careful to avoid repairing the VG if we've skipped rescanning. (The VG repair code is very poor, and will be redone soon.)	2018-04-20 11:23:14 -05:00
David Teigland	9b6a62f944	lvmcache: simplify Recent changes allow some major simplification of the way lvmcache works and is used. lvmcache_label_scan is now called in a controlled fashion at the start of commands, and not via various unpredictable side effects. Remove various calls to it from other places. lvmcache_label_scan should not be called from anywhere during a command, because it produces an incorrect representation of PVs with no MDAs, and misclassifies them as orphans. This has been a long standing problem. The invalid flag and rescanning based on that is no longer used and removed. The 'force' variation is no longer needed and removed.	2018-04-20 11:22:48 -05:00
David Teigland	a9b0aa5c17	lvmetad: more fixes related to bcache Need to open devs prior to bcache io.	2018-04-20 11:22:48 -05:00
David Teigland	ddb5de7a98	clvm: fix bcache scan handling We can't let clvmd keep all scanned devs open, which prevents them from being removed. So drop the bcache data (and close fds) after doing a label scan. Also set up bcache before the clvm-specific vg_read (which needs to rescan the vg's devs using bcache) and destroy the bcache after.	2018-04-20 11:22:48 -05:00
David Teigland	e49b114f7e	bcache: use wrappers for bcache read write in lvm Using a wrapper makes it easier to disable bcache if needed.	2018-04-20 11:22:47 -05:00
David Teigland	8065492046	bcache: do all writes through bcache	2018-04-20 11:22:47 -05:00
David Teigland	37471bb477	scan: skip extra scan in vg_read Drop an extra label scan in the recovery part of vg_read. This is a temporary improvement until the pending replacement for the broken recovery code buried in vg_read.	2018-04-20 11:22:46 -05:00
David Teigland	d9a77e8bb4	lvmcache: simplify metadata cache The copy of VG metadata stored in lvmcache was not being used in general. It pretended to be a generic VG metadata cache, but was not being used except for clvmd activation. There it was used to avoid reading from disk while devices were suspended, i.e. in resume. This removes the code that attempted to make this look like a generic metadata cache, and replaces it with something narrowly targeted to what it's actually used for. This is a way of passing the VG from suspend to resume in clvmd. Since in the case of clvmd one caller can't simply pass the same VG to both suspend and resume, suspend needs to stash the VG somewhere that resume can grab it from. (resume doesn't want to read it from disk since devices are suspended.) The lvmcache vginfo struct is used as a convenient place to stash the VG to pass it from suspend to resume, even though it isn't related to the lvmcache or vginfo. These suspended_vg* vginfo fields should not be used or touched anywhere else, they are only to be used for passing the VG data from suspend to resume in clvmd. The VG data being passed between suspend and resume is never modified, and will only exist in the brief period between suspend and resume in clvmd. suspend has both old (current) and new (precommitted) copies of the VG metadata. It stashes both of these in the vginfo prior to suspending devices. When vg_commit is successful, it sets a flag in vginfo as before, signaling the transition from old to new metadata. resume grabs the VG stashed by suspend. If the vg_commit happened, it grabs the new VG, and if the vg_commit didn't happen it grabs the old VG. The VG is then used to resume LVs. This isolates clvmd-specific code and usage from the normal lvm vg_read code, making the code simpler and the behavior easier to verify. 
Sequence of operations: - lv_suspend() has both vg_old and vg_new and stashes a copy of each onto the vginfo: lvmcache_save_suspended_vg(vg_old); lvmcache_save_suspended_vg(vg_new); - vg_commit() happens, which causes all clvmd instances to call lvmcache_commit_metadata(vg). A flag is set in the vginfo indicating the transition from the old to new VG: vginfo->suspended_vg_committed = 1; - lv_resume() needs either vg_old or vg_new to use in resuming LVs. It doesn't want to read the VG from disk since devices are suspended, so it gets the VG stashed by lv_suspend: vg = lvmcache_get_suspended_vg(vgid); If the vg_commit did not happen, suspended_vg_committed will not be set, and in this case, lvmcache_get_suspended_vg() will return the old VG instead of the new VG, and it will resume LVs based on the old metadata.	2018-04-20 11:22:45 -05:00
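The suspend/commit/resume hand-off in the sequence above can be modeled with a small state holder. This is a sketch only: the Python class and function names are hypothetical, and apart from `suspended_vg_committed`, which the commit message names, the field names are assumptions.

```python
class VgInfo:
    """Hypothetical stand-in for the lvmcache vginfo stash fields."""

    def __init__(self):
        self.suspended_vg_old = None   # assumed field name
        self.suspended_vg_new = None   # assumed field name
        self.suspended_vg_committed = False


def save_suspended_vg(info, vg_old, vg_new):
    """Called from suspend: stash both copies before devices are suspended."""
    info.suspended_vg_old = vg_old
    info.suspended_vg_new = vg_new
    info.suspended_vg_committed = False


def commit_metadata(info):
    """Called when vg_commit succeeds: mark the switch from old to new."""
    info.suspended_vg_committed = True


def get_suspended_vg(info):
    """Called from resume: new VG if the commit happened, otherwise the old one."""
    if info.suspended_vg_committed:
        return info.suspended_vg_new
    return info.suspended_vg_old
```

Resume never reads from disk in this model; it only chooses between the two stashed copies based on the flag.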
David Teigland	79c4971210	label_scan: remove extra label scan and read for orphan PVs When process_each_pv() calls vg_read() on the orphan VG, the internal implementation was doing an unnecessary lvmcache_label_scan() and two unnecessary label_read() calls on each orphan. Some of those unnecessary label scans/reads would sometimes be skipped due to caching, but the code was always doing at least one unnecessary read on each orphan. The common format_text case was also unnecessarily calling into the format-specific pv_read() function which actually did nothing. By analyzing each case in which vg_read() was being called on the orphan VG, we can say that all of the label scans/reads in vg_read_orphans are unnecessary: 1. reporting commands: the information saved in lvmcache by the original label scan can be reported. There is no advantage to repeating the label scan on the orphans a second time before reporting it. 2. pvcreate/vgcreate/vgextend: these all share a common implementation in pvcreate_each_device(). That function already rescans labels after acquiring the orphan VG lock, which ensures that the command is using valid lvmcache information.	2018-04-20 11:22:45 -05:00
David Teigland	748f29b42a	scan: do scanning at the start of a command Move the location of scans to make it clearer and avoid unnecessary repeated scanning. There should be one scan at the start of a command which is then used through the rest of command processing. Previously, the initial label scan was called as a side effect from various utility functions. This would lead to it being called unnecessarily. It is an expensive operation, and should only be called when necessary. Also, this is a primary step in the function of the command, and as such it should be called prominently at the top level of command processing, not as a hidden side effect of a utility function. lvm knows exactly where and when the label scan needs to be done. Because of this, move the label scan calls from the internal functions to the top level of processing. Other specific instances of lvmcache_label_scan() are still called unnecessarily or unclearly by specific commands that do not use the common process_each functions. These will be improved in future commits. During the processing phase, rescanning labels for devices in a VG needs to be done after the VG lock is acquired in case things have changed since the initial label scan. This was being done by way of rescanning devices that had the INVALID flag set in lvmcache. This usually approximated the right set of devices, but it was not exact, and obfuscated the real requirement. Correct this by using a new function that rescans the devices in the VG: lvmcache_label_rescan_vg(). Apart from being inexact, the rescanning was extremely well hidden. _vg_read() would call ->create_instance(), _text_create_text_instance(), _create_vg_text_instance() which would call lvmcache_label_scan() which would call _scan_invalid() which repeats the label scan on devices flagged INVALID. lvmcache_label_rescan_vg() is now called prominently by _vg_read() directly.	2018-04-20 11:21:38 -05:00
David Teigland	a7cb76ae94	scan: use bcache for label scan and vg read New label_scan function populates bcache for each device on the system. The two read paths are updated to get data from bcache. The bcache is not yet used for writing. bcache blocks for a device are invalidated when the device is written.	2018-04-20 11:19:24 -05:00
Joe Thornber	00f1b208a1	[io paths] Unpick agk's aio stuff	2018-04-20 11:03:58 -05:00
Zdenek Kabelac	285413b502	cleanup: missing dots and indent	2018-03-15 11:01:04 +01:00
Zdenek Kabelac	d794444715	activation: check for prioritized_section Detect whether we are in a prioritized section instead of a critical one, since these operations were supposed to NOT be happening during the whole set of operations. This patch fixes verification of udev operations.	2018-03-15 11:01:04 +01:00
Alasdair G Kergon	9194610f42	device: Add ioflags parameter to transfer additional state. Flags are set on the initial I/O and passed to any callbacks that may in turn issue further I/O using the inherited flags.	2018-01-21 21:10:23 +00:00
Alasdair G Kergon	b96862ee11	metadata: Consistently skip metadata areas that failed. Even after writing some metadata encountered problems, some commands continue (rightly or wrongly) and attempt to make further changes. Once an mda is marked MDA_FAILED, don't try to use it again. This also applies when reverting, where one loop already skips failed mdas but the other doesn't. This fixes some device open_count warnings on relevant failure paths.	2017-12-12 17:52:45 +00:00
Alasdair G Kergon	e4805e4883	device: categorise block i/o Introduce enum dev_io_reason to categorise block device I/O in debug messages so it's obvious what it is for. DEV_IO_SIGNATURES /* Scanning device signatures */ DEV_IO_LABEL /* LVM PV disk label */ DEV_IO_MDA_HEADER /* Text format metadata area header */ DEV_IO_MDA_CONTENT /* Text format metadata area content */ DEV_IO_FMT1 /* Original LVM1 metadata format */ DEV_IO_POOL /* Pool metadata format */ DEV_IO_LV /* Content written to an LV */ DEV_IO_LOG /* Logging messages */	2017-12-04 23:45:26 +00:00
Alasdair G Kergon	b5f62a143d	metadata: Eliminate redundant nested VG metadata Only lv_committed() now uses vg->vg_committed and it appears redundant if its contents match the enclosing VG so don't waste cycles creating it when that's known to be true when no write lock is held so the struct won't get modified.	2017-11-14 15:38:55 +00:00
Alasdair G Kergon	00acae12a4	metadata: Remove unused vg.cft_precommitted The precommitted metadata config_tree is now only referenced from a single function so just use a local variable instead.	2017-11-14 01:22:09 +00:00
Alasdair G Kergon	6bf0f04ae2	log: Improve various device-related messages - Use 'lvmcache' consistently instead of 'metadata cache' - Always use 5 characters for source line number - Remember to convert uuids into printable form - Use <no name> rather than (null) when VG has no name.	2017-11-13 19:45:33 +00:00
Zdenek Kabelac	3076a839a5	cleanup: drop unneeded headerfiles Coverity reported these are no longer in use.	2017-11-07 21:26:11 +01:00
Alasdair G Kergon	84aca4201e	vgsplit: Fix detection of moved PVs. vgsplit shares the vg_rename code so that must only set the PV_MOVED_VG flag introduced in commit `486ed10848` ("vgmerge: Fix intermediate metadata corruption") on PVs that moved.	2017-10-27 22:53:43 +01:00
Alasdair G Kergon	f3ae99dcc0	liblvm: Move lib code used exclusively into metadata-liblvm.c Also remove some redundant function definitions from metadata.h.	2017-10-18 19:29:32 +01:00
Alasdair G Kergon	f1cc5b12fd	tidy: Add missing underscores to statics.	2017-10-18 15:58:13 +01:00
Alasdair G Kergon	146745ad88	device: Separate errors for dev not found and filtered. Replaced the confusing device error message "not found (or ignored by filtering)" by either "not found" or "excluded by a filter". (Later we should be able to say which filter.) Left the liblvm code paths alone.	2017-10-17 02:12:41 +01:00
David Teigland	6ac1e04b3a	replicator: remove the code It has not been used in a long time and is not expected to be used further.	2017-10-13 16:20:42 -05:00
Alasdair G Kergon	486ed10848	vgmerge: Fix intermediate metadata corruption vgmerge suffers from a similar problem to the one fixed in commit `8146548d25` ("vgsplit: Fix intermediate metadata corruption.") When merging, splitting or renaming VGs, use a new PV status flag PV_MOVED_VG to mark the PVs that hold metadata with the old VG name and use this to provide PV-level granularity instead of incorrectly assuming all PVs in the VG are the same.	2017-10-06 02:20:45 +01:00
Zdenek Kabelac	48ce8c7a49	tidy: drop unneeded cast Avoid casting to the same type.	2017-07-20 11:20:44 +02:00
Zdenek Kabelac	4a2994b7b1	tidy: name all parameters	2017-07-20 11:20:26 +02:00
Zdenek Kabelac	0bf836aa14	tidy: prefer not using else after return clang-tidy: avoid using 'else' after return - gives more readable code, and also saves an indentation level.	2017-07-20 11:18:29 +02:00
Zdenek Kabelac	c440bb0742	debug: check for fail in id validation	2017-06-27 00:27:36 +02:00
Zdenek Kabelac	1bd4b0059b	cleanup: use display_percent Replace occurrence of %.2f with call of display_percent function.	2017-06-24 17:44:42 +02:00
David Teigland	c98a25aab1	print warning about in-use orphans Warn about a PV that has the in-use flag set, but appears in the orphan VG (no VG was found referencing it.) There are a number of conditions that could lead to this: . The PV was created with no mdas and is used in a VG with other PVs (with metadata) that have not yet appeared on the system. So, no VG metadata is found by lvm which references the in-use PV with no mdas. . vgremove could have failed after clearing mdas but before clearing the in-use flag. In this case, the in-use flag needs to be manually cleared on the PV. . The PV may have damaged/unrecognized VG metadata that lvm could not read. . The PV may have no mdas, and the PVs with the metadata may have damaged/unrecognized metadata.	2017-06-01 11:18:42 -05:00
David Teigland	f3c90e90f8	disable repairing in-use flag on orphan PVs A PV holding VG metadata that lvm can't understand (e.g. damaged, checksum error, unrecognized flag) will appear as an in-use orphan, and will be cleared by this repair code. Disable this repair until the code can keep track of these problematic PVs, and distinguish them from actual in-use orphans.	2017-06-01 09:53:14 -05:00
David Teigland	7a0f46e2f8	add comment about PV in-use repair copied from commit message for `d97f1c89de`	2017-05-23 16:59:46 -05:00
Alasdair G Kergon	fbe7464df5	metadata: Unlock VG on more _vg_make_handle error paths Internal error: VG lock vg0 must be requested before vg3, not after. Internal error: 3 device(s) were left open and have been closed.	2017-05-23 01:38:02 +01:00
Alasdair G Kergon	80900dcf76	metadata: Fix metadata repair when devs still missing. _check_reappeared_pv() incorrectly clears the MISSING_PV flags of PVs with unknown devices. While one caller avoids passing such PVs into the function, the other doesn't. Move the check inside the function so it's not forgotten. Without this patch, if the normal VG reading code tries to repair inconsistent metadata while there is an unknown PV, it incorrectly considers the missing PVs no longer to be missing and produces incorrect 'pvs' output omitting the missing PV, for example. Easy reproducer: Create a VG with 3 PVs pv1, pv2, pv3. Hide pv2. Run vgreduce --removemissing. Reinstate the hidden PV pv2 and at the same time hide a different PV pv3. Run 'pvs' - incorrect output. Run 'pvs' again - correct output. See https://bugzilla.redhat.com/1434054	2017-05-11 02:17:34 +01:00
David Teigland	d45531712d	vg_read: check for NULL dev to avoid segfault There are certain situations (not fully understood) where is_missing_pv() is false, but pv->dev is NULL, so this adds a check for NULL pv->dev after is_missing_pv() to avoid a segfault.	2017-05-10 10:45:41 -05:00
David Teigland	19267fa6aa	lvmlockd: test mode doesn't work The --test option is not yet compatible with shared VGs because changes are made in lvmlockd that cannot be reversed or faked.	2017-02-13 08:20:10 -06:00
Zdenek Kabelac	a3579aafc5	cleanup: use matching signed number comparison	2017-02-13 10:06:19 +01:00
Zdenek Kabelac	b6301aa977	cleanup: use fall through gcc gets 'selective' on having commented fall through case.	2017-02-13 10:06:18 +01:00
Zdenek Kabelac	377288fe03	cleanup: reuse existing code	2017-01-03 14:55:16 +01:00
Zdenek Kabelac	c5aeb21015	cleanup: zero baton in struct initializer	2016-12-09 15:15:02 +01:00
Zdenek Kabelac	1a4f13eb6e	cleanup: add some dots and use display_lvname Just some more VG/LV printing.	2016-11-25 15:01:27 +01:00
Zdenek Kabelac	1d58074d9f	debug: more stacktrace corrections Continue previous patch dropping some unneeded stack traces after printed log_error/warn messages.	2016-11-25 14:58:28 +01:00
Peter Rajnoha	070c0d31ab	metadata: fix automatic updates of PV extension headers to newest version Before, the automatic update from older to newer version of PV extension header happened within vg_write call. This may have caused problems under some circumstances where there's code in between vg_write and vg_commit which may have failed. In such situation, we reverted precommitted metadata and put back the state to working version of VG metadata. However, we don't have revert for PV write operation at the moment. So if we updated PV headers already and we reverted vg_write due to failure in subsequent code (before vg_commit), we ended up with lost VG metadata (because old metadata pointers got reset by the PV write operation). To minimize problematic situations here, we should put vg_write and vg_commit that is done after PV header rewrites as close to each other as possible. This patch moves the automatic PV header rewrite for new extension header part from vg_write to _vg_read where it's done the same way as we do any other VG repairs if detected during VG read operation (under VG write lock).	2016-07-26 16:22:55 +02:00
Zdenek Kabelac	4e1bf7acd3	coverity: add some tests for function results Even though they cannot normally happen...	2016-07-13 21:52:14 +02:00
David Teigland	ff3c4ed1c0	lvmetad: two phase vg_remove Apply the same idea as vg_update. Before doing the VG remove on disk, invalidate the VG in lvmetad. After the VG is removed, remove the VG in lvmetad. If the command fails after removing the VG on disk, but before removing the VG metadata from lvmetad, then a subsequent command will see the INVALID flag and not use the stale metadata from lvmetad.	2016-06-28 02:30:36 +01:00
David Teigland	a7c45ddc59	lvmetad: two phase vg_update Previously, a command sent lvmetad new VG metadata in vg_commit(). In vg_commit(), devices are suspended, so any memory allocation done by the command while sending to lvmetad, or by lvmetad while updating its cache could deadlock if memory reclaim was triggered. Now lvmetad is updated in unlock_vg(), after devices are resumed. The new method for updating VG metadata in lvmetad is in two phases: 1. In vg_write(), before devices are suspended, the command sends lvmetad a short message ("set_vg_info") telling it what the new VG seqno will be. lvmetad sees that the seqno is newer than the seqno of its cached VG, so it sets the INVALID flag for the cached VG. If sending the message to lvmetad fails, the command fails before the metadata is committed and the change is not made. If sending the message succeeds, vg_commit() is called. 2. In unlock_vg(), after devices are resumed, the command sends lvmetad the standard vg_update message with the new metadata. lvmetad sees that the seqno in the new metadata matches the seqno it saved from set_vg_info, and knows it has the latest copy, so it clears the INVALID flag for the cached VG. If a command fails between 1 and 2 (after committing the VG on disk, but before sending lvmetad the new metadata), the cached VG retains the INVALID flag in lvmetad. A subsequent command will read the cached VG from lvmetad, see the INVALID flag, ignore the cached copy, read the VG from disk instead, update the lvmetad copy with the latest copy from disk, (this clears the INVALID flag in lvmetad), and use the correct VG metadata for the command. (This INVALID mechanism already existed for use by lvmlockd.)	2016-06-28 02:30:31 +01:00
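The two-phase update and its INVALID-flag recovery path can be illustrated with a small model. This is a Python sketch of the logic as the commit message describes it, not lvmetad's implementation: the `CachedVG` class, the `pending_seqno` field, and the `read_from_disk` callback are hypothetical names, and the case where a commit fails after phase 1 is not modelled.

```python
class CachedVG:
    """Hypothetical model of one VG entry in the lvmetad cache."""

    def __init__(self, seqno, metadata):
        self.seqno = seqno
        self.metadata = metadata
        self.invalid = False
        self.pending_seqno = None  # seqno announced by set_vg_info


def set_vg_info(cached, new_seqno):
    """Phase 1 (in vg_write): announce the upcoming seqno."""
    if new_seqno > cached.seqno:
        cached.pending_seqno = new_seqno
        cached.invalid = True


def vg_update(cached, seqno, metadata):
    """Phase 2 (in unlock_vg): send the new metadata."""
    cached.seqno = seqno
    cached.metadata = metadata
    # Clear INVALID only when the metadata matches the announced seqno.
    if cached.pending_seqno is None or seqno == cached.pending_seqno:
        cached.invalid = False
        cached.pending_seqno = None


def read_vg(cached, read_from_disk):
    """A later command: ignore an INVALID cached copy and refresh it."""
    if cached.invalid:
        seqno, metadata = read_from_disk()
        vg_update(cached, seqno, metadata)
    return cached.metadata
```

If a command dies between the two phases, the entry stays flagged, and the next `read_vg` call falls back to disk and repairs the cache.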
David Teigland	cc3e7c7c31	lvmetad: remove unused code for other format types lvmetad is no longer used at all with the lvm1 format, so the text format is the only one that uses lvmetad.	2016-06-28 02:30:25 +01:00
David Teigland	ebd2758dab	vgimportclone: add native command This is cleaner and more efficient than the script. The args and usage are unchanged.	2016-06-22 13:13:10 -05:00
David Teigland	01156de6f7	lvmcache: add optional dev arg to lvmcache_info_from_pvid A number of places are working on a specific dev when they call lvmcache_info_from_pvid() to look up an info struct based on a pvid. In those cases, pass the dev being used to lvmcache_info_from_pvid(). When a dev is specified, lvmcache_info_from_pvid() will verify that the cached info it's using matches the dev being processed before returning the info. Calling code will not mistakenly get info for the wrong dev when duplicate devs exist. This confusion was happening when scanning labels when duplicate devs existed. label_read for the first dev would add an info struct to lvmcache for that dev/pvid. label_read for the second dev would see the pvid in lvmcache from first dev, and mistakenly conclude that the label_read from the second dev can be skipped because it's already been done. By verifying that the dev for the cached pvid matches the dev being read, this mismatch is avoided and the label is actually read from the second duplicate.	2016-06-07 15:15:47 -05:00
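The optional dev check described above amounts to a guarded lookup. This is a rough Python sketch, assuming the cache is a dict keyed by PVID and that a mismatch is reported by returning `None`; the commit message only says the cached info is verified against the dev, so the return-on-mismatch behaviour and the data layout here are assumptions.

```python
def info_from_pvid(cache, pvid, dev=None):
    """Look up cached info by PVID; if dev is given, require it to match."""
    info = cache.get(pvid)
    if info is None:
        return None
    if dev is not None and info["dev"] != dev:
        # Cached info belongs to a different (duplicate) device.
        return None
    return info
```

With two duplicate devices sharing one PVID, a lookup made while reading the second device no longer returns the first device's info, so its label read is not skipped.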
David Teigland	5dc2ed0c71	vgreduce: use process_each_vg	2016-05-25 16:41:59 -05:00
David Teigland	9b640c3684	pvscan: use process_each_vg for autoactivate This refactors the code for autoactivation. Previously, as each PV was found, it would be sent to lvmetad, and the VG would be autoactivated using a non-standard VG processing function (the "activation_handler") called via a function pointer from within the lvmetad notification path. Now, any scanning that the command needs to do (scanning only the named device args, or scanning all devices when there are no args), is done first, before any activation is attempted. During the scans, the VG names are saved. After scanning is complete, process_each_vg is used to do autoactivation of the saved VG names. This makes pvscan activation much more similar to activation done with vgchange or lvchange. The separate autoactivate phase also means that if lvmetad is disabled (either before or during the scan), the command can continue with the activation step by simply not using lvmetad and reverting to disk scanning to do the activation.	2016-05-23 11:57:32 -05:00
David Teigland	e2d823eced	metadata: move warning message about repairing VG Move the message to just before the repair is going to happen to avoid printing the message in cases where repair is skipped.	2016-05-06 09:00:00 -05:00
David Teigland	8b7a78c728	lvmcache: improve duplicate PV handling Wait to compare and choose alternate duplicate devices until after all devices are scanned. During scanning, the first duplicate dev is kept in lvmcache, and others are kept in a new list (_found_duplicate_devs). After all devices are scanned, compare all the duplicates available for a given PVID and decide which is best. If the dev used in lvmcache is changed, drop the old dev from lvmcache entirely and rescan the replacement dev. Previously the VG metadata from the old dev was kept in lvmcache and only the dev was replaced. A new config setting devices/allow_changes_with_duplicate_pvs can be set to 0 which disallows modifying a VG or activating LVs in it when the VG contains PVs with duplicate devices. Set to 1 is the old behavior which allowed the VG to be changed. The logic for which of two devs is preferred has changed. The primary goal is to choose a device that is currently in use if the other isn't, e.g. by an active LV. . prefer dev with fs mounted if the other doesn't, else . prefer dev that is dm if the other isn't, else . prefer dev in subsystem if the other isn't If neither device is preferred by these rules, then don't change devices in lvmcache, leaving the one that was found first. The previous logic for preferring a device was: . prefer dev in subsystem if the other isn't, else . prefer dev without holders if the other has holders, else . prefer dev that is dm if the other isn't	2016-05-06 09:00:00 -05:00
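The new preference order for duplicate devices reads as a chain of tie-breakers, sketched here in Python. The `Dev` class and its boolean attributes are hypothetical; only the order of the three rules and the "keep the first one found" fallback come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dev:
    name: str
    has_fs_mounted: bool = False
    is_dm: bool = False
    in_subsystem: bool = False


def prefer(current: Dev, other: Dev) -> Dev:
    """Return the device to keep in lvmcache for a duplicated PVID."""
    # Apply the rules in order; the first one that distinguishes wins.
    for attr in ("has_fs_mounted", "is_dm", "in_subsystem"):
        a, b = getattr(current, attr), getattr(other, attr)
        if a != b:
            return current if a else other
    # No rule applies: keep the device that was found first.
    return current
```

For example, a multipath device that is dm is preferred over its plain component, unless the component has a filesystem mounted and the multipath device does not.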
Zdenek Kabelac	ed9162cd88	cleanup: enhance warning message Add WARNING: for log_warn. Show device name which is marked missing.	2016-05-05 23:55:18 +02:00
David Teigland	3c53acb378	metadata: fix segfault when filters reject devices Checking for devices uses is_missing_pv() to check if there is a device for the PV. is_missing_pv() is based on the MISSING_PV flag, which does not always correspond to !pv->dev. When using lvmetad, a command like: pvs --config 'devices/filter=["a\|/dev/sdb\|", "r\|.*\|"]' will cause a number of PVs to have NULL pv->dev, but not the MISSING_PV flag. So, NULL pv->dev needs to also be checked.	2016-04-27 12:13:26 -05:00
Peter Rajnoha	379874a2d0	cleanup: do not mention segment in warning message if device not found for a PV when checking used/assumed devs for an LV [0] fedora/~ # pvs --config 'devices/filter=["a\|/dev/sda\|", "r\|.*\|"]' WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices. WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices. PV VG Fmt Attr PSize PFree /dev/sda lvm2 --- 128.00m 128.00m [unknown] fedora lvm2 a-m 19.49g 0 Probably not worth mentioning "segments" here, just state that devices for an LV can't be all found during the check - it's less mysterious for user then: [0] fedora/~ # pvs --config 'devices/filter=["a\|/dev/sda\|", "r\|.*\|"]' WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. WARNING: Couldn't find all devices for LV fedora/root while checking used and assumed devices. WARNING: Couldn't find all devices for LV fedora/swap while checking used and assumed devices. PV VG Fmt Attr PSize PFree /dev/sda lvm2 --- 128.00m 128.00m [unknown] fedora lvm2 a-m 19.49g 0	2016-04-25 11:44:24 +02:00
Peter Rajnoha	9d976c0002	metadata: log warning instead of error if device not found while checking used and assumed devs When checking assumed PVs against real devices used for LVs and if there's no device assigned for an assumed PV (e.g. due to filters), do log_warn instead of log_error and continue checking LV segments and associated assumed PVs further, just like we do log_warn elsewhere in this situation. This way user will see the warning for each LV which couldn't be checked completely against real PVs used. Before, we logged only the very first occurrence of missing device for an LV in a VG and we returned from the function doing this check for all the LVs in VG immediately which may be a bit misleading because it didn't tell user about all the other LVs and whether they could be checked or not. For example, we have this setup: [0] fedora/~ # pvs PV VG Fmt Attr PSize PFree /dev/sda lvm2 --- 128.00m 128.00m /dev/vda2 fedora lvm2 a-- 19.49g 0 [0] fedora/~ # lvs -o+devices LV VG Attr LSize Devices root fedora -wi-ao---- 19.00g /dev/vda2(0) swap fedora -wi-ao---- 500.00m /dev/vda2(4864) Before this patch (only the very first LV in a VG is logged to have a problem while checking used and assumed devices): [0] fedora/~ # pvs --config 'devices/filter=["a\|/dev/sda\|", "r\|.*\|"]' WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. Couldn't find device for segment belonging to fedora/root while checking used and assumed devices. PV VG Fmt Attr PSize PFree /dev/sda lvm2 --- 128.00m 128.00m [unknown] fedora lvm2 a-m 19.49g 0 With this patch applied (all LVs where we hit problem while checking used and assumed devices are logged and it's warning, not error): [0] fedora/~ # pvs --config 'devices/filter=["a\|/dev/sda\|", "r\|.*\|"]' WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. 
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter. WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices. WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices. PV VG Fmt Attr PSize PFree /dev/sda lvm2 --- 128.00m 128.00m [unknown] fedora lvm2 a-m 19.49g 0	2016-04-25 11:27:28 +02:00
Zdenek Kabelac	8c4b717f4d	coverity: drop abadoing object As the mempool is destroyed by the caller, don't bother freeing the mempool here.	2016-04-22 01:13:35 +02:00
David Teigland	5e9e43074a	lvmetad: rework command connection setup and checking The lvmetad connection is created within the init_connections() path during command startup, rather than via the old lvmetad_active() check. The old lvmetad_active() checks are replaced with lvmetad_used() which is a simple check that tests if the command is using/connected to lvmetad. The old lvmetad_set_active(cmd, 0) calls, which stopped the command from using lvmetad (to revert to disk scanning), are replaced with lvmetad_make_unused(cmd).	2016-04-19 14:00:02 -05:00
David Teigland	a6a32a7c0e	metadata: don't repair shared VGs When the in-use flag looks like it needs to be repaired.	2016-04-19 09:19:32 -05:00
Peter Rajnoha	94f78e0183	coverity: fix some issues reported by coverity for recent code	2016-03-22 16:03:55 +01:00
Peter Rajnoha	f231bdb20b	metadata: use own mem pool to report PV device mismatch in VG	2016-03-21 14:39:11 +01:00
Peter Rajnoha	03b0a78640	dev: detect mismatch between devices used and devices assumed for an LV It's possible for an LVM LV to use a device during activation which then differs from device which LVM assumes based on metadata later on. For example, such device mismatch can occur if LVM doesn't have complete view of devices during activation or if filters are misbehaving or they're incorrectly set during activation. This patch adds code that can detect this mismatch by creating VG UUID and LV UUID index while scanning devices for device cache. The VG UUID index maps VG UUID to a device list. Each device in the list has a device layered above as a holder which is an LVM LV device and for which we know the VG UUID (and similarly for LV UUID index). We can acquire VG and LV UUID by reading /sys/block/<dm_dev_name>/dm/uuid. So these indices represent the actual state of PV device use in the system by LVs and then we compare that to what LVM assumes based on metadata. For example: [0] fedora/~ # lsblk /dev/sdq /dev/sdr /dev/sds /dev/sdt NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sdq 65:0 0 104M 0 disk \|-vg-lvol0 253:2 0 200M 0 lvm `-mpath_dev1 253:3 0 104M 0 mpath sdr 65:16 0 104M 0 disk `-mpath_dev1 253:3 0 104M 0 mpath sds 65:32 0 104M 0 disk \|-vg-lvol0 253:2 0 200M 0 lvm `-mpath_dev2 253:4 0 104M 0 mpath sdt 65:48 0 104M 0 disk `-mpath_dev2 253:4 0 104M 0 mpath In this case the vg-lvol0 is mapped onto sdq and sds because this is what was available and seen during activation. Then later on, sdr and sdt appeared and mpath devices were created out of sdq+sdr (mpath_dev1) and sds+sdt (mpath_dev2). Now, LVM assumes (correctly) that mpath_dev1 and mpath_dev2 are the PVs that should be used, not the mpath components (sdq/sdr, sds/sdt). 
[0] fedora/~ # pvs Found duplicate PV xSUix1GJ2SK82ACFuKzFLAQi8xMfFxnO: using /dev/mapper/mpath_dev1 not /dev/sdq Using duplicate PV /dev/mapper/mpath_dev1 from subsystem DM, replacing /dev/sdq Found duplicate PV MvHyMVabtSqr33AbkUrobq1LjP8oiTRm: using /dev/mapper/mpath_dev2 not /dev/sds Using duplicate PV /dev/mapper/mpath_dev2 from subsystem DM, ignoring /dev/sds WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdq, /dev/sds instead of /dev/mapper/mpath_dev1, /dev/mapper/mpath_dev2. PV VG Fmt Attr PSize PFree /dev/mapper/mpath_dev1 vg lvm2 a-- 100.00m 0 /dev/mapper/mpath_dev2 vg lvm2 a-- 100.00m 0	2016-03-21 11:40:40 +01:00
Peter Rajnoha	9918d95490	metadata: do not issue warning message about PV dev size being 0 when the device has gone just after VG read There's a window between doing VG read and checking PV device size against real device size. If the device is removed in this window, the dev cache still holds struct device and pv->dev still references that and that PV is not marked as missing. However, if we're trying to get size for such device, the open fails because that device doesn't exist anymore. We called existing pv_dev_size in _check_pv_dev_sizes fn. But pv_dev_size assigned a size of 0 if the dev_get_size it called failed (because the device is gone). So call the dev_get_size directly and check for the return code in _check_pv_dev_sizes and go further only if we really know the device size. This is to avoid confusing warning messages like: Device /dev/sdd1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized? One or more devices used as PVs in VG helter_skelter have changed sizes.	2016-03-10 13:11:15 +01:00
David Teigland	2d5dc6512e	dbus: add notification from commands When a command modifies a PV or VG, or changes the activation state of an LV, it will send a dbus notification when the command is finished. This can be enabled/disabled with a config setting.	2016-03-07 10:06:09 -06:00
Peter Rajnoha	8a601454e1	metadata: automatically remove invalid (dangling) historical LVs Historical LV is valid as long as there is at least one live LV among its ancestors. If we find any invalid (dangling) historical LVs, remove them automatically.	2016-03-03 13:50:59 +01:00
Peter Rajnoha	1297b0c8be	metadata: also validate historical LVs in VG in vg_validate and check_lv_segments	2016-03-03 13:50:59 +01:00
Peter Rajnoha	fc628e92ba	metadata: also look at historical LVs when checking LV name availability Live LVs and historical LVs are in one namespace and the name needs to be unique in whole VG.	2016-03-03 13:50:59 +01:00
Peter Rajnoha	ff6e124a33	conf: add metadata/lvs_history_timeout configuration setting	2016-03-03 13:50:59 +01:00
Peter Rajnoha	74272e163d	metadata: add vg_strip_outdated_historical_lvs fn and call it during VG read The vg_strip_outdated_historical_lvs iterates over the list of historical LVs we have and it shoots down the ones which are outdated. Configuration hook to set the timeout will be in subsequent patch.	2016-03-03 13:50:59 +01:00
Peter Rajnoha	f833a6d074	metadata: add historical_glv_remove	2016-03-03 13:50:57 +01:00
Peter Rajnoha	c45af2df4e	metadata: add find_historical_glv fn The find_historical_glv is helper function that looks up historical LV in struct volume_group's historical_lvs list and returns it if found.	2016-03-03 13:46:39 +01:00
Peter Rajnoha	790b2e8748	metadata: create historical LVs when LVs are removed and interconnect with live LVs When an LV is being removed, we create an instance of "struct historical_logical_volume" wrapped up in "struct generic_logical_volume". All instances of "struct historical_logical_volume" are then recorded in "historical_lvs" list which is part of "struct volume_group". The "historical LV" is then interconnected with "live LVs" to connect a history chain for the live LV.	2016-03-03 11:26:51 +01:00
Zdenek Kabelac	e04a0184cb	cleanup: use lv_is_partial Check for PARTIAL_LV flag in standard way.	2016-03-03 10:17:03 +01:00
David Teigland	172bad0d56	Use a common message for a used PV Change some inconsistent messages and adopt the new wording "PV %s is used by" in place of "PV %s is marked as belonging to" or "PV %s belongs to".	2016-02-25 14:23:41 -06:00
David Teigland	a77ded3001	replace pvcreate_params with pvcreate_each_params "pvcreate_each_params" was a temporary name used to transition from the old "pvcreate_params". Remove the old pvcreate_params struct and rename the new pvcreate_each_params struct to pvcreate_params. Rename various pvcreate_each_params terms to simply pvcreate_params.	2016-02-25 09:14:10 -06:00
David Teigland	4de6caf5b5	redefine pvcreate structs New pv_create_args struct contains all the specific parameters for creating a PV, independent of the command.	2016-02-25 09:14:10 -06:00
David Teigland	c201ee09bd	metadata: add fixme about code used only by liblvm	2016-02-25 09:14:10 -06:00
David Teigland	a9940bd3c9	vgcreate: use the common toollib pv create Use the new pvcreate_each_device() function from toollib, previously added for pvcreate, in place of the old pvcreate_vol(). This also requires shifting the location where the lock is acquired for the new VG name. The lock for the new VG is supposed to be acquired before pvcreate. This means splitting the vg_lock_newname() out of vg_create(), and calling vg_lock_newname() directly before pvcreate, and then calling the remainder of vg_create() after pvcreate. The new function vg_lock_and_create() now does vg_lock_newname() + vg_create(), like the previous version of vg_create(). The lock on the new VG name is released before the pvcreate and reacquired after the pvcreate because pvcreate needs to reset lvmcache, which doesn't work when locks are held. An exception could likely be made for the new VG name lock, which would allow vgcreate to hold the new VG name lock across the pvcreate step.	2016-02-25 09:14:09 -06:00
David Teigland	71671778ab	toollib: add two phase pv processing code This is common code for handling PV create/remove that can be shared by pvcreate/vgcreate/vgextend/pvremove. This does not change any commands to use the new code. - Pull out the hidden equivalent of process_each_pv into an actual top level process_each_pv. - Pull the prompts to the top level, and do not run any prompts while locks are held. The orphan lock is reacquired after any prompts are done, and the devices being created are checked for any change made while the lock was not held. Previously, pvcreate_vol() was the shared function for creating a PV for pvcreate, vgcreate, vgextend. Now, it will be toollib function pvcreate_each_device(). pvcreate_vol() was called effectively as a helper, from within vgcreate and vgextend code paths. pvcreate_each_device() will be called at the same level as other process_each functions. One of the main problems with pvcreate_vol() is that it included a hidden equivalent of process_each_pv for each device being created: pvcreate_vol() -> _pvcreate_check() -> find_pv_by_name() -> get_pvs() -> get_pvs_internal() -> _get_pvs() -> get_vgids() -> /* equivalent to process_each_pv */ dm_list_iterate_items(vgids) vg = vg_read_internal() dm_list_iterate_items(&vg->pvs) pvcreate_each_device() reorganizes the code so that each-VG-each-PV loop is done once, and uses the standard process_each_pv function at the top level of the function.	2016-02-25 09:14:09 -06:00
David Teigland	5dd615c41e	metadata: use pv_write_list for _check_old_pv_ext_for_vg The _check_old_pv_ext_for_vg() function only needs to do pv_write(), so it can use the simpler pv_list structs on the pv_write_list.	2016-02-25 09:14:09 -06:00
David Teigland	bafbc72c8c	metadata: refactor part of add_pv_to_vg This shifts the use of the 'pv_to_write' struct and the 'pvcreate_params' struct to the one caller of add_pv_to_vg, which is made static.	2016-02-25 09:14:09 -06:00
David Teigland	5e5ad77f5f	vg_write: add list of pvs to write The vg->pv_write_list contains pv_list structs for which vg_write() should call pv_write(). The new list will replace vg->pvs_to_write that contains vg_to_create structs which are used to perform higher-level pvcreate-related operations. The higher level pvcreate operations will be moved out of vg_write() to higher levels.	2016-02-25 09:14:09 -06:00
Zdenek Kabelac	dbc71dc05e	gcc: cleanup some sign warnings When comparing unsigned with int, the comparison is made as 'unsigned' type, so make it rather explicit which type is being compared.	2016-02-23 12:25:25 +01:00
Peter Rajnoha	ecfa465366	metadata: ask for confirmation before really initializing/removing PV that is marked as belonging to a VG Ask for confirmation when using pvcreate/pvremove on a PV which is marked as belonging to a VG, just like we do in case of a PV which belongs to known VG: $ pvcreate -ff /dev/sda Really INITIALIZE physical volume "/dev/sda" that is marked as belonging to a VG [y/n]? n /dev/sda: physical volume not initialized $ pvremove -ff /dev/sda Really WIPE LABELS from physical volume "/dev/sda" that is marked as belonging to a VG [y/n]? n /dev/sda: physical volume label not removed	2016-02-18 14:33:54 +01:00
Peter Rajnoha	065526c590	metadata: add missing _repair_inconsinstent_vg call during PV ext repair	2016-02-17 10:19:55 +01:00
Peter Rajnoha	b077e7374f	metadata: do not repair missing PV_EXT_USED flag for PVs belonging to foreign VG The host that owns foreign VGs is responsible for fixing up PV_EXT_USED flag - the same already applies to repairing any inconsistent VG. This patch also moves the iteration over vg->pvs inside _check_or_repair_pv_ext fn - it's cleaner this way.	2016-02-17 10:19:24 +01:00
Peter Rajnoha	13f3e92632	refactor: add common _is_foreign_vg fn	2016-02-16 13:44:48 +01:00
Peter Rajnoha	2f00d57e6f	vg: automatically update to newest PV ext version during vg_write	2016-02-15 12:44:46 +01:00
Peter Rajnoha	531ced90dc	metadata: _vg_read: check if PV_EXT_USED flag is set correctly for non-orphan PVs and do a repair if needed The same check as we already do for orphan PVs, just the other way round now: if the PV is surely part of some VG and any PV the VG contains does not have the PV_EXT_USED flag set, repair it. For example - /dev/sda here is in VG vg and it's incorrectly not marked as used by PV_EXT_USED flag: pvs --binary -o pv_ext_vsn,pv_in_use WARNING: Volume Group vg is not consistent. WARNING: Repairing Physical Volume /dev/sda that is in Volume Group vg but not marked as used. PV VG Fmt Attr PSize PFree ExtVsn PInUse /dev/sda vg lvm2 a-- 124.00m 124.00m 2 1	2016-02-15 12:44:46 +01:00
Peter Rajnoha	e0b1415105	metadata: check for PV extension version before doing any checks on PV extension flags PV header extension versions: 0 - the original PV without any extensions 1 - bootloader area support added 2 - PV_EXT_USED flag support added So do the associated checks related to PV_EXT_USED flag only if PV header extension found is of version 2 and higher.	2016-02-15 12:44:46 +01:00
Peter Rajnoha	d97f1c89de	metadata: _vg_read: check if PV_EXT_USED flag is set correctly for orphan PVs and do a repair if needed If we know that the PV is orphan, meaning there's at least one MDA on that PV which does not reference any VG and at the same time there's PV_EXT_USED flag set, we're certainly in an inconsistent state and we need to fix this. For example, such situation can happen during vgremove/vgreduce if we removed/reduced the VG, but we haven't written PV headers yet because vgremove stopped abruptly for whatever reason just before writing new PV headers with updated state, including PV extension flags (and so the PV_EXT_USED flag). However, in case the PV has no MDAs at all, we can't double-check whether the PV_EXT_USED is correct or not - if that PV is marked as used, it's either: - really used (but other disks with MDAs are missing) - or the error state as described above is hit User needs to overwrite the PV header directly if it's really clear the PV having no MDAs does not belong to any VG and at the same time it's still marked as being in use (pvcreate -ff <dev_name> will fix this). For example - /dev/sda here has 1 MDA, orphan and is incorrectly marked with PV_EXT_USED flag: $ pvs --binary -o+pv_in_use WARNING: Found inconsistent standalone Physical Volumes. WARNING: Repairing flag incorrectly marking Physical Volume /dev/sda as used. PV VG Fmt Attr PSize PFree InUse /dev/sda lvm2 --- 128.00m 128.00m 0	2016-02-15 12:44:46 +01:00
Peter Rajnoha	b6e3080fff	pv: _pvcreate_write: do label removal and zeroing only if creating a new PV	2016-02-15 12:44:46 +01:00
Peter Rajnoha	73f1d444c8	pv: issue different message of different type when we're overwriting existing PV header instead of creating a new one Scenario: $ pvcreate /dev/sda Physical volume "/dev/sda" successfully created We're adding the PV to a VG. Before this patch: $ vgcreate vg /dev/sda Physical volume "/dev/sda" successfully created Volume group "vg" successfully created With this patch applied: $ vgcreate vg /dev/sda Volume group "vg" successfully created ...and verbose log containing: "Physical volume "/dev/sda" successfully written"	2016-02-15 12:44:46 +01:00
Peter Rajnoha	52999133a3	pv: check for the PV_EXT_USED flag and deny pvcreate/pvchange/pvremove/vgcreate on such PV (unless forced) Make sure we won't use a PV that is already marked as used. Normally, VG metadata would stop us from doing that, but we can run into a situation where such metadata is missing because PVs with MDAs are missing and the PVs left are the ones with 0 MDAs. (/dev/sda in this example has 0 MDAs and it belongs to a VG, but other PVs with MDA are missing) $ pvs -o pv_name,pv_mda_count /dev/sda PV #PMda /dev/sda 0 $ pvcreate /dev/sda PV '/dev/sda' is marked as belonging to a VG but its metadata is missing. Can't initialize PV '/dev/sda' without -ff. $ pvchange -u /dev/sda PV '/dev/sda' is marked as belonging to a VG but its metadata is missing. Can't change PV '/dev/sda' without -ff. Physical volume /dev/sda not changed 0 physical volumes changed / 1 physical volume not changed $ pvremove /dev/sda PV '/dev/sda' is marked as belonging to a VG but its metadata is missing. (If you are certain you need pvremove, then confirm by using --force twice.) $ vgcreate vg /dev/sda Physical volume '/dev/sda' is marked as belonging to a VG but its metadata is missing. Unable to add physical volume '/dev/sda' to volume group 'vg'.	2016-02-15 12:44:46 +01:00
Peter Rajnoha	10128c9bd6	metadata: schedule PV for header rewrite if adding a PV to VG or restoring VG When adding PV to VG, we need to rewrite PV header as there's a flip in PV_EXT_USED flag. The same applies if we're restoring VG from backup.	2016-02-15 12:44:46 +01:00
Peter Rajnoha	2950adc2ab	metadata: add_pv_to_vg: add 'new_pv' arg to state if the PV is about to be created	2016-02-15 12:44:46 +01:00

918 Commits