
shaba/lvm2

mirror of git://sourceware.org/git/lvm2.git synced 2025-01-14 23:24:55 +03:00

Author	SHA1	Message	Date
Alasdair Kergon	47dfe904ab	Use 'SINGLENODE' instead of 'dead' in clvmd singlenode messages. Ignore snapshots when performing mirror recovery beneath an origin. Pass LCK_ORIGIN_ONLY flag around cluster. Add suspend_lv_origin and resume_lv_origin using LCK_ORIGIN_ONLY.	2010-08-17 19:25:05 +00:00
Alasdair Kergon	7f5b44b423	Allow internal suspend and resume of origin without its snapshots.	2010-08-17 16:25:32 +00:00
Jonathan Earl Brassow	51e294c501	Fix for bug 612291: dm devices of split off mirror images are not removed. DM devices were not handled properly on nodes in a cluster that were not where the splitmirrors command was issued. This was happening because suspend_lv/resume_lv were being used in a place where activate_lv should have been used. When the suspend/resume are issued on (effectively) new LVs, their 'resource' (UUID) is not located in the lv_hash. Thus, both operations turn into no-ops. You can see this from the output of clvmd from one of the remote nodes: <snip> do_suspend_lv, lock not already held <snip> do_resume_lv, lock not already held 'activate_lv' enjoins the other nodes in the cluster to process the lock and activate the new LV. clvmd output from remote node as follows: do_lock_lv: resource 'zMseY7CBuO3Ty09vXlplPAHzD0Y0CovjrTdv0R1VcwggMwPdYhutHErRcwm5Nd2S', cmd = 0x19 LCK_LV_ACTIVATE (READ|LV|NONBLOCK), flags = 0x84 (DMEVENTD_MONITOR ), memlock = 1 sync_lock: 'zMseY7CBuO3Ty09vXlplPAHzD0Y0CovjrTdv0R1VcwggMwPdYhutHErRcwm5Nd2S' mode:1 flags=1 sync_lock: returning lkid 27b0001 Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Reviewed-by: Petr Rockai <prockai@redhat.com>	2010-08-16 18:02:14 +00:00
Mike Snitzer	2b3a4adff8	Change default alignment of pe_start to 1MB. The new standard in the storage industry is to default alignment of data areas to 1MB. fdisk, parted, and mdadm have all been updated to this default. Update LVM to align the PV's data area start (pe_start) to 1MB. This provides a more useful default than the previous default of 64K (which generally ended up being a 192K pe_start once the first metadata area was created). Before this patch: # pvs -o name,vg_mda_size,pe_start PV VMdaSize 1st PE /dev/sdd 188.00k 192.00k After this patch: # pvs -o name,vg_mda_size,pe_start PV VMdaSize 1st PE /dev/sdd 1020.00k 1.00m The heuristic for setting the default alignment for LVM data areas is: - If the default value (1MB) is a multiple of the detected alignment then just use the default. - Otherwise, use the detected value. In practice this means we'll almost always use 1MB -- that is unless: - the alignment was explicitly specified with --dataalignment - or MD's full stripe width, or the {minimum,optimal}_io_size exceeds 1MB - or the specified/detected value is not a power-of-2	2010-08-12 04:11:48 +00:00
Jonathan Earl Brassow	c63e78714a	Fix for bug 619221 - log device splitting regression An incorrect fix on July 13, 2010 for an annoyance has caused a regression. The offending check-in was part of the 2.02.71 release of LVM. That check-in caused any PVs specified on the command line to be ignored when performing a mirror split. This patch reverses the aforementioned check-in (solving the regressions) and posits a new solution to the list reversal problem. The original problem was that we would always take the lowest mimage LVs from a mirror when performing a split, but what we really want is to take the highest mimage LVs. This patch accomplishes that by working through the list in reverse order - choosing the higher numbered mimages first. (This also reduces the amount of processing necessary.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Reviewed-by: Takahiro Yasui <takahiro.yasui@hds.com>	2010-08-06 15:38:32 +00:00
Jonathan Earl Brassow	4a5195c38a	Taka's fix for handling failure of all mirrored log devices and all but one mirror leg. <patch header> To handle a double failure of a mirrored log, Jon's two patches were committed; however, the lvconvert command still can't handle an error when a mirror leg and the mirrored log fail at the same time. [Patch]: Handle both devices of a mirrored log failing (bug 607347) posted: https://www.redhat.com/archives/lvm-devel/2010-July/msg00009.html commit: https://www.redhat.com/archives/lvm-devel/2010-July/msg00027.html [Patch]: Handle both devices of a mirrored log failing (bug 607347) - additional fix posted: https://www.redhat.com/archives/lvm-devel/2010-July/msg00093.html commit: https://www.redhat.com/archives/lvm-devel/2010-July/msg00101.html In the second patch, the target type of the mirrored log is replaced with an error target when remove_log is set to 1, but this procedure should also be used in other cases, such as when the number of mirror legs is 1. This patch relocates the procedure to the main path. In addition, I made the following three changes. - Removed tmp_orphan_lvs handling procedure It seems that _delete_lv() can handle detached_log_lv properly without adding mirror legs in mirrored log to tmp_orphan_lvs. Therefore, I removed the procedure. - Removed vg_write()/vg_commit() Metadata is saved by vg_write()/vg_commit() just after detached_log_lv is handled. Therefore, I removed vg_write()/vg_commit(). - With Jon's second patch, we think that we don't have to call remove_mirror_log() in _lv_update_mirrored_log() because it will be handled by remove_mirror_images() in _lvconvert_mirrors_repaire(). </patch header> Signed-off-by: Takahiro Yasui <takahiro.yasui@hds.com> Reviewed-by: Petr Rockai <prockai@redhat.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>	2010-08-02 21:07:40 +00:00
Jonathan Earl Brassow	0864378250	Disallow mirrored logs in cluster mirrors. The cluster log daemon (cmirrord) is not multi-threaded and can handle only one request at a time. When a log is stacked on top of a mirror (which itself contains a 'core' log), it creates a situation that cannot be solved without threading. When the top level mirror issues a "resume", the log daemon attempts to read from the log device to retrieve the log state. However, the log is a mirror which, before issuing the read, attempts to determine the 'sync' status of the region of the mirror which is to be read. This sync status request cannot be completed by the daemon because it is blocked on a read I/O to the very mirror requesting the sync status.	2010-08-02 19:03:45 +00:00
Dave Wysochanski	c0789ae1b4	Remove irrelevant comments relating to vg_mda_copies.	2010-07-30 16:47:27 +00:00
Jonathan Earl Brassow	56fe3c6176	It's not enough to check for the kernel module in the case of cluster mirrors, we must also check that the log daemon (cmirrord) is running. The log module can be auto-loaded, but the daemon cannot be "auto-started". Failing to check for the daemon produces cryptic messages that customers have a hard time deciphering. (The system messages do report that the log daemon is not running, but people don't seem to find this message easily.) Here are examples of what is printed when the module is available, but the log daemon has not been started. [root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg Shared cluster mirrors are not available. [root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg -v Setting logging type to disk Finding volume group "vg" Archiving volume group "vg" metadata (seqno 3). Creating logical volume lv Executing: /sbin/modprobe dm-log-userspace Cluster mirror log daemon is not running Shared cluster mirrors are not available. Creating volume group backup "/etc/lvm/backup/vg" (seqno 4).	2010-07-21 13:40:21 +00:00
Jonathan Earl Brassow	8d983d6f2d	Fix for bug 614164: No check for existing name when splitting mirror The user could use the same name as an existing LV when specifying a name for an LV split off from a mirror. This causes all sorts of issues.	2010-07-13 22:24:39 +00:00
Jonathan Earl Brassow	659f47f76a	Fix for bugs: 612248 & 612291 Split mirror issues The main problem with these bugs was that the newly split off LV was not being suspended properly. This meant that the memlock count was not being balanced, the DM devices were not being renamed, and some DM devices which should have been removed were not. I've also renamed some of the variables and added comments to make things clearer as to what is going on. (I can break this patch in two if it means easier review.)	2010-07-13 21:48:16 +00:00
Jonathan Earl Brassow	ddedf42d21	Failed to test for the case where a log was requested to be removed even though there was no log. A simple run through the in-tree test suite would have caught this. :( - if (lv_is_mirrored(detached_log_lv) && + if (detached_log_lv && lv_is_mirrored(detached_log_lv) && Also, made some cosmetic changes suggested by kabi after my last check-in (e.g. s/return 0/return_0/ and adding an error message).	2010-07-09 17:57:51 +00:00
Dave Wysochanski	b3a13b17cb	Add log_error when strdup fails in {vg|lv}_change_tag(). Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-07-09 16:57:44 +00:00
Alasdair Kergon	2d3164a59f	Use __attribute__ consistently throughout.	2010-07-09 15:34:40 +00:00
Alasdair Kergon	d8d4a1d694	Remove superfluous fn prototypes.	2010-07-09 15:21:10 +00:00
Jonathan Earl Brassow	0b18937cbe	Finish fix for bug 607347: failing both redundant mirror log legs... A previous check-in added logic to handle the case where both images of a mirrored log failed. It solved the problem by simply removing the log entirely - leaving the parent mirror with a 'core' log. This worked for most cases. However, if there was a small delay between the failures of the two mirrored log devices, the mirror would hang, LVM would hang, and no additional LVM commands could be issued. When the first leg of the log fails, it signals the need for repair. Before 'lvconvert --repair' is run by dmeventd, the second leg fails. 'lvconvert' would see both devices as failed and try to remove the log entirely. When it came time to suspend the parent mirror to update the configuration, the suspend would hang because it couldn't get any I/O through the mirrored log, which was plugged waiting for corrective action. The solution is to replace the log with an error target to clear any pending writes before removing it. This allows the parent mirror to suspend and make the proper changes.	2010-07-09 15:08:12 +00:00
Dave Wysochanski	264091a239	Pass metadataignore to pv_create, pv_setup, _mda_setup, and add_mda. Pass metadataignore through PV creation / setup paths. As a result of this cleanup, we can remove the unnecessary setting of mda_ignore bits inside pvcreate_single(), after call to pv_create. For now, just set metadataignore to '0' in some places. This is equivalent to the prior functionality, although the 0 is given by the caller not hardcoded in _mda_setup() call. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-07-08 18:24:29 +00:00
Dave Wysochanski	d324fd94df	Init mda->list in mda_copy. This patch should be no functional change as all callers initialize mda->list.	2010-07-08 17:41:46 +00:00
Dave Wysochanski	94cb2db723	Add warning to vgextend and pvchange if metadataignore given on cmdline. Warn the user then change the value of vg_mda_copies. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-07-07 18:59:45 +00:00
Alasdair Kergon	23aa1c6524	Adjust auto-metadata repair and caching logic to try to cope with empty mdas. - If a PV contained empty mdas, the auto-recovery code was not kicking in. - The 'inconsistent' state was getting lost when metadata was cached so recovery didn't kick in. But leave the behaviour alone when using precommitted metadata because of a warning in a confusing FIXME. In my testing, pvs and vgs didn't repair inconsistent metadata like they used to do. (How many other tools fail similarly now?) And there should be no need to cache inconsistent metadata because it is supposed to get repaired under the protection of a write lock immediately it is discovered. This code is in need of a redesign based on first principles. I still see bugs in this code and this commit is risky.	2010-07-07 02:53:16 +00:00
Alasdair Kergon	c1589415f0	fix code in 2nd mda unignore loop to match 1st loop	2010-07-06 20:09:38 +00:00
Alasdair Kergon	389cbc3686	s/flags/mda/	2010-07-06 17:29:50 +00:00
Alasdair Kergon	2508b4000b	shorten mesg	2010-07-06 17:27:32 +00:00
Alasdair Kergon	e5164acc9c	fix jumbled args in 'Adjusting' message	2010-07-06 17:26:08 +00:00
Alasdair Kergon	38c6e8faf6	Randomly select which mdas to use or ignore. Add some missing standard configure.in checks.	2010-07-05 22:23:15 +00:00
Alasdair Kergon	ed2630dce5	Add printf format attributes to yes_no_prompt & dm_{sn,as}printf and fix a calle	2010-07-02 21:16:50 +00:00
Alasdair Kergon	17cffe2bf9	improve vgmetadatacopies unmanaged message	2010-06-30 20:03:52 +00:00
Dave Wysochanski	d1b068a68a	Check for missing_pv in vg_remove loop. If a pv is missing, we should just skip it rather than checking the device size and failing the vgremove.	2010-06-30 19:55:43 +00:00
Alasdair Kergon	a5053f6ada	more mda ignore cleanups	2010-06-30 19:28:35 +00:00
Dave Wysochanski	261a3e7e78	Refactor vg_remove_check to place pv removal into separate function.	2010-06-30 18:03:52 +00:00
Alasdair Kergon	36e28d4b4b	more metadataignore message/code cleanup	2010-06-30 17:13:05 +00:00
Alasdair Kergon	51c4e1ea4b	revert that	2010-06-30 14:54:29 +00:00
Alasdair Kergon	c8f03c38fe	suppress useless compiler warning	2010-06-30 14:52:29 +00:00
Dave Wysochanski	f0c79d95f5	Only attempt to guarantee 1 mda ignored if there's at least one mda in the vg.	2010-06-30 14:48:07 +00:00
Alasdair Kergon	d13b495217	Only attempt to guarantee 1 mda ignored if there's at least one mda in the vg.	2010-06-30 14:27:40 +00:00
Alasdair Kergon	b16b4d92a7	Improve various log messages.	2010-06-30 13:51:11 +00:00
Dave Wysochanski	fdcf749779	Add --metadataignore to pvcreate. Allow metadataignore flag to be passed in to pvcreate. Ideally, more refactoring of the mda allocation / initialization is warranted, but for now, we just add another parameter to 'add_mda' to take an existing mda ignored flag. We need to do this or pv_write loses the state of the mda 'ignored' flag before copying and writing to disk.	2010-06-30 12:17:24 +00:00
Dave Wysochanski	ed38f47f23	Improve logging for setting --vgmetadatacopies. Example of logging: metadata/metadata.c:1127 Setting mda_copies = 3 on vg vgtest metadata/pv_manip.c:296 /dev/loop2 0: 0 25: NULL(0:0) metadata/pv_manip.c:296 /dev/loop3 0: 0 25: NULL(0:0) metadata/pv_manip.c:296 /dev/loop4 0: 0 25: NULL(0:0) metadata/metadata.c:1072 Adjusting ignored mdas on vg vgtest, vg_mda_used_count=5, vg_mda_copies=3 metadata/metadata.c:1015 Setting ignore flag for 2 mdas on vg vgtest metadata/metadata.c:4151 Setting mda ignored flag for metadata_locn /dev/loop2. metadata/metadata.c:4151 Setting mda ignored flag for metadata_locn /dev/loop3.	2010-06-29 22:41:28 +00:00
Dave Wysochanski	3b0585be3b	Improve logging for metadata ignore by printing device name. Print device name when setting or clearing metadata ignore bit. Example: label/label.c:160 /dev/loop2: lvm2 label detected cache/lvmcache.c:1136 lvmcache: /dev/loop2: now in VG #orphans_lvm2 (#orphans_lvm2) metadata/metadata.c:4142 Setting mda ignored flag for metadata_locn /dev/loop2. format_text/text_label.c:318 Skipping mda with ignored flag on device /dev/loop2 at offset 4096	2010-06-29 22:37:32 +00:00
Dave Wysochanski	ebe1a8080c	Add some log_verbose debug statements related to metadataignore. Logging isn't ideal, especially for mda_set_ignore. Ideally we'd like to display the device name and offset in this case but this requires a bit more work and a per-format 'mda_description' function pointer definition (we don't have access to mda_context in metadata.c).	2010-06-29 22:25:58 +00:00
Dave Wysochanski	593bb6470b	Move code into pv_change_metadataignore library function. In preparation to call this from both pvcreate as well as pvchange, move the guts of metadataignore into a library function. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-29 21:32:44 +00:00
Dave Wysochanski	996360df6b	Allow 'all' and 'unmanaged' values for --vgmetadatacopies. Allowing an 'all' and 'unmanaged' value is more intuitive, and provides a simple way for users to get back to original LVM behavior of metadata written to all PVs in the volume group. If the user requests "--vgmetadatacopies unmanaged", this instructs LVM not to manage the ignore bits to achieve a specific number of metadata copies in the volume group. The user is free to use "pvchange --metadataignore" to control the mdas on a per-PV basis. If the user requests "--vgmetadatacopies all", this instructs LVM to do 2 things: 1) clear all ignore bits, and 2) set the "unmanaged" policy going forward. Internally, we use the special MAX_UINT32 value to indicate 'all'. This 'just' works since it's the largest value possible for the field and so all 'ignore' bits on all mdas in the VG will get cleared inside _vg_metadata_balance(). However, after we've called the _vg_metadata_balance function, we check for the special 'all' value, and if set, we write the "unmanaged" value into the metadata. As such, the 'all' value is never written to disk. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:40:01 +00:00
Dave Wysochanski	e52586db1c	Update check in vg_split_mdas to account for ignored mdas list. The check in vg_split_mdas will trigger an error if the 'from' vg list is empty. However, this might be ok in some instances now that we have ignored mdas. Relax this check so an error is triggered only in the case where there's truly no more mdas in the 'from' vg. One example of where this makes a difference is with vgreduce. If we try to vgreduce a PV with un-ignored mdas, this should trigger the balancing function to un-ignore mdas on another PV in the VG. However, we don't get to vg_write() before we fail because this list size check fails, and we see an error message indicating: "Cannot remove final metadata area ..." Another example is with vgsplit into a new VG, where the PVs being moved contain all ignored mdas. We must move the mdas on fid->metadata_areas_ignored from 'vg_from' to 'vg_to'. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:38:56 +00:00
Dave Wysochanski	6cb9642200	Ensure fid mda lists are populated correctly during vgextend. The vgextend path calls add_pv_to_vg(). Inside add_pv_to_vg(), we must ensure we pass the correct mdas list into pv_setup(), as copies of mdas are placed on the vg->fid list. If we don't place the mdas on the correct vg->fid list, the various counts may be incorrect and the metadata balance algorithm will not work when called from vg_write() path. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:38:39 +00:00
Dave Wysochanski	074c1880ff	Implement _vg_adjust_ignored_mdas and call from vg_write() path. Compare the value of the newly added vg_mda_copies field (--vgmetadatacopies parameter) with the current count of in-use mdas, and ignore or unignore mdas as necessary to get to the target count. Also, as a safety check before returning, ensure we have at least one mda enabled. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:37:54 +00:00
Dave Wysochanski	12910a5a29	Add vg get/set methods for VG metadata copies. This patch adds the get and partially implemented set function. The 'set' function should probably ignore or un-ignore metadata areas based on new values. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:36:56 +00:00
Dave Wysochanski	eb93c134a7	Add mda_copies to VG structures and initialization. Add a field to struct volume_group to later implement metadata balancing: - mda_copies: target # of non-ignored mdas in the VG; default 0 (do not control pv 'ignore mdas' bit). This patch just adds the parameter to the structures with the default values but does not modify any commands. Should be no functional change. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:36:37 +00:00
Dave Wysochanski	43526902ae	Before committing each mda, arrange mdas so ignored mdas get committed first. Arrange mdas so mdas that are to be ignored come first. This is an optimization that ensures consistency on disk for the longest period of time. This was noted by agk in review of the v4 patchset of pvchange-based mda balance. Note the following example for an explanation of the background: Assume the initial state on disk is as follows: PV0 (v1, non-ignored) PV1 (v1, non-ignored) PV2 (v1, non-ignored) PV3 (v1, non-ignored) If we did not sort the list, we would have a commit sequence something like this: PV0 (v2, non-ignored) PV1 (v2, ignored) PV2 (v2, ignored) PV3 (v2, non-ignored) After the commit of PV0's mdas, we'd have an on-disk state like this: PV0 (v2, non-ignored) PV1 (v1, non-ignored) PV2 (v1, non-ignored) PV3 (v1, non-ignored) This is an inconsistent state of the disk. If the machine fails, the next time it is brought back up, the auto-correct mechanism in vg_read would update the metadata on PV1-PV3. However, if possible we try to avoid inconsistent on-disk states. Clearly, because we did not sort, we have a greater chance of on-disk inconsistency - from the time the commit of PV0 is complete until the time PV3 is complete. We could improve the amount of time the on-disk state is consistent by simply sorting the commit order as follows: PV1 (v2, ignored) PV2 (v2, ignored) PV0 (v2, non-ignored) PV3 (v2, non-ignored) Thus, after the first PV is committed (in this case PV1), on-disk we would have: PV0 (v1, non-ignored) PV1 (v2, ignored) PV2 (v1, non-ignored) PV3 (v1, non-ignored) This is clearly a consistent state. PV1 will be read but the mda will be ignored. All other PVs contain v1 metadata, and no auto-correct will be required. In fact, if we commit all PVs with ignored mdas first, we'll only have an inconsistent state when we start writing non-ignored PVs, and thus the chance of an inconsistent state on disk is much lower with the sorted method. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:35:49 +00:00
Dave Wysochanski	b6ec0f9b92	Refactor vg_commit() to add _vg_commit_mdas(). Factor out calling mda->ops->vg_commit() for each mda. No functional change. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:35:33 +00:00
Dave Wysochanski	4e7f1a3a9d	Update _vg_read and _text_create_text_instance to use fid_add_mda[s]. When we are constructing the vg, we may need to adjust the list of metadata_areas if there are ignored mdas. At label read time, we do not read the metadata of ignored mdas, and as a result, they do not get placed on vg->fid->metadata_areas inside _text_create_text_instance since lvmcache does not have these areas attached to vginfo->infos. However, when we're checking the pvids inside _vg_read, after having read another metadata area from another PV, we do have the opportunity to update the metadata_area and metadata_areas_ignored lists based on the read metadata_area. We need accurate mda lists for the reporting functions that count the ignored mdas, as well as general correctness of mda balancing. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>	2010-06-28 20:35:17 +00:00
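
Note on 43526902ae: the ordering argument in that commit message can be checked with a small model. The following is a Python sketch, not the LVM2 C code. `Mda`, `commit_order`, and `inconsistent_steps` are hypothetical names introduced here, and the model assumes, as in the commit's example, that every mda starts at v1 and non-ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mda:
    pv: str
    ignored: bool  # flag the mda will carry once the new metadata (v2) is written


def commit_order(mdas):
    """Return the mdas with to-be-ignored ones first.

    sorted() is stable, so mdas with the same flag keep their relative order.
    """
    return sorted(mdas, key=lambda m: not m.ignored)


def inconsistent_steps(order):
    """Count intermediate on-disk states where readable copies disagree.

    Assumption: all mdas start at version 1, non-ignored. Committing an mda
    moves it to version 2 with its new ignored flag. A state is inconsistent
    when the non-ignored mdas on disk hold more than one version.
    """
    on_disk = {m.pv: (1, False) for m in order}
    bad = 0
    for m in order[:-1]:  # the state after the last commit is always consistent
        on_disk[m.pv] = (2, m.ignored)
        versions = {ver for ver, ign in on_disk.values() if not ign}
        if len(versions) > 1:
            bad += 1
    return bad


if __name__ == "__main__":
    mdas = [Mda("PV0", False), Mda("PV1", True), Mda("PV2", True), Mda("PV3", False)]
    print([m.pv for m in commit_order(mdas)])
    print(inconsistent_steps(mdas), inconsistent_steps(commit_order(mdas)))
```

With the four-PV example from the commit message, the unsorted sequence passes through inconsistent states after each of the first three commits, while the sorted sequence is inconsistent only between the commits of the two non-ignored mdas.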


773 Commits
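
Note on 2b3a4adff8: the pe_start alignment heuristic described in that commit message, sketched in Python. This illustrates the stated rule only and is not LVM2's C implementation. `choose_data_alignment` and its parameters are hypothetical names, and falling back to the default when no alignment is detected is an assumption that the commit message does not spell out.

```python
DEFAULT_ALIGNMENT = 1024 * 1024  # 1MB default from the commit message, in bytes


def choose_data_alignment(detected, explicit=None):
    """Pick the data-area alignment in bytes.

    explicit: value given with --dataalignment, if any (always wins).
    detected: alignment detected from the device (for example MD full stripe
              width or minimum/optimal io size); 0 or None if nothing detected.
    """
    if explicit is not None:
        return explicit
    if not detected:
        # Assumption: with nothing detected, use the default.
        return DEFAULT_ALIGNMENT
    if DEFAULT_ALIGNMENT % detected == 0:
        # The default is a multiple of the detected value, so it is also
        # aligned to the detected value: just use the default.
        return DEFAULT_ALIGNMENT
    # Detected value exceeds 1MB or is not a power of 2: use it as-is.
    return detected


if __name__ == "__main__":
    for kib in (64, 768, 2048):
        print(kib, choose_data_alignment(kib * 1024) // 1024)
```

A detected 64K alignment yields the 1MB default, while a 2MB stripe width or a 768K (non-power-of-2) value is used unchanged, which matches the exceptions listed in the commit message.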