
shaba/lvm2

mirror of git://sourceware.org/git/lvm2.git synced 2024-12-22 17:35:59 +03:00

Author	SHA1	Message	Date
Zdenek Kabelac	25e6ab87d8	Add thin_pool dm message support Experimental support for kernel message via resume sequence.	2011-10-17 14:16:25 +00:00
Zdenek Kabelac	5668fe04d9	Add _thin_validate_device_id	2011-10-17 14:15:26 +00:00
Zdenek Kabelac	5668fd6a7a	Swap parameters Use metadata uuid first (match kernel target).	2011-10-17 14:15:01 +00:00
Zdenek Kabelac	df6b1b8fe6	Drop old check for transaction_id (revert)	2011-10-17 14:14:33 +00:00
Milan Broz	ad2432dc68	Fix alignment warning in bitcount calculation for raid segment.	2011-10-17 13:15:35 +00:00
Jonathan Earl Brassow	a551de6152	Use a more correct macro for 'seg_is_linear' It is better to check 'seg->area_count == 1' than '!seg->stripe_size'.	2011-10-14 14:21:32 +00:00
Jonathan Earl Brassow	3b032963d5	cmirrord now returns log name to kernel in CTR so it can be registered Version 2 of the userspace log protocol accepts return information during the DM_ULOG_CTR exchange. The return information contains the name of the log device that is being used (if there is one). The kernel can then register the device via 'dm_get_device'. Among other things, this allows for userspace to assemble a correct dependency tree of devices - critical for LVM handling of suspend/resume calls. Also, update dm-log-userspace.h to match the kernel header associated with this protocol change. (Includes a version inc.)	2011-10-14 14:18:49 +00:00
Jonathan Earl Brassow	6635332e1b	Update stale libdm/misc/dm-log-userspace.h The upstream kernel version that this file mirrors has changed, here is the commit message: commit 86a54a4802df10d23ccd655e2083e812fe990243 Author: Jonathan Brassow <jbrassow@redhat.com> Date: Thu Jan 13 19:59:52 2011 +0000 dm log userspace: add version number to comms This patch adds a 'version' field to the 'dm_ulog_request' structure. The 'version' field is taken from a portion of the unused 'padding' field in the 'dm_ulog_request' structure. This was done to avoid changing the size of the structure and possibly disrupting backwards compatibility. The version number will help notify user-space daemons when a change has been made to the kernel/userspace log API. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-14 14:04:05 +00:00
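The idea in the commit above, taking a 'version' field out of unused padding so the structure size does not change, can be illustrated with a small Python sketch. The two layouts below are simplified, hypothetical stand-ins and not the real 'dm_ulog_request' definition; they only show that carving a 4-byte field out of 7 padding bytes leaves the packed size and the offsets of the other fields unchanged.

```python
import struct

# Hypothetical, simplified layouts (NOT the real dm_ulog_request):
#   old: 8-byte id, 7 bytes of padding, 4-byte error code
#   new: 8-byte id, 3 bytes of padding, 4-byte version, 4-byte error code
OLD_LAYOUT = "<Q7xi"
NEW_LAYOUT = "<Q3xIi"


def pack_new(luid: int, version: int, error: int) -> bytes:
    """Pack a request using the layout that carries a version field."""
    return struct.pack(NEW_LAYOUT, luid, version, error)


def read_as_old(buf: bytes) -> tuple:
    """Decode the same bytes as a reader that only knows the old layout."""
    return struct.unpack(OLD_LAYOUT, buf)
```

A reader that only knows the old layout still finds the id and the error code at the same offsets; it simply skips the bytes that now hold the version.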
Zdenek Kabelac	0395dd2250	Use pool for dm_tree allocation Using the same pool allocation strategy as we use for vg, so dm_tree structure is part of the pool itself.	2011-10-14 13:34:19 +00:00
Zdenek Kabelac	7f815706ca	Fix lv_info open_count test When verify_udev_operations was disabled, the code for stacking fs operations for lvm links was completely disabled - but this code was also used to record that a new node is being created. Add a new flag, set when creation of lv symlinks is requested, which should restore the old behaviour of the lv_info function, which called fs_sync() before querying the open count on the device.	2011-10-14 13:23:47 +00:00
Zdenek Kabelac	5ba3b21921	Remove unused variables	2011-10-11 10:06:57 +00:00
Zdenek Kabelac	7a6600b148	Use constant for the repeated dlid size specification	2011-10-11 10:02:28 +00:00
Zdenek Kabelac	1ba44957bf	Add some fixme locking Code here is using thread write protected variable without locking. So add locking, for proper synchronization and a FIXME, since the code needs closer look.	2011-10-11 09:56:44 +00:00
Zdenek Kabelac	8a706f836d	Simplify worker loop Do not reacquire mutex several times without a real reason. Code readability is also better.	2011-10-11 09:54:39 +00:00
Zdenek Kabelac	96de8adcc9	Use barrier instead of mutex Barrier is supposed to be used in situation like this and replace tricky mutex usage, where mutex has been unlocked by a different thread than the locking thread.	2011-10-11 09:26:04 +00:00
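The barrier commit above replaces a mutex that was unlocked by a thread other than the one that locked it. A minimal Python analogue of that rendezvous, using 'threading.Barrier' rather than the pthread calls the C code would use, might look like this. The function and variable names are illustrative assumptions, not taken from clvmd.

```python
import threading


def start_worker() -> dict:
    """Start a worker and return only after it has finished its set-up."""
    ready = threading.Barrier(2)  # two parties: the starter and the worker
    state = {}

    def worker() -> None:
        state["initialised"] = True  # set-up work happens before the rendezvous
        ready.wait(timeout=5)        # meet the starter at the barrier

    thread = threading.Thread(target=worker)
    thread.start()
    ready.wait(timeout=5)  # returns only once the worker has reached the barrier
    thread.join()
    return state
```

Neither thread has to release a lock owned by the other; each side only waits at the barrier.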
Zdenek Kabelac	61a45c7b3a	Add FIXMEs for init_test Usage of thread unprotected init_test is not correct and probably needs an lvm lock since it is part of the lvm library. Current implementation may probably fail with test mode and actually create something unexpectedly (and vice versa).	2011-10-11 09:23:48 +00:00
Zdenek Kabelac	da0ec96159	Update	2011-10-11 09:20:17 +00:00
Zdenek Kabelac	0448a9265a	Limit thread stack Since default thread stack size is around 8MB and clvmd for now creates a thread for each message, clvmd may easily reach multi GB size of in-memory locked pages (runs with mlockall()). This patch significantly reduces memory usage to just tens of MB, and now different reasons are the cause of server overloading. Now we are running out of free file descriptors mostly.	2011-10-11 09:18:49 +00:00
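The arithmetic behind the stack-limit commit above can be sketched in Python. The thread count (500) and the reduced stack size (128 KiB) are assumptions chosen for illustration; the commit message only gives "around 8MB", "multi GB" and "tens of MB". The helper 'run_with_small_stack' uses Python's 'threading.stack_size' as an analogue of setting a per-thread stack size in C, and may not be supported on every platform.

```python
import threading

KIB = 1024
MIB = 1024 * KIB


def locked_stack_bytes(threads: int, stack_bytes: int) -> int:
    """Worst-case stack memory if every thread's stack is locked in RAM."""
    return threads * stack_bytes


def run_with_small_stack(func, stack_bytes: int = 512 * KIB):
    """Run func in a thread created with a reduced stack size."""
    previous = threading.stack_size(stack_bytes)  # applies to threads created later
    try:
        result = []
        thread = threading.Thread(target=lambda: result.append(func()))
        thread.start()
        thread.join()
        return result[0]
    finally:
        threading.stack_size(previous)  # restore the earlier setting
```

With these assumed numbers, 500 threads at 8 MiB each would account for 4000 MiB of stack, while 500 threads at 128 KiB each would account for 62.5 MiB.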
Zdenek Kabelac	dde1ca1ef1	Update whats new	2011-10-11 09:14:51 +00:00
Zdenek Kabelac	57f4dfc653	Reduce preallocated stack size Go with just 64KiB for stack. Closer inspection should be made, whether we actually need to play with settings at all. Since default stack size is 8MB and gets mapped via page locking thus, it seems there is no big help with preallocation of stack to some value.	2011-10-11 09:13:39 +00:00
Zdenek Kabelac	d4f134b8f6	Check for refresh_filter failure Detect whether the filters were refreshed properly. (May need a few more fixes?) Filter refresh may fail because it may be out of free file descriptors when clvmd gets overloaded.	2011-10-11 09:09:00 +00:00
Zdenek Kabelac	8187aff8b9	Add missing log_error for alloc failure	2011-10-11 09:06:09 +00:00
Zdenek Kabelac	efe62a3411	Use condition instead of sleep Replace usleep with pthread condition to increase speed testing (for simplicity just 1 condition for all locks). Use thread mutex also for unlock resource (so it wakes up awaiting threads) Better check some error states and return error in fail case with unlocked mutex.	2011-10-11 09:05:20 +00:00
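The change described above, waiting on a condition instead of polling with usleep and using a single condition for all locks, can be sketched in Python with 'threading.Condition'. The class and method names are hypothetical and this is not the clvmd implementation; it only shows the wait/notify pattern and the error returns.

```python
import threading


class ResourceLocks:
    """Named locks that share one condition variable."""

    def __init__(self) -> None:
        self._cond = threading.Condition()  # one condition for all resources
        self._held = set()

    def lock(self, name: str, timeout: float = None) -> bool:
        """Block until name is free; return False if the wait times out."""
        with self._cond:
            free = self._cond.wait_for(lambda: name not in self._held, timeout)
            if not free:
                return False  # error path; the mutex is released on exit
            self._held.add(name)
            return True

    def unlock(self, name: str) -> bool:
        """Release name and wake every waiter; return False if not held."""
        with self._cond:
            if name not in self._held:
                return False
            self._held.remove(name)
            self._cond.notify_all()  # waiters re-check their own resource
            return True
```

Because one condition serves every resource, an unlock wakes all waiters and each re-checks its own name, which is simple at the cost of some spurious wake-ups.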
Zdenek Kabelac	df251f14dc	Use shorter way for if()	2011-10-11 09:03:33 +00:00
Zdenek Kabelac	3df790d9fd	Skip backtrace after log_error	2011-10-11 09:02:20 +00:00
Zdenek Kabelac	2abe28a8c6	Replace with debug Since the dm_tree_create already reports reason of error, use log_debug for this message.	2011-10-11 09:01:38 +00:00
Zdenek Kabelac	de75bc6688	Improve backtrace reporting Add <backtrace> so the function appears logged for the fail path.	2011-10-11 08:59:42 +00:00
Zdenek Kabelac	4007ac814f	Change message severity Using log_warn to report missing symlinks as warning, since the command itself returns as successful, we should not produce log_error(). log_warn is better fit here.	2011-10-11 08:57:13 +00:00
Zdenek Kabelac	409bf6e6d8	Skip r assignment Cosmetic, since r is already 0 for the error path, no need to assign it there, and r is assigned to 1 after switch command. Also makes the code more readable.	2011-10-11 08:54:01 +00:00
Zdenek Kabelac	5940327f3a	Reindent some thin functions	2011-10-11 08:51:56 +00:00
Zdenek Kabelac	a4c1c0d26f	Remove test for first_time with FIXME Workaround for the current code with big FIXME, since proper solution for pvmove needs to be developed. Committing this only for the purpose to get cluster testing covered.	2011-10-11 08:51:02 +00:00
Jonathan Earl Brassow	f60175c308	Add the ability to convert LVs of "mirror" segtype to "raid1" segtype. Example: ~> lvconvert --type raid1 vg/mirror_lv Steps to convert "mirror" to "raid1" 1) Allocate a RAID metadata LV for each mirror image from the same PVs on which they are located. 2) Clear the metadata LVs. This involves writing LVM metadata, so we don't change any aspects of the mirror LV before this so that the user can easily remove LVs from the failed convert attempt while retaining the original mirror. 3) Remove the mirror log, if it exists. 4) Add metadata LVs to mirror LV 5) Rename mirror sub-lvs (s/mimage/rimage/) 6) Change flags and segtype from mirror to raid1	2011-10-07 14:56:01 +00:00
Jonathan Earl Brassow	d3582e0252	Add the ability to convert linear LVs to RAID1 Example: ~> lvconvert --type raid1 -m 1 vg/lv The following steps are performed to convert linear to RAID1: 1) Allocate a metadata device from the same PV as the linear device to provide the metadata/data LV pair required for all RAID components. 2) Allocate the required number of metadata/data LV pairs for the remaining additional images. 3) Clear the metadata LVs. This performs a LVM metadata update. 4) Create the top-level RAID LV and add the component devices. We want to make any failure easy to unwind. This is why we don't create the top-level LV and add the components until the last step. Should anything happen before that, the user could simply remove the unnecessary images. Also, we want to ensure that the metadata LVs are cleared before forming the array to prevent stale information from polluting the new array. A new macro 'seg_is_linear' was added to allow us to distinguish linear LVs from striped LVs.	2011-10-07 14:52:26 +00:00
Jonathan Earl Brassow	a80192b6a7	Allow 'nosync' extension of mirrors. This patch allows a mirror to be extended without an initial resync of the extended portion. It complements the existing '--nosync' option to lvcreate. This action can be done implicitly if the mirror was created with the '--nosync' option, or explicitly if the '--nosync' option is used when extending the device. Here are the operational criteria: 1) A mirror created with '--nosync' should extend with 'nosync' implicitly [EXAMPLE]# lvs vg; lvextend -L +5G vg/lv ; lvs vg LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg Mwi-a-m- 5.00g lv_mlog 100.00 Extending 2 mirror images. Extending logical volume lv to 10.00 GiB Logical volume lv successfully resized LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg Mwi-a-m- 10.00g lv_mlog 100.00 2) The 'M' attribute ('M' signifies a mirror created with '--nosync', while 'm' signifies a mirror created w/o '--nosync') must be preserved when extending a mirror created with '--nosync'. See #1 for example of 'M' attribute. 3) A mirror created without '--nosync' should extend with 'nosync' only when '--nosync' is explicitly used when extending. [EXAMPLE]# lvs vg; lvextend -L +5G vg/lv; lvs vg LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg mwi-a-m- 20.00m lv_mlog 100.00 Extending 2 mirror images. Extending logical volume lv to 5.02 GiB Logical volume lv successfully resized LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg mwi-a-m- 5.02g lv_mlog 0.39 vs. [EXAMPLE]# lvs vg; lvextend -L +5G vg/lv --nosync; lvs vg LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg mwi-a-m- 20.00m lv_mlog 100.00 Extending 2 mirror images.
Extending logical volume lv to 5.02 GiB Logical volume lv successfully resized LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg Mwi-a-m- 5.02g lv_mlog 100.00 4) The 'm' attribute must change to 'M' when extending a mirror created without '--nosync' is extended with the '--nosync' option. (See #3 examples above.) 5) An inactive mirror's sync percent cannot be determined definitively, so it must not be allowed to skip resync. Instead, the extend should ask the user if they want to extend while performing a resync. [EXAMPLE]# lvchange -an vg/lv [EXAMPLE]# lvextend -L +5G vg/lv Extending 2 mirror images. Extending logical volume lv to 10.00 GiB vg/lv is not active. Unable to get sync percent. Do full resync of extended portion of vg/lv? [y/n]: y Logical volume lv successfully resized 6) A mirror that is performing recovery (as opposed to an initial sync) - like after a failure - is not allowed to extend with either an implicit or explicit nosync option. [You can simulate this with a 'corelog' mirror because when it is reactivated, it must be recovered every time.] [EXAMPLE]# lvcreate -m1 -L 5G -n lv vg --nosync --corelog WARNING: New mirror won't be synchronised. Don't read what you didn't write! Logical volume "lv" created [EXAMPLE]# lvs vg LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg Mwi-a-m- 5.00g 100.00 [EXAMPLE]# lvchange -an vg/lv; lvchange -ay vg/lv; lvs vg LV VG Attr LSize Pool Origin Snap% Move Log Copy% Convert lv vg Mwi-a-m- 5.00g 0.08 [EXAMPLE]# lvextend -L +5G vg/lv Extending 2 mirror images. Extending logical volume lv to 10.00 GiB vg/lv cannot be extended while it is recovering. 7) If 'no' is selected in #5 or if the condition in #6 is hit, it should not result in the mirror being resized or the 'm/M' attribute being changed. NOTE: A mirror created with '--nosync' behaves differently than one created without it when performing an extension. 
The former cannot be extended when the mirror is recovering (unless in-active), while the latter can. This is a reasonable thing to do since recovery of a mirror doesn't take long (at least in the case of an on-disk log) and it would cause far more time in degraded mode if the extension w/o '--nosync' was allowed. It might be reasonable to add the ability to force the operation in the future. This should /not/ force a nosync extension, but rather force a sync'ed extension. IOW, the user would be saying, "Yes, yes... I know recovery won't take long and that I'll be adding significantly to the time spent in degraded mode, but I need the extra space right now!".	2011-10-06 15:32:26 +00:00
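The operational criteria in the 'nosync' extension commit above can be read as a small decision function. The Python sketch below is one interpretation of criteria 1 to 7 and the NOTE, not the lvm2 code; the function name, its parameters and the returned labels are all hypothetical.

```python
def extend_policy(created_nosync: bool, nosync_flag: bool,
                  active: bool, recovering: bool) -> str:
    """Decide how a mirror extension should treat the new portion.

    Returns one of:
      "resync"  - extend and synchronise the extended portion
      "nosync"  - extend without resync; the attribute becomes or stays 'M'
      "prompt"  - ask whether to do a full resync of the extended portion
      "refuse"  - do not extend; size and 'm/M' attribute stay unchanged
    """
    wants_nosync = created_nosync or nosync_flag  # criteria 1 and 3
    if not wants_nosync:
        return "resync"
    if not active:
        return "prompt"  # criterion 5: sync percent cannot be determined
    if recovering:
        return "refuse"  # criterion 6: no nosync extension during recovery
    return "nosync"      # criteria 1 to 4


def attr_after(current_attr: str, outcome: str) -> str:
    """First attribute character after the operation ('m' or 'M')."""
    return "M" if outcome == "nosync" else current_attr
```

Under this reading, answering 'no' at the prompt is equivalent to "refuse" (criterion 7), and a mirror created without '--nosync' that is extended without the flag is never refused, which matches the NOTE in the commit message.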
Jonathan Earl Brassow	b19f01212e	Fix splitmirror in cluster having different DM/LVM views of storage. This patch also does some clean-up of the splitmirrors code. I've attempted to clean-up the splitmirrors code to make it easier to understand with fewer operations. I've tried to reduce the number of metadata operations without compromising the intermediate stages which are necessary for easy clean-up in the event of failure. These changes now correctly handle cluster situations - including exclusive cluster mirrors. Whereas before, a splitmirror operation would result in remote nodes having LVM commands report the newly split LV with a proper name while DM commands would report the old (pre-split) names of the device. IOW, there was a kernel/userspace mismatch.	2011-10-06 14:55:39 +00:00
Jonathan Earl Brassow	6c0b0e5d9a	Revert initial solution to bug 733114 - I/O error message during splitmirror The original commit comments can be located via this git commit ID: `7d8e615c0b` There were three possible solutions to the original problem proposed in the initial check-in. The one chosen was as follows: 2) Do like _remove_mirror_images does and suspend the original, then suspend the sub-lv (the error target), then resume the sub-lv, and finally resume the original LV. This seems like extra pointless operations to me, but it doesn't produce the error message (although, I'm not sure why) and it allows us to leave the visible flag in place. Turns out, the cluster also views the extra suspend/resume operations as pointless too and ignores them. So, this solution doesn't work in a cluster. Further, I've noticed that in addition to the remote cluster nodes still getting I/O errors from scanning the error target, they also have a different LVM and DM views of the same LV. IOW, while the LVM level (gotten from the LVM metadata) sees the correct name for the newly split LV, device-mapper still maintains the old names. Because the original fix failed to completely fix the problem (or work-around it) and because a better solution must be found to address the additional cluster issue of device renaming, I am reverting the above mentioned commit.	2011-10-06 14:49:16 +00:00
Jonathan Earl Brassow	83c606ae30	This patch fixes issues with improper udev flags on sub-LVs. The current code does not always assign proper udev flags to sub-LVs (e.g. mirror images and log LVs). This shows up especially during a splitmirror operation in which an image is split off from a mirror to form a new LV. A mirror with a disk log is actually composed of 4 different LVs: the 2 mirror images, the log, and the top-level LV that "glues" them all together. When a 2-way mirror is split into two linear LVs, two of those LVs must be removed. The segments of the image which is not split off to form the new LV are transferred to the top-level LV. This is done so that the original LV can maintain its major/minor, UUID, and name. The sub-lv from which the segments were transferred gets an error segment as a transitory process before it is eventually removed. (Note that if the error target was not put in place, a resume_lv would result in two LVs pointing to the same segment! If the machine crashes before the eventual removal of the sub-LV, the result would be a residual LV with the same mapping as the original (now linear) LV.) So, the two LVs that need to be removed are now the log device and the sub-LV with the error segment. If udev_flags are not properly set, a resume will cause the error LV to come up and be scanned by udev. This causes I/O errors. Additionally, when udev scans sub-LVs (or former sub-LVs), it can cause races when we are trying to remove those LVs. This is especially bad during failure conditions. When the mirror is suspended, the top-level along with its sub-LVs are suspended. The changes (now 2 linear devices and the yet-to-be-removed log and error LV) are committed. When the resume takes place on the original LV, there are no longer links to the other sub-lvs through the LVM metadata. The links are implicitly handled by querying the kernel for a list of dependencies. 
This is done in the '_add_dev' function (which is recursively called for each dependency found) - called through the following chain: _add_dev dm_tree_add_dev_with_udev_flags <* DM / LVM divide *> _add_dev_to_dtree _add_lv_to_dtree _create_partial_dtree _tree_action dev_manager_activate _lv_activate_lv _lv_resume lv_resume_if_active When udev flags are calculated by '_get_udev_flags', it is done by referencing the 'logical_volume' structure. Those flags are then passed down into 'dm_tree_add_dev_with_udev_flags', which in turn passes them to '_add_dev'. Unfortunately, when '_add_dev' is finding the dependencies, it has no way to calculate their proper udev_flags. This is because it is below the DM/LVM divide - it doesn't have access to the logical_volume structure. In fact, '_add_dev' simply reuses the udev_flags given for the initial device! This virtually guarantees the udev_flags are wrong for all the dependencies unless they are reset by some other mechanism. The current code provides no such mechanism. Even if '_add_new_lv_to_dtree' were called on the sub-devices - which it isn't - entries already in the tree are simply passed over, failing to reset any udev_flags. The solution must retain its implicit nature of discovering dependencies and be able to go back over the dependencies found to properly set the udev_flags. My solution simply calls a new function before leaving '_add_new_lv_to_dtree' that iterates over the dtree nodes to properly reset the udev_flags of any children. It is important that this function occur after the '_add_dev' has done its job of querying the kernel for a list of dependencies. It is this list of children that we use to look up their respective LVs and properly calculate the udev_flags. This solution has worked for single machine, cluster, and cluster w/ exclusive activation.	2011-10-06 14:45:40 +00:00
Jonathan Earl Brassow	a391248427	Fix vgsplit when there are mirrors that have mirrored logs. The problem as reported by "ben <benscott@nwlink.com>" on lvm-devel: vgsplit fails with mirrored mirror log #lvs --all -o lv_name,lv_attr,devices LV Attr Devices MyMirror mwi-- [MyMirror_mimage_0] Iwi--- /dev/sdq(0) [MyMirror_mimage_1] Iwi--- /dev/sdo(0) [MyMirror_mimage_2] Iwi--- /dev/sdi(0) [MyMirror_mlog] mwi--- [MyMirror_mlog_mimage_0] Iwi--- /dev/sds(0) [MyMirror_mlog_mimage_1] Iwi--- /dev/sde(0) #vgsplit -v "TestA" "TestB" "/dev/sdq" "/dev/sdo" "/dev/sdi" "/dev/sds" "/dev/sde" Checking for volume group "TestA" Checking for new volume group "TestB" Archiving volume group "TestA" metadata (seqno 213). Can't split mirror MyMirror between two Volume Groups AFTER FIX: [root@bp-01 ~]# lvs -a -o name,vg_name,devices vg new Volume group "new" not found Skipping volume group new LV VG Devices lv vg lv_mimage_0(0),lv_mimage_1(0) [lv_mimage_0] vg /dev/sdb1(0) [lv_mimage_1] vg /dev/sdc1(0) [lv_mlog] vg lv_mlog_mimage_0(0),lv_mlog_mimage_1(0) [lv_mlog_mimage_0] vg /dev/sdh1(0) [lv_mlog_mimage_1] vg /dev/sdi1(0) [root@bp-01 ~]# vgsplit vg new /dev/sd[bchi]1 New volume group "new" successfully split from "vg" [root@bp-01 ~]# lvs -a -o name,vg_name,devices vg new LV VG Devices lv new lv_mimage_0(0),lv_mimage_1(0) [lv_mimage_0] new /dev/sdb1(0) [lv_mimage_1] new /dev/sdc1(0) [lv_mlog] new lv_mlog_mimage_0(0),lv_mlog_mimage_1(0) [lv_mlog_mimage_0] new /dev/sdh1(0) [lv_mlog_mimage_1] new /dev/sdi1(0)	2011-10-06 14:17:45 +00:00
Zdenek Kabelac	151ed8d935	Add more validation to config parser Do not leave it for vgvalidate().	2011-10-06 11:06:36 +00:00
Zdenek Kabelac	565a4bfc49	Move defines to header Make limits for thin data_block_size and device_id part of public API. FIXME: possibly read them from some kernel header file in the future? But we may need to support different values for different versions?	2011-10-06 11:05:56 +00:00
Alasdair Kergon	ad9c59e2e9	Clarify multi-name device filter pattern matching explanation in lvm.conf.5.	2011-10-04 20:49:24 +00:00
Alasdair Kergon	c540e5d1b8	Clarify multi-name device filter pattern matching explanation in lvm.conf.5.	2011-10-04 20:45:36 +00:00
Zdenek Kabelac	460c599143	Name changes typo zeroeing->zeroing add size low_water_mark->low_water_mark_size so it's more obvious it's a sector-related variable.	2011-10-04 16:22:38 +00:00
Zdenek Kabelac	c0b9c64a77	Use capital letters	2011-10-04 12:39:59 +00:00
Zdenek Kabelac	01ef6510b0	Missed rename pool->thin_pool Fix compilation	2011-10-03 19:10:52 +00:00
Zdenek Kabelac	04a4715cb8	Add code to activate thin target Code to zero pool metadata lv when pool is created. Add code to create thin target via message sending. (Revert is missing)	2011-10-03 18:43:39 +00:00
Zdenek Kabelac	d35a117e4b	Add simple function for lookup of some free device_id Initial simple implementation for finding some free device_id.	2011-10-03 18:39:17 +00:00
Zdenek Kabelac	a00cb3a6b0	Add lvm functions for sending messages. Functions are currently only needed for thin provisioning.	2011-10-03 18:37:47 +00:00
Zdenek Kabelac	e0ea24be1f	Add initial code to check transaction_id Fix typo in transaction_id. Add this as node property, so it could be easily checked on resume. Code is not yet finished.	2011-10-03 18:34:52 +00:00
Zdenek Kabelac	97bde15a9f	Display transaction_id for thin_pool	2011-10-03 18:31:03 +00:00


5999 Commits